fmri-bmd-s3d

Model Summary

Modality	fMRI
Training Dataset	BOLD Moments Dataset (BMD) (MNI152 volume space)
Species	Human
Stimuli	3 second videos
Model Type	3D CNN model (s3d)
Creator	Alessandro Gifford

Description

These encoding models consist of a linear mapping (through linear regression) of video CNN features (s3d; Xie et al., 2017) onto fMRI responses. Before being mapped onto fMRI responses, the video features were reduced to 100 principal components using principal component analysis.
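As a rough illustration of the pipeline described above, the following minimal sketch reduces features to 100 principal components and fits one linear regression per voxel. It runs on synthetic arrays; the array sizes, the use of an intercept, and fitting the PCA on training features only are assumptions made for illustration, not details confirmed by the model building code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumed sizes): video features and fMRI responses.
n_train, n_test, n_feat, n_vox, n_pc = 200, 20, 512, 50, 100
X_train = rng.standard_normal((n_train, n_feat))
X_test = rng.standard_normal((n_test, n_feat))
Y_train = rng.standard_normal((n_train, n_vox))

# PCA via SVD, fitted on the training features only (assumption).
feat_mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - feat_mean, full_matrices=False)
components = Vt[:n_pc]  # (n_pc, n_feat), orthonormal rows

Z_train = (X_train - feat_mean) @ components.T  # (n_train, n_pc)
Z_test = (X_test - feat_mean) @ components.T    # (n_test, n_pc)

# Ordinary least squares with an intercept: one weight vector per voxel.
A_train = np.hstack([Z_train, np.ones((n_train, 1))])
W, *_ = np.linalg.lstsq(A_train, Y_train, rcond=None)  # (n_pc + 1, n_vox)

# In silico responses for held-out stimuli.
A_test = np.hstack([Z_test, np.ones((n_test, 1))])
Y_pred = A_test @ W  # (n_test, n_vox)
```

Each column of `W` is an independent regression, which matches the "one model per voxel" description.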

The encoding models were trained on the BOLD Moments Dataset (BMD) (Lahner et al., 2024), which contains fMRI responses of 10 subjects to 1102 3-second naturalistic videos from the Memento10k dataset (Newman et al., 2020). One encoding model was trained for each BMD subject and for each fMRI voxel.

Preprocessing. The encoding models are trained on BMD’s data prepared in MNI152 volume space, from the “versionB” preprocessing version. Note that the BMD data were z-scored at each scan session, and as a consequence the in silico fMRI responses generated by the encoding models also live in z-scored space.
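To make the session-wise z-scoring concrete, here is a minimal sketch on synthetic data. It assumes that each voxel is standardized across the trials of one scan session; the function name `zscore_per_session` and the session labels are hypothetical and not part of BMD or BERG.

```python
import numpy as np

def zscore_per_session(responses, session_ids):
    """Z-score each voxel across the trials of each session (assumed scheme).

    responses: (n_trials, n_voxels) array; session_ids: (n_trials,) labels.
    """
    responses = np.asarray(responses, dtype=float)
    session_ids = np.asarray(session_ids)
    out = np.empty_like(responses)
    for session in np.unique(session_ids):
        idx = session_ids == session
        mean = responses[idx].mean(axis=0)
        std = responses[idx].std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant voxels
        out[idx] = (responses[idx] - mean) / std
    return out

# Two synthetic sessions with different offsets and scales.
rng = np.random.default_rng(0)
raw = np.vstack([
    5.0 + 2.0 * rng.standard_normal((30, 4)),
    -3.0 + 0.5 * rng.standard_normal((30, 4)),
])
sessions = np.repeat([1, 2], 30)
z = zscore_per_session(raw, sessions)
```

Because the training targets are standardized this way, model outputs are best read as deviations in standard-deviation units rather than as raw signal values.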

Model training partition. fMRI responses for 1000 videos.

Model testing partition. fMRI responses for 102 videos.

ROIs. Each ROI in the metadata consists of a tuple with 3 items: (1) the ROI voxel indices in 3-dimensional brain volume space; (2) the ROI voxel indices in 1-dimensional flattened brain volume space; (3) the NIfTI image of the ROI voxel indices.
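The relation between 3-dimensional and flattened voxel indices can be sketched with NumPy as follows. The ROI here is synthetic, and the row-major (C-order) flattening of the full (62, 77, 61) volume is an assumption; the metadata may instead index into the masked voxel vector, so check against the actual tuples before relying on this.

```python
import numpy as np

volume_shape = (62, 77, 61)

# Hypothetical ROI: four voxels marked in an otherwise empty volume.
roi_volume = np.zeros(volume_shape, dtype=bool)
roi_volume[10:12, 20:22, 30] = True

# Item (1)-style indices: a tuple of three coordinate arrays (x, y, z).
idx_3d = np.nonzero(roi_volume)

# Item (2)-style indices: positions in the flattened volume (C order assumed).
idx_flat = np.ravel_multi_index(idx_3d, volume_shape)

# The conversion is reversible.
idx_3d_again = np.unravel_index(idx_flat, volume_shape)
```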

Metadata

fmri

group_mask : (62, 77, 61) - Whole brain group mask (voxels defined for all subjects)

group_mask_header : object - Whole brain group mask header

group_mask_affine : (4, 4) - Whole brain group mask affine

sub_mask : (62, 77, 61) - Whole brain subject mask

sub_mask_header : object - Whole brain subject mask header

sub_mask_affine : (4, 4) - Whole brain subject mask affine

rois : dict - ROI voxel indices (l = left hemisphere, r = right hemisphere)
lV1v : tuple - Visual area 1 ventral (LH)

rV1v : tuple - Visual area 1 ventral (RH)

lV1d : tuple - Visual area 1 dorsal (LH)

rV1d : tuple - Visual area 1 dorsal (RH)

lV2v : tuple - Visual area 2 ventral (LH)

rV2v : tuple - Visual area 2 ventral (RH)

lV2d : tuple - Visual area 2 dorsal (LH)

rV2d : tuple - Visual area 2 dorsal (RH)

lV3v : tuple - Visual area 3 ventral (LH)

rV3v : tuple - Visual area 3 ventral (RH)

lV3d : tuple - Visual area 3 dorsal (LH)

rV3d : tuple - Visual area 3 dorsal (RH)

lV3ab : tuple - Visual areas V3A and V3B (LH)

rV3ab : tuple - Visual areas V3A and V3B (RH)

lhV4 : tuple - Human V4 complex (LH)

rhV4 : tuple - Human V4 complex (RH)

lFFA : tuple - Fusiform face area (LH)

rFFA : tuple - Fusiform face area (RH)

lOFA : tuple - Occipital face area (LH)

rOFA : tuple - Occipital face area (RH)

lEBA : tuple - Extrastriate body area (LH)

rEBA : tuple - Extrastriate body area (RH)

lLOC : tuple - Lateral occipital complex (LH)

rLOC : tuple - Lateral occipital complex (RH)

lPPA : tuple - Parahippocampal place area (LH)

rPPA : tuple - Parahippocampal place area (RH)

lRSC : tuple - Retrosplenial cortex (LH)

rRSC : tuple - Retrosplenial cortex (RH)

lSTS : tuple - Superior temporal sulcus (LH)

rSTS : tuple - Superior temporal sulcus (RH)

lTOS : tuple - Transverse occipital sulcus (LH)

rTOS : tuple - Transverse occipital sulcus (RH)

lMT : tuple - Middle temporal area (LH)

rMT : tuple - Middle temporal area (RH)

l7AL : tuple - Dorsal intraparietal area 7AL (LH)

r7AL : tuple - Dorsal intraparietal area 7AL (RH)

lIPS0 : tuple - Intraparietal area IPS0 (LH)

rIPS0 : tuple - Intraparietal area IPS0 (RH)

lIPS1-2-3 : tuple - Intraparietal areas IPS1, IPS2, and IPS3 (LH)

rIPS1-2-3 : tuple - Intraparietal areas IPS1, IPS2, and IPS3 (RH)

lPFt : tuple - Inferior parietal area PFt (LH)

rPFt : tuple - Inferior parietal area PFt (RH)

lBA2 : tuple - Brodmann area 2 (LH)

rBA2 : tuple - Brodmann area 2 (RH)

lPFop : tuple - Inferior parietal area PFop (LH)

rPFop : tuple - Inferior parietal area PFop (RH)

BMDgeneral : tuple - BMD general visual cortex mask

encoding_models

noiseceiling_task_train_n_1 : (N,) - Voxelwise noise ceiling computed on single train data repeats

noiseceiling_task_train_n_3 : (N,) - Voxelwise noise ceiling computed on all train data repeats

noiseceiling_task_test_n_1 : (N,) - Voxelwise noise ceiling computed on single test data repeats

noiseceiling_task_test_n_10 : (N,) - Voxelwise noise ceiling computed on all test data repeats

correlation : (N,) - Correlation prediction accuracy

r2 : (N,) - Explained variance (R² prediction accuracy)

explained_variance : (N,) - Noise-ceiling-normalized explained variance
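The accuracy metrics listed above could be computed along the following lines. This is a sketch under stated assumptions: R² is taken as the squared Pearson correlation, and the noise ceiling is expressed as a fraction of explainable variance, so the normalized score is a percentage. The noise ceiling values and the helper `voxelwise_pearson` are hypothetical; the exact formulas used for the stored metadata are in the model building code.

```python
import numpy as np

def voxelwise_pearson(y_true, y_pred):
    """Pearson correlation per voxel; inputs have shape (n_stimuli, n_voxels)."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom

rng = np.random.default_rng(0)
y_true = rng.standard_normal((102, 5))                  # test responses
y_pred = 0.5 * y_true + rng.standard_normal((102, 5))   # noisy predictions

correlation = voxelwise_pearson(y_true, y_pred)  # analogous to `correlation`
r2 = correlation ** 2                            # analogous to `r2` (assumed r squared)

# Hypothetical noise ceiling, as a fraction of variance that is explainable.
noise_ceiling = np.full(5, 0.8)
explained_variance = 100 * r2 / noise_ceiling    # percent of the noise ceiling
```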

Input

Type	`numpy.ndarray`
Shape	`['batch_size', 'video_frames', '3_channels', 'height', 'width']`
Description	The input should be a batch of RGB video frames. While the model accepts input videos of any duration, we recommend using ~3-second videos to match the duration of the videos used to train the encoding models. Each video should have at least 14 frames.
Constraints	Image values should be integers in range [0, 255]. Image dimensions (height, width) should be equal (square). Minimum recommended video size: 256×256 pixels.
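The input constraints above can be checked before encoding with a small helper such as the one below. `validate_stimulus` is a hypothetical function written for this illustration, not part of BERG, and it only enforces the stated requirements (the 256×256 size is a recommendation, so it is not enforced).

```python
import numpy as np

def validate_stimulus(stimulus, min_frames=14):
    """Raise if a video batch violates the documented input constraints."""
    if not isinstance(stimulus, np.ndarray):
        raise TypeError("stimulus must be a numpy.ndarray")
    if stimulus.ndim != 5:
        raise ValueError("expected shape [batch, frames, 3, height, width]")
    _, n_frames, channels, height, width = stimulus.shape
    if channels != 3:
        raise ValueError("expected 3 RGB channels on axis 2")
    if n_frames < min_frames:
        raise ValueError(f"videos need at least {min_frames} frames")
    if height != width:
        raise ValueError("frames must be square (height == width)")
    if not np.issubdtype(stimulus.dtype, np.integer):
        raise ValueError("pixel values must be integers")
    if stimulus.size and (stimulus.min() < 0 or stimulus.max() > 255):
        raise ValueError("pixel values must lie in [0, 255]")
    return True

# A small valid batch: 2 videos, 14 frames, 64x64 pixels.
ok = validate_stimulus(np.zeros((2, 14, 3, 64, 64), dtype=np.uint8))
```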

Output

Type	`numpy.ndarray`
Shape	`['batch_size', 'n_voxels']`
Description	The output is a 2D array containing in silico fMRI responses.

Dimensions:

Name	Description
batch_size	Number of stimulus videos in the batch.
n_voxels	Number of selected voxels for which the in silico fMRI responses are generated.
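To view one row of the output as a brain volume, one option is to scatter the voxel values back into 3D using a whole-brain mask. The sketch below uses a synthetic mask and assumes that the number of nonzero mask entries equals `n_voxels` and that voxels are ordered as in a row-major traversal of the mask; both should be verified against the `sub_mask` metadata. `to_volume` is a hypothetical helper.

```python
import numpy as np

volume_shape = (62, 77, 61)
rng = np.random.default_rng(0)

# Synthetic stand-ins for the subject mask and the model output.
sub_mask = rng.random(volume_shape) > 0.6
n_voxels = int(sub_mask.sum())
responses = rng.standard_normal((2, n_voxels))  # [batch_size, n_voxels]

def to_volume(response_row, mask):
    """Place a 1D voxel vector into a 3D volume; voxels outside the mask are NaN."""
    mask = np.asarray(mask, dtype=bool)
    volume = np.full(mask.shape, np.nan)
    volume[mask] = response_row
    return volume

volume = to_volume(responses[0], sub_mask)
```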

Parameters

Parameters used in `get_encoding_model`

This function loads the encoding model.

model_id	Type: str Required: Yes Description: Unique identifier of the model to load. Valid Values: fmri-bmd-s3d Example: “fmri-bmd-s3d”
subject	Type: int Required: Yes Description: Subject ID from the BMD dataset (1-10). Valid Values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 Example: 1
selection	Type: dict Required: No Description: Specifies which outputs to include in the model responses. If not provided, fMRI responses are generated for all whole-brain voxels. Properties:
    roi	Type: str Description: The region-of-interest (ROI) for which the in silico fMRI responses are generated. Valid values: “lV1v”, “rV1v”, “lV1d”, “rV1d”, “lV2v”, “rV2v”, “lV2d”, “rV2d”, “lV3v”, “rV3v”, “lV3d”, “rV3d”, “lV3ab”, “rV3ab”, “lhV4”, “rhV4”, “lFFA”, “rFFA”, “lOFA”, “rOFA”, “lEBA”, “rEBA”, “lLOC”, “rLOC”, “lPPA”, “rPPA”, “lRSC”, “rRSC”, “rSTS”, “lSTS”, “lTOS”, “rTOS”, “lMT”, “rMT”, “l7AL”, “r7AL”, “lIPS0”, “rIPS0”, “rIPS1-2-3”, “lIPS1-2-3”, “lPFt”, “rPFt”, “lBA2”, “rBA2”, “lPFop”, “rPFop”, “BMDgeneral”
    voxels	Type: numpy.ndarray Description: Binary mask vector, with ones marking the voxels for which the in silico fMRI responses are generated. This vector must have exactly the same length as the number of whole-brain voxels, which varies across subjects:
        - Subject 1: 108,219 voxels
        - Subject 2: 108,603 voxels
        - Subject 3: 108,366 voxels
        - Subject 4: 108,283 voxels
        - Subject 5: 108,201 voxels
        - Subject 6: 108,449 voxels
        - Subject 7: 108,126 voxels
        - Subject 8: 108,407 voxels
        - Subject 9: 108,250 voxels
        - Subject 10: 107,987 voxels
    The “voxels” vector is only used if the “roi” key is not provided, or has value None.
device	Type: str Required: No Description: Device to run the model on. ‘auto’ will use CUDA if available, otherwise CPU. Valid Values: “cpu”, “cuda”, “auto” Example: “auto”
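The two forms of the `selection` dictionary described above can be built as in the sketch below. The chosen ROI and the marked voxel range are arbitrary examples; the voxel count is the documented value for subject 1. The call to `get_encoding_model` is left as a comment because it needs a local BERG installation, and passing the dictionary as `selection=` follows the parameter name in the table above.

```python
import numpy as np

# Option 1: select voxels by ROI name.
selection_by_roi = {"roi": "lV1v"}

# Option 2: select voxels with a binary mask over all whole-brain voxels.
n_voxels_subject_1 = 108_219           # documented length for subject 1
voxel_mask = np.zeros(n_voxels_subject_1, dtype=int)
voxel_mask[:500] = 1                   # arbitrary example: first 500 voxels
selection_by_voxels = {"voxels": voxel_mask}

# With a BERG object available (see Example Usage), the dictionary would be
# passed when loading the model, e.g.:
# model = berg.get_encoding_model(
#     "fmri-bmd-s3d",
#     subject=1,
#     selection=selection_by_voxels,
# )
```

Since the “voxels” mask is ignored whenever “roi” is set, the two options are best kept in separate dictionaries.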

Parameters used in `encode`

This function generates in silico neural responses using the encoding model previously loaded.

model	Type: BaseModelInterface Required: Yes Description: An instantiated and loaded encoding model.
stimulus	Type: numpy.ndarray Required: Yes Description: A batch of RGB videos to be encoded. Videos should be in integer format with values in the range [0, 255], of square dimensions (e.g. 256×256), and should have at least 14 frames. Example: “An array of shape [100, 90, 3, 256, 256] representing 100 RGB videos, with 90 frames each.”
return_metadata	Type: bool Required: No Description: Whether to return the encoding model’s metadata together with the in silico neural responses. Example: True
show_progress	Type: bool Required: No Description: Whether to show a progress bar during encoding (for large batches). Example: True

Parameters used in `get_model_metadata`

This function loads the encoding model’s metadata without having to load the model itself.

model_id	Type: str Required: Yes Description: Unique identifier of the model to load. Valid Values: fmri-bmd-s3d Example: “fmri-bmd-s3d”
subject	Type: int Required: Yes Description: Subject ID from the BMD dataset (1-10). Valid Values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 Example: 1

Performance

Accuracy Plots (AWS directory):

brain-encoding-response-generator/encoding_models/modality-fmri/train_dataset-bmd/model-s3d/encoding_models_accuracy

Example Usage

import numpy as np
from berg import BERG

# Initialize BERG
berg = BERG(berg_dir="path/to/brain-encoding-response-generator")

# Load the model
model = berg.get_encoding_model(
    "fmri-bmd-s3d",
    subject=1,
)

# Prepare the stimulus videos
# Video shape should be [batch_size, video_frames, 3 RGB channels, height, width]
# np.random.randint excludes the upper bound, so 256 is needed to cover [0, 255];
# dtype=np.uint8 keeps this random batch at roughly 1.8 GB of memory
stimulus = np.random.randint(0, 256, (100, 90, 3, 256, 256), dtype=np.uint8)

# Generate the in silico neural responses using the encoding model previously loaded
responses = berg.encode(
    model,
    stimulus,
    show_progress=True
)

# The in silico fMRI responses will be a numpy.ndarray of shape:
# ['batch_size', 'n_voxels']
# where:
# - n_voxels: Number of selected voxels for which the in silico fMRI responses are generated.

# Generate in silico neural responses with metadata
responses, metadata = berg.encode(
    model,
    stimulus,
    return_metadata=True
)

# Load the encoding model's metadata without having to load the model itself
metadata = berg.get_model_metadata(
    "fmri-bmd-s3d",
    subject=1
)

References

Model building code: https://github.com/gifale95/BERG/tree/main/berg_creation_code
BMD paper (Lahner et al., 2024): https://doi.org/10.1038/s41593-021-00962-x
Memento 10k dataset (Newman et al., 2020): https://link.springer.com/chapter/10.1007/978-3-030-58517-4_14
s3d (Xie et al., 2017): https://doi.org/10.48550/arXiv.1712.04851