fmri-bmd-s3d

Model Summary

Modality

fMRI

Training Dataset

BOLD Moments Dataset (BMD) (MNI152 volume space)

Species

Human

Stimuli

3 second videos

Model Type

3D CNN model (s3d)

Creator

Alessandro Gifford

Description

These encoding models consist of a linear mapping (fitted with linear regression) from video CNN (Xie et al., 2017) features onto fMRI responses. Before the mapping, the video features were reduced to 100 principal components using principal component analysis.
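The pipeline described above (PCA to 100 components, then linear regression onto voxel responses) can be sketched with NumPy alone. This is a minimal illustration on random data, not the code used to train the released models: the array sizes, the use of ordinary least squares with an intercept, and fitting PCA on the training features only are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): feature dimension and voxel count are arbitrary.
n_train, n_test, n_feat, n_vox, n_pc = 150, 50, 512, 50, 100
feats_train = rng.standard_normal((n_train, n_feat))
feats_test = rng.standard_normal((n_test, n_feat))
fmri_train = rng.standard_normal((n_train, n_vox))

# PCA fitted on the training features: center, then keep the top n_pc components.
mean = feats_train.mean(axis=0)
_, _, vt = np.linalg.svd(feats_train - mean, full_matrices=False)
components = vt[:n_pc]                                  # (n_pc, n_feat)
pcs_train = (feats_train - mean) @ components.T         # (n_train, n_pc)
pcs_test = (feats_test - mean) @ components.T           # (n_test, n_pc)

# Least-squares fit with an intercept column; one weight vector per voxel.
design_train = np.column_stack([pcs_train, np.ones(n_train)])
weights, *_ = np.linalg.lstsq(design_train, fmri_train, rcond=None)

# Predicted (in silico) responses for held-out stimuli.
design_test = np.column_stack([pcs_test, np.ones(n_test)])
pred_test = design_test @ weights                       # (n_test, n_vox)
```

Because each voxel has its own column in the weight matrix, a single least-squares call fits all voxels independently.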

The encoding models were trained on the BOLD Moments Dataset (BMD) (Lahner et al., 2024), which contains fMRI responses of 10 subjects to 1102 3-second naturalistic videos from the Memento10k dataset (Newman et al., 2020). One encoding model was trained for each BMD subject and for each fMRI voxel.

Preprocessing. The encoding models are trained on BMD’s data prepared in MNI152 volume space, from the “versionB” preprocessing version. Note that the BMD data were z-scored at each scan session, and as a consequence the in silico fMRI responses generated by the encoding models also live in z-scored space.
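To make "z-scored at each scan session" concrete, the sketch below standardizes each voxel separately within each session. It illustrates the general operation only: the function name, the trial-by-voxel layout, and the session labels are hypothetical, and the BMD preprocessing itself may differ in detail.

```python
import numpy as np

def zscore_per_session(responses, session_ids):
    """Z-score each voxel within each session.

    responses: (n_trials, n_voxels) array; session_ids: (n_trials,) labels.
    """
    responses = np.asarray(responses, dtype=float)
    session_ids = np.asarray(session_ids)
    out = np.empty_like(responses)
    for session in np.unique(session_ids):
        rows = session_ids == session
        block = responses[rows]
        out[rows] = (block - block.mean(axis=0)) / block.std(axis=0)
    return out

# Toy data: 2 sessions of 20 trials each, 5 voxels, arbitrary raw signal units.
rng = np.random.default_rng(0)
raw = rng.normal(loc=500.0, scale=20.0, size=(40, 5))
sessions = np.repeat([1, 2], 20)
z = zscore_per_session(raw, sessions)
```

After this step every voxel has zero mean and unit standard deviation within each session, which is why the in silico responses are expressed in standard-deviation units rather than raw signal units.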

Model training partition. fMRI responses for 1000 videos.

Model testing partition. fMRI responses for 102 videos.

ROIs. Each ROI in the metadata consists of a tuple with 3 items: (1) The ROI voxel indices in 3-dimensional brain volume space; (2) The ROI voxel indices in 1-dimensional flattened brain volume space; (3) the NIfTI images of the ROI voxel indices.
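The relation between the first two tuple items (3-dimensional and flattened 1-dimensional voxel indices) can be illustrated with NumPy. The ROI coordinates below are made up, and the sketch assumes row-major (C-order) flattening, which the metadata description does not state explicitly.

```python
import numpy as np

volume_shape = (62, 77, 61)  # shape of group_mask / sub_mask in the metadata

# Hypothetical ROI: three voxels given as a tuple of 3D index arrays.
roi_3d = (np.array([10, 10, 11]), np.array([20, 21, 20]), np.array([30, 30, 31]))

# Assumption: flattening uses NumPy's default C (row-major) order.
roi_1d = np.ravel_multi_index(roi_3d, volume_shape)
roi_3d_back = np.unravel_index(roi_1d, volume_shape)

# The same voxels marked in a boolean brain volume.
volume = np.zeros(volume_shape, dtype=bool)
volume[roi_3d] = True
```

If the indices stored in the metadata do not round-trip this way, the flattening order or the masking convention differs from the assumption made here.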

Metadata

fmri

group_mask : (62, 77, 61) - Whole brain group mask (voxels defined for all subjects)

group_mask_header : object - Whole brain group mask header

group_mask_affine : (4, 4) - Whole brain group mask affine

sub_mask : (62, 77, 61) - Whole brain subject mask

sub_mask_header : object - Whole brain subject mask header

sub_mask_affine : (4, 4) - Whole brain subject mask affine

roisdict - ROI voxel indices (l = left hemisphere, r = right hemisphere)

lV1v : tuple - Visual area 1 ventral (LH)

rV1v : tuple - Visual area 1 ventral (RH)

lV1d : tuple - Visual area 1 dorsal (LH)

rV1d : tuple - Visual area 1 dorsal (RH)

lV2v : tuple - Visual area 2 ventral (LH)

rV2v : tuple - Visual area 2 ventral (RH)

lV2d : tuple - Visual area 2 dorsal (LH)

rV2d : tuple - Visual area 2 dorsal (RH)

lV3v : tuple - Visual area 3 ventral (LH)

rV3v : tuple - Visual area 3 ventral (RH)

lV3d : tuple - Visual area 3 dorsal (LH)

rV3d : tuple - Visual area 3 dorsal (RH)

lV3ab : tuple - Visual areas V3A and V3B (LH)

rV3ab : tuple - Visual areas V3A and V3B (RH)

lhV4 : tuple - Human V4 complex (LH)

rhV4 : tuple - Human V4 complex (RH)

lFFA : tuple - Fusiform face area (LH)

rFFA : tuple - Fusiform face area (RH)

lOFA : tuple - Occipital face area (LH)

rOFA : tuple - Occipital face area (RH)

lEBA : tuple - Extrastriate body area (LH)

rEBA : tuple - Extrastriate body area (RH)

lLOC : tuple - Lateral occipital complex (LH)

rLOC : tuple - Lateral occipital complex (RH)

lPPA : tuple - Parahippocampal place area (LH)

rPPA : tuple - Parahippocampal place area (RH)

lRSC : tuple - Retrosplenial cortex (LH)

rRSC : tuple - Retrosplenial cortex (RH)

lSTS : tuple - Superior temporal sulcus (LH)

rSTS : tuple - Superior temporal sulcus (RH)

lTOS : tuple - Transverse occipital sulcus (LH)

rTOS : tuple - Transverse occipital sulcus (RH)

lMT : tuple - Middle temporal area (LH)

rMT : tuple - Middle temporal area (RH)

l7AL : tuple - Dorsal intraparietal area 7AL (LH)

r7AL : tuple - Dorsal intraparietal area 7AL (RH)

lIPS0 : tuple - Intraparietal area IPS0 (LH)

rIPS0 : tuple - Intraparietal area IPS0 (RH)

lIPS1-2-3 : tuple - Intraparietal areas IPS1, IPS2, and IPS3 (LH)

rIPS1-2-3 : tuple - Intraparietal areas IPS1, IPS2, and IPS3 (RH)

lPFt : tuple - Inferior parietal area PFt (LH)

rPFt : tuple - Inferior parietal area PFt (RH)

lBA2 : tuple - Brodmann area 2 (LH)

rBA2 : tuple - Brodmann area 2 (RH)

lPFop : tuple - Inferior parietal area PFop (LH)

rPFop : tuple - Inferior parietal area PFop (RH)

BMDgeneral : tuple - BMD general visual cortex mask

encoding_models

noiseceiling_task_train_n_1 : (N,) - Voxelwise noise ceiling computed on single train data repeats

noiseceiling_task_train_n_3 : (N,) - Voxelwise noise ceiling computed on all train data repeats

noiseceiling_task_test_n_1 : (N,) - Voxelwise noise ceiling computed on single test data repeats

noiseceiling_task_test_n_10 : (N,) - Voxelwise noise ceiling computed on all test data repeats

correlation : (N,) - Correlation prediction accuracy

r2 : (N,) - Explained variance (R² prediction accuracy)

explained_variance : (N,) - Noise-ceiling-normalized explained variance
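As a rough guide to how the three accuracy fields might relate, the sketch below computes a voxelwise Pearson correlation, squares it, and divides by a noise ceiling. These formulas are assumptions: the metadata does not state how r2 and explained_variance were computed, nor whether the noise ceiling is stored as a proportion or a percentage, so check against the stored values before relying on this.

```python
import numpy as np

def voxelwise_correlation(pred, true):
    """Pearson correlation between predicted and recorded responses, per voxel."""
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    num = (pred_c * true_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    return num / den

# Toy data: 102 test videos, 4 voxels.
rng = np.random.default_rng(0)
true = rng.standard_normal((102, 4))
pred = 0.6 * true + 0.8 * rng.standard_normal((102, 4))
noise_ceiling = np.full(4, 0.5)  # hypothetical values, expressed as proportions

correlation = voxelwise_correlation(pred, true)
r2 = correlation ** 2                 # assumption: squared correlation
normalized = r2 / noise_ceiling       # assumption: simple ratio to the ceiling
```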

Input

Type

numpy.ndarray

Shape

['batch_size', 'video_frames', '3_channels', 'height', 'width']

Description

The input should be a batch of RGB video frames. Although the model accepts input videos of any duration, we recommend using ~3-second videos to match the duration of the videos used to train the encoding models. Each video should have at least 14 frames.

Constraints

  • Image values should be integers in range [0, 255].

  • Image dimensions (height, width) should be equal (square).

  • Minimum recommended video size: 256×256 pixels.
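The constraints above can be checked before calling the model. The helper below is a hypothetical convenience function, not part of BERG; it only encodes the rules stated in this section (5-dimensional array, at least 14 frames, 3 channels, square frames, integer values in [0, 255]).

```python
import numpy as np

def check_stimulus(stimulus, min_frames=14):
    """Return a list of problems; an empty list means the batch looks valid."""
    if stimulus.ndim != 5:
        return [f"expected 5 dimensions, got {stimulus.ndim}"]
    problems = []
    _, n_frames, n_channels, height, width = stimulus.shape
    if n_frames < min_frames:
        problems.append(f"needs at least {min_frames} frames, got {n_frames}")
    if n_channels != 3:
        problems.append(f"expected 3 RGB channels, got {n_channels}")
    if height != width:
        problems.append(f"frames must be square, got {height}x{width}")
    if not np.issubdtype(stimulus.dtype, np.integer):
        problems.append(f"values must be integers, got dtype {stimulus.dtype}")
    elif stimulus.size and (stimulus.min() < 0 or stimulus.max() > 255):
        problems.append("values must lie in [0, 255]")
    return problems

good = np.zeros((2, 14, 3, 256, 256), dtype=np.uint8)
bad = np.zeros((2, 10, 3, 32, 48), dtype=np.float32)
print(check_stimulus(good))  # no problems reported
print(check_stimulus(bad))   # too few frames, not square, non-integer dtype
```

The 256×256 size is a recommendation rather than a hard constraint, so the helper does not flag smaller frames.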

Output

Type

numpy.ndarray

Shape

['batch_size', 'n_voxels']

Description

The output is a 2D array containing in silico fMRI responses.

Dimensions:

batch_size - Number of stimulus videos in the batch.

n_voxels - Number of selected voxels for which the in silico fMRI responses are generated.

Parameters

Parameters used in get_encoding_model

This function loads the encoding model.

model_id

Type: str
Required: Yes
Description: Unique identifier of the model to load.
Valid Values: fmri-bmd-s3d
Example: “fmri-bmd-s3d”

subject

Type: int
Required: Yes
Description: Subject ID from the BMD dataset (1-10).
Valid Values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Example: 1

selection

Type: dict
Required: No
Description: Specifies which outputs to include in the model responses. If not provided, fMRI responses are generated for all whole-brain voxels.

Properties:

roi
Type: str
Description: The region-of-interest (ROI) for which the in silico fMRI responses are generated.
Valid values: “lV1v”, “rV1v”, “lV1d”, “rV1d”, “lV2v”, “rV2v”, “lV2d”, “rV2d”, “lV3v”, “rV3v”, “lV3d”, “rV3d”, “lV3ab”, “rV3ab”, “lhV4”, “rhV4”, “lFFA”, “rFFA”, “lOFA”, “rOFA”, “lEBA”, “rEBA”, “lLOC”, “rLOC”, “lPPA”, “rPPA”, “lRSC”, “rRSC”, “rSTS”, “lSTS”, “lTOS”, “rTOS”, “lMT”, “rMT”, “l7AL”, “r7AL”, “lIPS0”, “rIPS0”, “rIPS1-2-3”, “lIPS1-2-3”, “lPFt”, “rPFt”, “lBA2”, “rBA2”, “lPFop”, “rPFop”, “BMDgeneral”

voxels
Type: numpy.ndarray
Description: Binary vector (mask) with ones indicating the voxels for which the
in silico fMRI responses are generated. This vector must have exactly the same
length as the number of voxels, which varies across subjects:
- Subject 1: 108,219 voxels
- Subject 2: 108,603 voxels
- Subject 3: 108,366 voxels
- Subject 4: 108,283 voxels
- Subject 5: 108,201 voxels
- Subject 6: 108,449 voxels
- Subject 7: 108,126 voxels
- Subject 8: 108,407 voxels
- Subject 9: 108,250 voxels
- Subject 10: 107,987 voxels
The voxels marked in this vector are selected only if the “roi” key
is not provided, or has value None.
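A selection vector of the right length can be built as follows. The helper name and the integer dtype are assumptions; the per-subject voxel counts are copied from the list above, and the "voxels" key is the property described in this section.

```python
import numpy as np

# Voxel counts per subject, copied from the list above.
N_VOXELS = {1: 108219, 2: 108603, 3: 108366, 4: 108283, 5: 108201,
            6: 108449, 7: 108126, 8: 108407, 9: 108250, 10: 107987}

def make_voxel_selection(subject, voxel_indices):
    """Build a selection dict whose "voxels" vector is 1 at the chosen indices."""
    mask = np.zeros(N_VOXELS[subject], dtype=int)
    mask[np.asarray(voxel_indices)] = 1
    return {"voxels": mask}

# Hypothetical choice of three voxels for subject 1.
selection = make_voxel_selection(subject=1, voxel_indices=[0, 5, 42])
```

The resulting dict would then be passed as the selection argument of get_encoding_model; leave out the "roi" key (or set it to None) so that the vector is actually used.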

device

Type: str
Required: No
Description: Device to run the model on. ‘auto’ will use CUDA if available, otherwise CPU.
Valid Values: “cpu”, “cuda”, “auto”
Example: “auto”

Parameters used in encode

This function generates in silico neural responses using the encoding model previously loaded.

model

Type: BaseModelInterface
Required: Yes
Description: An instantiated and loaded encoding model.

stimulus

Type: numpy.ndarray
Required: Yes
Description: A batch of RGB videos to be encoded. Videos should be in integer format with values in the range [0, 255], of square dimensions (e.g. 256×256), and should have at least 14 frames.
Example: “An array of shape [100, 90, 3, 256, 256] representing 100 RGB videos, with 90 frames each.”
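Decoded videos often arrive as (frames, height, width, channels) arrays with non-square frames. The sketch below center-crops such a video to a square and reorders the axes to the layout expected here. The channels-last input layout and the function name are assumptions for illustration, and no resizing is performed.

```python
import numpy as np

def to_model_layout(video):
    """Convert one (frames, height, width, 3) video to (1, frames, 3, side, side)."""
    _, height, width, _ = video.shape
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    cropped = video[:, top:top + side, left:left + side, :]
    # Move channels ahead of the spatial axes, then add the batch axis.
    return np.transpose(cropped, (0, 3, 1, 2))[np.newaxis]

# Toy video: 90 frames of 270x480 RGB.
video = np.zeros((90, 270, 480, 3), dtype=np.uint8)
batch = to_model_layout(video)
```

Several converted videos with the same frame count and size can be joined into one batch with np.concatenate along axis 0.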

return_metadata

Type: bool
Required: No
Description: Whether to return the encoding model’s metadata together with the in silico neural responses.
Example: True

show_progress

Type: bool
Required: No
Description: Whether to show a progress bar during encoding (for large batches).
Example: True

Parameters used in get_model_metadata

This function loads the encoding model’s metadata without having to load the model itself.

model_id

Type: str
Required: Yes
Description: Unique identifier of the model to load.
Valid Values: fmri-bmd-s3d
Example: “fmri-bmd-s3d”

subject

Type: int
Required: Yes
Description: Subject ID from the BMD dataset (1-10).
Valid Values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Example: 1

Performance

Accuracy Plots (AWS directory):

  • brain-encoding-response-generator/encoding_models/modality-fmri/train_dataset-bmd/model-s3d/encoding_models_accuracy

Example Usage

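import numpy as np
from berg import BERG

# Initialize BERG
berg = BERG(berg_dir="path/to/brain-encoding-response-generator")

# Load the model
model = berg.get_encoding_model(
    "fmri-bmd-s3d",
    subject=1,
)

# Prepare the stimulus videos
# Video shape should be [batch_size, video_frames, 3 RGB channels, height, width]
# randint's upper bound is exclusive, so 256 covers the full [0, 255] range;
# uint8 keeps this random batch at roughly 1.8 GB instead of about 14 GB for int64
stimulus = np.random.randint(0, 256, (100, 90, 3, 256, 256), dtype=np.uint8)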

# Generate the in silico neural responses using the encoding model previously loaded
responses = berg.encode(
    model,
    stimulus,
    show_progress=True
)

# The in silico fMRI responses will be a numpy.ndarray of shape:
# ['batch_size', 'n_voxels']
# where:
# - n_voxels: Number of selected voxels for which the in silico fMRI responses are generated.

# Generate in silico neural responses with metadata
responses, metadata = berg.encode(
    model,
    stimulus,
    return_metadata=True
)

# Load the encoding model's metadata without having to load the model itself
metadata = berg.get_model_metadata(
    "fmri-bmd-s3d",
    subject=1
)

References