fmri-bmd-s3d
Model Summary
| Modality | fMRI |
|---|---|
| Training Dataset | BOLD Moments Dataset (BMD) (MNI152 volume space) |
| Species | Human |
| Stimuli | 3-second videos |
| Model Type | 3D CNN model (s3d) |
| Creator | Alessandro Gifford |
Description
These encoding models consist of a linear mapping (fitted with linear regression) from the features of a video CNN (s3d; Xie et al., 2017) onto fMRI responses. Before mapping onto fMRI responses, the video features were reduced to 100 principal components using principal component analysis (PCA).
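The two steps described above (PCA reduction to 100 components, then a linear regression onto fMRI responses) can be sketched with NumPy alone. This is a minimal illustration on random data, not the BERG training code: the array sizes, the use of an intercept term, and fitting the PCA on the training features only are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins; every size here is illustrative (assumption).
n_train, n_test = 200, 20   # the real partitions hold 1000 and 102 videos
n_features = 512            # hypothetical size of the raw video feature vector
n_components = 100          # number of principal components stated above
n_voxels = 50               # hypothetical number of voxels

X_train = rng.standard_normal((n_train, n_features))
X_test = rng.standard_normal((n_test, n_features))
Y_train = rng.standard_normal((n_train, n_voxels))

# PCA via SVD of the centered training features.
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = Vt[:n_components]              # (n_components, n_features)

# Project train and test features onto the same components.
Z_train = (X_train - mean) @ components.T   # (n_train, n_components)
Z_test = (X_test - mean) @ components.T     # (n_test, n_components)

# Ordinary least squares with an intercept, solved for all voxels at once.
# Each column of `weights` is an independent per-voxel linear model.
A_train = np.hstack([Z_train, np.ones((n_train, 1))])
weights, *_ = np.linalg.lstsq(A_train, Y_train, rcond=None)

# Predicted (in silico) responses for the held-out stimuli.
A_test = np.hstack([Z_test, np.ones((n_test, 1))])
Y_pred = A_test @ weights                   # (n_test, n_voxels)
```

Because every voxel gets its own weight column, solving the regression for all voxels in one call is equivalent to fitting one model per voxel.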
The encoding models were trained on the BOLD Moments Dataset (BMD) (Lahner et al., 2024), which contains fMRI responses of 10 subjects to 1,102 3-second naturalistic videos from the Memento10k dataset (Newman et al., 2020). One encoding model was trained for each BMD subject and for each fMRI voxel.
Preprocessing. The encoding models are trained on BMD’s data prepared in MNI152 volume space, from the “versionB” preprocessing version. Note that the BMD data were z-scored at each scan session, and as a consequence the in silico fMRI responses generated by the encoding models also live in z-scored space.
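As a rough illustration of what per-session z-scoring means, the sketch below standardizes each voxel separately within each scan session. The function name `zscore_per_session` and the synthetic data are hypothetical; the actual BMD preprocessing pipeline may differ in its details.

```python
import numpy as np

def zscore_per_session(responses, session_ids):
    """Z-score each voxel (column) separately within each scan session."""
    responses = np.asarray(responses, dtype=float)
    session_ids = np.asarray(session_ids)
    out = np.empty_like(responses)
    for session in np.unique(session_ids):
        rows = session_ids == session
        block = responses[rows]
        mean = block.mean(axis=0)
        std = block.std(axis=0)
        std[std == 0] = 1.0  # leave constant voxels at zero instead of dividing by 0
        out[rows] = (block - mean) / std
    return out

rng = np.random.default_rng(0)
# Two hypothetical sessions of 30 trials each, 5 voxels,
# with a different offset and scale per session.
data = np.concatenate([
    rng.normal(10.0, 2.0, (30, 5)),
    rng.normal(-3.0, 5.0, (30, 5)),
])
sessions = np.repeat([0, 1], 30)
z = zscore_per_session(data, sessions)
```

After this step each voxel has zero mean and unit standard deviation within every session, which is why the generated responses are expressed in standard deviation units rather than raw signal units.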
Model training partition. fMRI responses for 1000 videos.
Model testing partition. fMRI responses for 102 videos.
ROIs. Each ROI in the metadata consists of a tuple with 3 items: (1) The ROI voxel indices in 3-dimensional brain volume space; (2) The ROI voxel indices in 1-dimensional flattened brain volume space; (3) the NIfTI images of the ROI voxel indices.
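The relationship between the first two tuple items (3-dimensional and 1-dimensional voxel indices) can be illustrated as follows. This sketch assumes C-order (row-major) flattening of the volume, which is not confirmed by the description above, and uses a made-up three-voxel ROI.

```python
import numpy as np

volume_shape = (62, 77, 61)  # mask shape listed in the metadata

# Hypothetical ROI of three voxels, stored as one index array per axis
# (the same layout np.where returns for a 3D mask).
roi_idx_3d = (
    np.array([10, 10, 11]),
    np.array([20, 21, 20]),
    np.array([30, 30, 30]),
)

# 3D -> 1D indices, assuming C-order flattening of the volume.
roi_idx_1d = np.ravel_multi_index(roi_idx_3d, volume_shape)

# 1D -> 3D indices (inverse operation).
recovered = np.unravel_index(roi_idx_1d, volume_shape)

# Both index forms address the same voxels of a volume.
volume = np.arange(np.prod(volume_shape)).reshape(volume_shape)
same_voxels = np.array_equal(volume[roi_idx_3d], volume.reshape(-1)[roi_idx_1d])
```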
Metadata
fmri
- group_mask : (62, 77, 61) - Whole brain group mask (voxels defined for all subjects)
- group_mask_header : object - Whole brain group mask header
- group_mask_affine : (4, 4) - Whole brain group mask affine
- sub_mask : (62, 77, 61) - Whole brain subject mask
- sub_mask_header : object - Whole brain subject mask header
- sub_mask_affine : (4, 4) - Whole brain subject mask affine
- rois : dict - ROI voxel indices (l = left hemisphere, r = right hemisphere)
  - lV1v : tuple - Visual area 1 ventral (LH)
  - rV1v : tuple - Visual area 1 ventral (RH)
  - lV1d : tuple - Visual area 1 dorsal (LH)
  - rV1d : tuple - Visual area 1 dorsal (RH)
  - lV2v : tuple - Visual area 2 ventral (LH)
  - rV2v : tuple - Visual area 2 ventral (RH)
  - lV2d : tuple - Visual area 2 dorsal (LH)
  - rV2d : tuple - Visual area 2 dorsal (RH)
  - lV3v : tuple - Visual area 3 ventral (LH)
  - rV3v : tuple - Visual area 3 ventral (RH)
  - lV3d : tuple - Visual area 3 dorsal (LH)
  - rV3d : tuple - Visual area 3 dorsal (RH)
  - lV3ab : tuple - Visual areas 3a and 3b (LH)
  - rV3ab : tuple - Visual areas 3a and 3b (RH)
  - lhV4 : tuple - Human V4 complex (LH)
  - rhV4 : tuple - Human V4 complex (RH)
  - lFFA : tuple - Fusiform face area (LH)
  - rFFA : tuple - Fusiform face area (RH)
  - lOFA : tuple - Occipital face area (LH)
  - rOFA : tuple - Occipital face area (RH)
  - lEBA : tuple - Extrastriate body area (LH)
  - rEBA : tuple - Extrastriate body area (RH)
  - lLOC : tuple - Lateral occipital complex (LH)
  - rLOC : tuple - Lateral occipital complex (RH)
  - lPPA : tuple - Parahippocampal place area (LH)
  - rPPA : tuple - Parahippocampal place area (RH)
  - lRSC : tuple - Retrosplenial cortex (LH)
  - rRSC : tuple - Retrosplenial cortex (RH)
  - lSTS : tuple - Superior temporal sulcus (LH)
  - rSTS : tuple - Superior temporal sulcus (RH)
  - lTOS : tuple - Temporal occipital sulcus (LH)
  - rTOS : tuple - Temporal occipital sulcus (RH)
  - lMT : tuple - Middle temporal area (LH)
  - rMT : tuple - Middle temporal area (RH)
  - l7AL : tuple - Dorsal intraparietal area 7AL (LH)
  - r7AL : tuple - Dorsal intraparietal area 7AL (RH)
  - lIPS0 : tuple - Intraparietal area IPS0 (LH)
  - rIPS0 : tuple - Intraparietal area IPS0 (RH)
  - lIPS1-2-3 : tuple - Intraparietal areas IPS1, IPS2, and IPS3 (LH)
  - rIPS1-2-3 : tuple - Intraparietal areas IPS1, IPS2, and IPS3 (RH)
  - lPFt : tuple - Inferior parietal area PFt (LH)
  - rPFt : tuple - Inferior parietal area PFt (RH)
  - lBA2 : tuple - Brodmann area 2 (LH)
  - rBA2 : tuple - Brodmann area 2 (RH)
  - lPFop : tuple - Inferior parietal area PFop (LH)
  - rPFop : tuple - Inferior parietal area PFop (RH)
  - BMDgeneral : tuple - BMD general visual cortex mask
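The (4, 4) mask affines listed under fmri follow the usual NIfTI convention, in which the affine maps voxel indices to world (here MNI) coordinates in millimetres. The sketch below shows that mapping with a made-up affine (3 mm isotropic voxels, arbitrary origin); the real values come from the metadata, and the helper names `voxel_to_world` and `world_to_voxel` are hypothetical.

```python
import numpy as np

# Hypothetical affine: 3 mm isotropic voxels with an arbitrary origin.
# The real affine is stored in the model metadata.
affine = np.array([
    [3.0, 0.0, 0.0, -90.0],
    [0.0, 3.0, 0.0, -126.0],
    [0.0, 0.0, 3.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
])

def voxel_to_world(ijk, affine):
    """Map (n, 3) voxel indices to (n, 3) world coordinates in mm."""
    ijk = np.atleast_2d(ijk)
    homogeneous = np.hstack([ijk, np.ones((ijk.shape[0], 1))])
    return (homogeneous @ affine.T)[:, :3]

def world_to_voxel(xyz, affine):
    """Inverse mapping: (n, 3) world coordinates to (n, 3) voxel indices."""
    xyz = np.atleast_2d(xyz)
    homogeneous = np.hstack([xyz, np.ones((xyz.shape[0], 1))])
    return (homogeneous @ np.linalg.inv(affine).T)[:, :3]

ijk = np.array([[30, 42, 24], [0, 0, 0]])
xyz = voxel_to_world(ijk, affine)
ijk_back = world_to_voxel(xyz, affine)
```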
encoding_models
- noiseceiling_task_train_n_1 : (N,) - Voxelwise noise ceiling computed on single train data repeats
- noiseceiling_task_train_n_3 : (N,) - Voxelwise noise ceiling computed on all train data repeats
- noiseceiling_task_test_n_1 : (N,) - Voxelwise noise ceiling computed on single test data repeats
- noiseceiling_task_test_n_10 : (N,) - Voxelwise noise ceiling computed on all test data repeats
- correlation : (N,) - Correlation prediction accuracy
- r2 : (N,) - Explained variance (R² prediction accuracy)
- explained_variance : (N,) - Noise-ceiling-normalized explained variance
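To make the three accuracy entries concrete, the sketch below computes a voxelwise Pearson correlation, its square, and a noise-ceiling-normalized value on synthetic data. The exact formulas used by BERG are not stated here, so treating R² as the squared correlation, expressing the noise ceiling as a fraction, and normalizing by simple division are all assumptions.

```python
import numpy as np

def voxelwise_correlation(y_true, y_pred):
    """Pearson correlation between matching columns of two (n_stimuli, n_voxels) arrays."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom

rng = np.random.default_rng(0)
n_test, n_voxels = 102, 8  # 102 test videos; the voxel count is illustrative

# Synthetic measured and predicted responses sharing a common signal.
signal = rng.standard_normal((n_test, n_voxels))
y_true = signal + 0.5 * rng.standard_normal((n_test, n_voxels))
y_pred = signal + 0.5 * rng.standard_normal((n_test, n_voxels))

correlation = voxelwise_correlation(y_true, y_pred)  # shape (n_voxels,)
r2 = correlation ** 2                                # assumed definition of R²

# Hypothetical noise ceiling, expressed as a fraction of explainable variance.
noise_ceiling = np.full(n_voxels, 0.8)
normalized_r2 = r2 / noise_ceiling
```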
Input
| Type | numpy.ndarray |
|---|---|
| Shape | [batch_size, n_frames, 3, height, width] |
| Description | The input should be a batch of RGB videos. While the model accepts input videos of any duration, we recommend using ~3-second videos to match the duration of the videos used to train the encoding models. |
| Constraints | Integer values in the range [0, 255]; square frames (height equal to width); at least 14 frames per video. |
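The input requirements can be checked before encoding with a small helper such as the one below. This is a sketch under assumptions: `check_stimulus` is a hypothetical function, not part of BERG, and the channels-last starting layout is only an example of how decoded frames commonly arrive.

```python
import numpy as np

MIN_FRAMES = 14  # minimum number of frames required per video

def check_stimulus(videos):
    """Validate a batch shaped [batch_size, n_frames, 3, height, width]."""
    videos = np.asarray(videos)
    if videos.ndim != 5:
        raise ValueError(f"expected 5 dimensions, got {videos.ndim}")
    _, n_frames, channels, height, width = videos.shape
    if channels != 3:
        raise ValueError(f"expected 3 RGB channels on axis 2, got {channels}")
    if n_frames < MIN_FRAMES:
        raise ValueError(f"expected at least {MIN_FRAMES} frames, got {n_frames}")
    if height != width:
        raise ValueError(f"expected square frames, got {height}x{width}")
    if not np.issubdtype(videos.dtype, np.integer):
        raise ValueError(f"expected an integer dtype, got {videos.dtype}")
    if videos.min() < 0 or videos.max() > 255:
        raise ValueError("pixel values must lie in [0, 255]")
    return videos

# Decoded frames are often channels-last: [batch, frames, height, width, 3].
rng = np.random.default_rng(0)
channels_last = rng.integers(0, 256, (2, 16, 64, 64, 3), dtype=np.uint8)

# Move the channel axis to position 2 to get [batch, frames, 3, height, width].
stimulus = check_stimulus(np.transpose(channels_last, (0, 1, 4, 2, 3)))
```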
Output
| Type | numpy.ndarray |
|---|---|
| Shape | [batch_size, n_voxels] |
| Description | The output is a 2D array containing in silico fMRI responses. |
Dimensions:
| Name | Description |
|---|---|
| batch_size | Number of stimulus videos in the batch. |
| n_voxels | Number of selected voxels for which the in silico fMRI responses are generated. |
Parameters
Parameters used in get_encoding_model
This function loads the encoding model.
- model_id
  - Type: str
  - Required: Yes
  - Description: Unique identifier of the model to load.
  - Valid Values: "fmri-bmd-s3d"
  - Example: "fmri-bmd-s3d"
- subject
  - Type: int
  - Required: Yes
  - Description: Subject ID from the BMD dataset (1-10).
  - Valid Values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
  - Example: 1
- selection
  - Type: dict
  - Required: No
  - Description: Specifies which outputs to include in the model responses. If not provided, fMRI responses are generated for all whole-brain voxels.
  - Properties:
    - roi
      - Type: str
      - Description: The region of interest (ROI) for which the in silico fMRI responses are generated.
      - Valid values: "lV1v", "rV1v", "lV1d", "rV1d", "lV2v", "rV2v", "lV2d", "rV2d", "lV3v", "rV3v", "lV3d", "rV3d", "lV3ab", "rV3ab", "lhV4", "rhV4", "lFFA", "rFFA", "lOFA", "rOFA", "lEBA", "rEBA", "lLOC", "rLOC", "lPPA", "rPPA", "lRSC", "rRSC", "lSTS", "rSTS", "lTOS", "rTOS", "lMT", "rMT", "l7AL", "r7AL", "lIPS0", "rIPS0", "lIPS1-2-3", "rIPS1-2-3", "lPFt", "rPFt", "lBA2", "rBA2", "lPFop", "rPFop", "BMDgeneral"
    - voxels
      - Type: numpy.ndarray
      - Description: Binary mask vector, with ones marking the voxels for which the in silico fMRI responses are generated. This vector must have exactly the same length as the number of voxels, which varies for each subject:
        - Subject 1: 108,219 voxels
        - Subject 2: 108,603 voxels
        - Subject 3: 108,366 voxels
        - Subject 4: 108,283 voxels
        - Subject 5: 108,201 voxels
        - Subject 6: 108,449 voxels
        - Subject 7: 108,126 voxels
        - Subject 8: 108,407 voxels
        - Subject 9: 108,250 voxels
        - Subject 10: 107,987 voxels

        The voxels marked in this vector are only selected if the "roi" key is not provided or has value None.
- device
  - Type: str
  - Required: No
  - Description: Device to run the model on. "auto" will use CUDA if available, otherwise CPU.
  - Valid Values: "cpu", "cuda", "auto"
  - Example: "auto"
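The two forms of the selection dictionary described above can be built as in the sketch below. The helper `make_voxel_selection` is hypothetical, and the integer dtype of the mask is an assumption; only the dictionary keys ("roi", "voxels") and the per-subject voxel counts come from the parameter description.

```python
import numpy as np

# Voxel counts per subject, copied from the parameter description above.
N_VOXELS = {
    1: 108219, 2: 108603, 3: 108366, 4: 108283, 5: 108201,
    6: 108449, 7: 108126, 8: 108407, 9: 108250, 10: 107987,
}

def make_voxel_selection(subject, voxel_indices):
    """Build a selection dict with a binary vector of the right length."""
    mask = np.zeros(N_VOXELS[subject], dtype=int)  # dtype is an assumption
    mask[np.asarray(voxel_indices)] = 1
    return {"voxels": mask}

# Option 1: select a predefined ROI by name.
roi_selection = {"roi": "lV1v"}

# Option 2: select arbitrary voxels (used only when "roi" is absent or None).
voxel_selection = make_voxel_selection(subject=1, voxel_indices=[0, 5, 42])

# Either dict would then be passed as the `selection` argument
# of get_encoding_model.
```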
Parameters used in encode
This function generates in silico neural responses using the encoding model previously loaded.
- model
  - Type: BaseModelInterface
  - Required: Yes
  - Description: An instantiated and loaded encoding model.
- stimulus
  - Type: numpy.ndarray
  - Required: Yes
  - Description: A batch of RGB videos to be encoded. Videos should be in integer format with values in the range [0, 255], have square frames (e.g. 256×256), and contain at least 14 frames.
  - Example: An array of shape [100, 90, 3, 256, 256] representing 100 RGB videos, with 90 frames each.
- return_metadata
  - Type: bool
  - Required: No
  - Description: Whether to return the encoding model's metadata together with the in silico neural responses.
  - Example: True
- show_progress
  - Type: bool
  - Required: No
  - Description: Whether to show a progress bar during encoding (for large batches).
  - Example: True
Parameters used in get_model_metadata
This function loads the encoding model’s metadata without having to load the model itself.
- model_id
  - Type: str
  - Required: Yes
  - Description: Unique identifier of the model to load.
  - Valid Values: "fmri-bmd-s3d"
  - Example: "fmri-bmd-s3d"
- subject
  - Type: int
  - Required: Yes
  - Description: Subject ID from the BMD dataset (1-10).
  - Valid Values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
  - Example: 1
Performance
Accuracy Plots (AWS directory):
brain-encoding-response-generator/encoding_models/modality-fmri/train_dataset-bmd/model-s3d/encoding_models_accuracy
Example Usage
import numpy as np
from berg import BERG

# Initialize BERG
berg = BERG(berg_dir="path/to/brain-encoding-response-generator")

# Load the model
model = berg.get_encoding_model(
    "fmri-bmd-s3d",
    subject=1,
)

# Prepare the stimulus videos
# Video shape should be [batch_size, video_frames, 3 RGB channels, height, width]
# np.random.randint excludes the upper bound, so 256 yields values in [0, 255];
# uint8 keeps this example batch at roughly 1.8 GB of memory
stimulus = np.random.randint(0, 256, (100, 90, 3, 256, 256), dtype=np.uint8)

# Generate the in silico neural responses using the encoding model previously loaded
responses = berg.encode(
    model,
    stimulus,
    show_progress=True
)
# The in silico fMRI responses will be a numpy.ndarray of shape:
# ['batch_size', 'n_voxels']
# where:
# - batch_size: Number of stimulus videos in the batch.
# - n_voxels: Number of selected voxels for which the in silico fMRI responses are generated.

# Generate in silico neural responses with metadata
responses, metadata = berg.encode(
    model,
    stimulus,
    return_metadata=True
)

# Load the encoding model's metadata without having to load the model itself
metadata = berg.get_model_metadata(
    "fmri-bmd-s3d",
    subject=1
)
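Once whole-brain responses are available, a common next step is to place a response vector back into the 3D brain volume using the subject mask. The sketch below uses a random mask and random responses as stand-ins; it assumes that the voxel order of the responses matches the C-order traversal of the mask, which is not confirmed here, and `to_volume` is a hypothetical helper rather than a BERG function.

```python
import numpy as np

rng = np.random.default_rng(0)
volume_shape = (62, 77, 61)  # mask shape listed in the metadata

# Synthetic stand-ins for the subject mask and a batch of responses.
# With real data, the mask would come from the model metadata.
sub_mask = rng.random(volume_shape) > 0.6
n_voxels = int(sub_mask.sum())
responses = rng.standard_normal((4, n_voxels))  # [batch_size, n_voxels]

def to_volume(response_vector, mask, fill_value=np.nan):
    """Scatter a 1D response vector into a 3D volume at the masked voxels."""
    mask = np.asarray(mask).astype(bool)
    volume = np.full(mask.shape, fill_value, dtype=float)
    # Boolean-mask assignment fills voxels in C-order (assumed voxel order).
    volume[mask] = response_vector
    return volume

# 3D volume holding the responses to the first stimulus video.
volume = to_volume(responses[0], sub_mask)
```

Voxels outside the mask are filled with NaN so that they stay distinguishable from voxels whose response happens to be zero.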
References
Model building code: https://github.com/gifale95/BERG/tree/main/berg_creation_code
BMD paper (Lahner et al., 2024): https://doi.org/10.1038/s41593-021-00962-x
Memento 10k dataset (Newman et al., 2020): https://link.springer.com/chapter/10.1007/978-3-030-58517-4_14
s3d (Xie et al., 2017): https://doi.org/10.48550/arXiv.1712.04851