fmri-nsd_fsaverage-huze
Model Summary
| Modality | fMRI |
|---|---|
| Training Dataset | Natural Scenes Dataset (NSD) (fsaverage surface space) |
| Species | Human |
| Stimuli | Images |
| Model Type | Vision transformer (DINOv2) |
| Creator | Huzheng Yang |
Description
The encoding model is based on a vision transformer (DINOv2) finetuned with LoRA (low-rank adaptation). During training, the encoding model learned which DINOv2 layers, and which spatial locations within each layer's features, best predict each fMRI vertex. While this feature selection was learned separately for each vertex, nearby vertices were constrained to select similar features. Finally, the activity of each vertex is predicted linearly from its selected DINOv2 features.
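The feature-selection readout described above can be illustrated with a toy NumPy forward pass. This is a minimal sketch under stated assumptions, not the model's actual implementation: the names `predict_vertices` and `smoothness_penalty` are hypothetical, the layer and spatial selections are assumed to be independent softmax weightings learned per vertex, the neighborhood constraint is modeled as a squared-difference penalty between adjacent vertices, and the LoRA-finetuned DINOv2 backbone is replaced by a precomputed feature array.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def predict_vertices(features, layer_logits, space_logits, readout_w, readout_b):
    """Toy per-vertex feature selection followed by a linear readout (hypothetical).

    features:     [n_layers, n_locations, n_channels] backbone features of one image
    layer_logits: [n_vertices, n_layers] learned layer-selection scores
    space_logits: [n_vertices, n_locations] learned spatial-selection scores
    readout_w:    [n_vertices, n_channels] per-vertex linear weights
    readout_b:    [n_vertices] per-vertex bias
    """
    layer_w = softmax(layer_logits, axis=1)  # soft choice of layers, per vertex
    space_w = softmax(space_logits, axis=1)  # soft choice of locations, per vertex
    # Pool features with each vertex's own layer and location weights -> [n_vertices, n_channels]
    selected = np.einsum("vl,vs,lsc->vc", layer_w, space_w, features)
    # Linear prediction of each vertex from its selected feature vector
    return np.einsum("vc,vc->v", selected, readout_w) + readout_b

def smoothness_penalty(selection_logits, neighbor_pairs):
    """Mean squared difference of selection weights between neighboring vertices.

    neighbor_pairs: [n_pairs, 2] integer array of adjacent vertex indices
    """
    w = softmax(selection_logits, axis=1)
    diff = w[neighbor_pairs[:, 0]] - w[neighbor_pairs[:, 1]]
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

During training, a penalty like this would be added to the prediction loss so that adjacent vertices are pushed toward similar layer and location choices.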
The encoding model training pipeline consisted of two steps. The goal of step 1 was to generate a denoised version of the fMRI responses for use in step 2. This was achieved by training a single encoding model that predicted the fMRI responses of all 8 NSD subjects, using as input the stimulus images, experimental design information (i.e., the stimulus presentation order), and behavioral data. This model was then used to generate in silico fMRI responses for all NSD images. In step 2, encoding models were trained individually for each NSD subject, using as target data both the subject's original fMRI responses and the in silico fMRI responses generated in step 1, and using only the stimulus images as input.
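The data flow of the two-step pipeline can be illustrated with ridge regression standing in for the DINOv2-based model. This is a hypothetical sketch: `fit_ridge` and `two_step_training` are illustrative names, and step 1 is simplified to one linear readout per subject on shared image features, omitting the design and behavioral inputs used by the real step-1 model. It only shows how step 2 stacks measured and in silico targets.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    # Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def two_step_training(img_feats, seen_idx, measured, alpha=1.0):
    """Toy two-step scheme (hypothetical), with linear models instead of DINOv2.

    img_feats: [n_images, n_feat] stimulus features for ALL images
    seen_idx:  per-subject integer arrays of the images each subject saw
    measured:  per-subject [n_seen, n_vertices] measured responses
    """
    # Step 1 (simplified): fit on measured data, then generate in silico
    # responses for ALL images, including those a subject never saw
    in_silico = [img_feats @ fit_ridge(img_feats[idx], Y, alpha)
                 for idx, Y in zip(seen_idx, measured)]
    # Step 2: per-subject models trained on measured + in silico targets,
    # with only image features as input
    models = []
    for idx, Y, Y_sil in zip(seen_idx, measured, in_silico):
        X = np.vstack([img_feats[idx], img_feats])  # real pairs, then in silico pairs
        T = np.vstack([Y, Y_sil])
        models.append(fit_ridge(X, T, alpha))
    return models
```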
The encoding models were trained on the Natural Scenes Dataset (NSD) (Allen et al., 2022), which consists of 7T fMRI responses of 8 subjects to 73k natural scenes from the COCO dataset (Lin et al., 2014).
Preprocessing. The encoding models are trained on NSD data prepared in FreeSurfer’s fsaverage space, using the “betas_fithrf_GLMdenoise_RR” preprocessing version. Note that the NSD data were z-scored within each scan session; consequently, the in silico fMRI responses generated by the encoding models are also in z-scored units.
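Per-session z-scoring of the kind described above can be sketched as follows. The function name and the [trials, vertices] layout are assumptions for illustration; such a helper could be used, for instance, to put one's own measured betas on the same scale as the in silico responses before comparing them.

```python
import numpy as np

def zscore_per_session(betas, session_ids):
    """z-score each vertex's responses within each scan session (hypothetical helper).

    betas:       [n_trials, n_vertices] single-trial responses
    session_ids: [n_trials] session label of each trial
    """
    betas = np.asarray(betas, dtype=float)
    session_ids = np.asarray(session_ids)
    out = np.empty_like(betas)
    for s in np.unique(session_ids):
        rows = session_ids == s
        mu = betas[rows].mean(axis=0)
        sd = betas[rows].std(axis=0)
        # Guard against constant vertices (zero variance) within a session
        out[rows] = (betas[rows] - mu) / np.where(sd > 0, sd, 1.0)
    return out
```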
Model training partition. fMRI responses for up to 9,000 non-shared images (i.e., the images uniquely seen by each subject during the NSD experiment).
Model validation partition. fMRI responses for up to 485 of the 1,000 shared images (i.e., the 485 shared images that not every subject saw three times during the NSD experiment).
Model testing partition. fMRI responses for 515 of the 1,000 shared images (i.e., the 515 shared images that every subject saw exactly three times during the NSD experiment). The models are additionally tested out of distribution on NSD-synthetic, the out-of-distribution component of NSD, which consists of fMRI responses from the same 8 NSD subjects to 286 NSD-synthetic images.
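The rule that separates validation and test images can be written down directly. In this sketch, `split_shared_images` and `subject_val_images` are hypothetical helpers, and `rep_counts` is an assumed [subjects, shared images] array of how many times each subject saw each shared image; the index arrays actually used by the model are stored in its metadata (`train_img_num`, `val_img_num`, `test_img_num`).

```python
import numpy as np

def split_shared_images(rep_counts):
    """Split shared images into validation and test sets (hypothetical helper).

    rep_counts: [n_subjects, n_shared_images] number of times each subject
                saw each shared image (0 to 3)
    Returns (val_idx, test_idx) as integer index arrays.
    """
    rep_counts = np.asarray(rep_counts)
    # Test: images that every subject saw exactly three times
    seen_3x_by_all = np.all(rep_counts == 3, axis=0)
    test_idx = np.flatnonzero(seen_3x_by_all)
    # Validation: the remaining shared images
    val_idx = np.flatnonzero(~seen_3x_by_all)
    return val_idx, test_idx

def subject_val_images(rep_counts, val_idx, subject):
    # A subject contributes validation responses only for images it actually saw,
    # which is why the validation partition holds "up to" 485 images per subject
    return val_idx[np.asarray(rep_counts)[subject, val_idx] > 0]
```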
Metadata
fmri
- lh_ncsnr (163842,): Left hemisphere noise ceiling signal-to-noise ratio per vertex (computed on NSD-core)
- rh_ncsnr (163842,): Right hemisphere noise ceiling signal-to-noise ratio per vertex (computed on NSD-core)
- lh_ncsnr_nsdsynthetic (163842,): Left hemisphere noise ceiling signal-to-noise ratio per vertex (computed on NSD-synthetic)
- rh_ncsnr_nsdsynthetic (163842,): Right hemisphere noise ceiling signal-to-noise ratio per vertex (computed on NSD-synthetic)
- lh_fsaverage_rois (dict): Left hemisphere ROI definitions on the fsaverage surface
  - V1v (710,): Visual area 1 ventral
  - V1d (828,): Visual area 1 dorsal
  - V2v (632,): Visual area 2 ventral
  - V2d (692,): Visual area 2 dorsal
  - V3v (567,): Visual area 3 ventral
  - V3d (669,): Visual area 3 dorsal
  - hV4 (531,): Human V4 complex
  - EBA (3231,): Extrastriate body area
  - FBA-1 (574,): Fusiform body area 1
  - FBA-2 (0,): Fusiform body area 2
  - mTL-bodies (0,): Medial temporal lobe body-selective region
  - OFA (432,): Occipital face area
  - FFA-1 (552,): Fusiform face area 1
  - FFA-2 (0,): Fusiform face area 2
  - mTL-faces (0,): Medial temporal lobe face-selective region
  - aTL-faces (329,): Anterior temporal lobe face-selective region
  - OPA (2021,): Occipital place area
  - PPA (1859,): Parahippocampal place area
  - RSC (1298,): Retrosplenial complex
  - OWFA (317,): Occipital word form area
  - VWFA-1 (1395,): Visual word form area 1
  - VWFA-2 (474,): Visual word form area 2
  - mfs-words (490,): Mid-fusiform sulcus word-selective region
  - mTL-words (475,): Medial temporal lobe word-selective region
  - early (5758,): Early visual cortex (V1-V3)
  - midventral (867,): Mid-level ventral stream
  - midlateral (1091,): Mid-level lateral stream
  - midparietal (1079,): Mid-level parietal regions
  - ventral (9680,): Ventral visual stream
  - lateral (10253,): Lateral visual stream
  - parietal (5176,): Parietal regions
  - nsdgeneral (18461,): NSD general visual cortex mask
- rh_fsaverage_rois (dict): Right hemisphere ROI definitions on the fsaverage surface
  - V1v (444,): Visual area 1 ventral
  - V1d (991,): Visual area 1 dorsal
  - V2v (887,): Visual area 2 ventral
  - V2d (725,): Visual area 2 dorsal
  - V3v (682,): Visual area 3 ventral
  - V3d (535,): Visual area 3 dorsal
  - hV4 (765,): Human V4 complex
  - EBA (4421,): Extrastriate body area
  - FBA-1 (206,): Fusiform body area 1
  - FBA-2 (1234,): Fusiform body area 2
  - mTL-bodies (0,): Medial temporal lobe body-selective region
  - OFA (305,): Occipital face area
  - FFA-1 (330,): Fusiform face area 1
  - FFA-2 (1003,): Fusiform face area 2
  - mTL-faces (0,): Medial temporal lobe face-selective region
  - aTL-faces (283,): Anterior temporal lobe face-selective region
  - OPA (2849,): Occipital place area
  - PPA (1250,): Parahippocampal place area
  - RSC (1136,): Retrosplenial complex
  - OWFA (590,): Occipital word form area
  - VWFA-1 (397,): Visual word form area 1
  - VWFA-2 (649,): Visual word form area 2
  - mfs-words (0,): Mid-fusiform sulcus word-selective region
  - mTL-words (0,): Medial temporal lobe word-selective region
  - early (5634,): Early visual cortex (V1-V3)
  - midventral (1050,): Mid-level ventral stream
  - midlateral (1191,): Mid-level lateral stream
  - midparietal (1181,): Mid-level parietal regions
  - ventral (9393,): Ventral visual stream
  - lateral (10535,): Lateral visual stream
  - parietal (4818,): Parietal regions
  - nsdgeneral (19523,): NSD general visual cortex mask
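The ROI entries above have far fewer elements than the 163,842 fsaverage vertices per hemisphere, which suggests they store vertex indices. Assuming that, a binary vertex vector of the kind accepted by the `lh_vertices` and `rh_vertices` selection keys can be built from one or more ROIs. The helper `roi_mask` is hypothetical, and the nested key path in the usage note below is an assumption about how the metadata dictionary is organized.

```python
import numpy as np

N_FSAVERAGE_VERTICES = 163842  # vertices per hemisphere on fsaverage

def roi_mask(roi_dict, roi_names, n_vertices=N_FSAVERAGE_VERTICES):
    """Union of one or more ROIs as a binary vector over all hemisphere vertices.

    Assumes each roi_dict[name] is an integer array of vertex indices.
    """
    mask = np.zeros(n_vertices, dtype=int)
    for name in roi_names:
        # Empty ROIs (shape (0,)) simply contribute no vertices
        mask[np.asarray(roi_dict[name], dtype=int)] = 1
    return mask
```

For example, `roi_mask(metadata["fmri"]["lh_fsaverage_rois"], ["V1v", "V1d"])` would mark LH V1; ROIs listed with shape (0,), such as LH FBA-2, add nothing to the mask.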
encoding_models
- train_img_num (9000,): Image indices used for training
- val_img_num (485,): Image indices used for validation
- test_img_num (515,): Image indices used for testing
- lh_correlation_nsdcore (163842,): Left hemisphere correlation scores (NSD core)
- rh_correlation_nsdcore (163842,): Right hemisphere correlation scores (NSD core)
- lh_r2_nsdcore (163842,): Left hemisphere R² scores (NSD core)
- rh_r2_nsdcore (163842,): Right hemisphere R² scores (NSD core)
- lh_noise_ceiling_nsdcore (163842,): Left hemisphere noise ceiling (NSD core)
- rh_noise_ceiling_nsdcore (163842,): Right hemisphere noise ceiling (NSD core)
- lh_explained_variance_nsdcore (163842,): Left hemisphere % explained variance (NSD core)
- rh_explained_variance_nsdcore (163842,): Right hemisphere % explained variance (NSD core)
- lh_correlation_nsdsynthetic (163842,): Left hemisphere correlation scores (NSD synthetic)
- rh_correlation_nsdsynthetic (163842,): Right hemisphere correlation scores (NSD synthetic)
- lh_r2_nsdsynthetic (163842,): Left hemisphere R² scores (NSD synthetic)
- rh_r2_nsdsynthetic (163842,): Right hemisphere R² scores (NSD synthetic)
- lh_noise_ceiling_nsdsynthetic (163842,): Left hemisphere noise ceiling (NSD synthetic)
- rh_noise_ceiling_nsdsynthetic (163842,): Right hemisphere noise ceiling (NSD synthetic)
- lh_explained_variance_nsdsynthetic (163842,): Left hemisphere % explained variance (NSD synthetic)
- rh_explained_variance_nsdsynthetic (163842,): Right hemisphere % explained variance (NSD synthetic)
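The noise ceiling can be derived from the ncsnr values with the formula from the NSD paper (Allen et al., 2022), NC = 100 × ncsnr² / (ncsnr² + 1/n), where n is the number of trials averaged per image (n = 3 for the test images). The second function shows one common way to express accuracy as a percentage of the noise ceiling; it is an assumed convention and may not match exactly how the `*_explained_variance_*` arrays were computed. Both function names are hypothetical.

```python
import numpy as np

def noise_ceiling_from_ncsnr(ncsnr, n_trials):
    """Noise ceiling (% explainable variance) from ncsnr (Allen et al., 2022),
    for responses averaged over `n_trials` repetitions of each image."""
    snr2 = np.asarray(ncsnr, dtype=float) ** 2
    return 100.0 * snr2 / (snr2 + 1.0 / n_trials)

def noise_normalized_accuracy(correlation, noise_ceiling):
    """Assumed convention: r^2 (in %) expressed as a percentage of the noise ceiling.

    Vertices with a zero noise ceiling are returned as NaN.
    """
    r2 = 100.0 * np.asarray(correlation, dtype=float) ** 2
    nc = np.asarray(noise_ceiling, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(nc > 0, 100.0 * r2 / nc, np.nan)
```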
Input
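| Type | numpy.ndarray |
|---|---|
| Shape | [batch_size, 3, height, width] |
| Description | The input should be a batch of RGB images. |
| Constraints | Integer values in the range [0, 255]; square images (height equal to width). |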
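Since the model expects a channels-first, square, integer batch, images loaded as [height, width, 3] arrays need to be cropped and transposed first. This is a minimal NumPy-only sketch with hypothetical helper names: it center-crops every image to the largest square that fits inside all of them and does not resize, so for batches of very different image sizes a proper resize (e.g., with Pillow) would be preferable.

```python
import numpy as np

def center_crop_square(img, side):
    # Crop an [H, W, 3] image to a centered side x side square
    h, w = img.shape[:2]
    top, left = (h - side) // 2, (w - side) // 2
    return img[top:top + side, left:left + side]

def to_stimulus_batch(images):
    """Convert a list of [H, W, 3] RGB images (values 0-255) to [batch, 3, S, S] uint8."""
    arrays = [np.asarray(img) for img in images]
    # Largest square that fits inside every image of the batch
    side = min(min(a.shape[:2]) for a in arrays)
    crops = np.stack([center_crop_square(a, side) for a in arrays])  # [batch, S, S, 3]
    # Channels-first layout: [batch, 3, S, S]
    return crops.astype(np.uint8).transpose(0, 3, 1, 2)
```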
Output
| Type | tuple of numpy.ndarray |
|---|---|
| Shape | ([batch_size, lh_vertices], [batch_size, rh_vertices]) |
| Description | The output is a tuple containing the left hemisphere (LH) and right hemisphere (RH) in silico fMRI responses for the batch images. |
| Dimensions | batch_size: number of stimuli in the batch; lh_vertices: number of selected LH vertices for which the in silico fMRI responses are generated; rh_vertices: number of selected RH vertices for which the in silico fMRI responses are generated. |
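When a selection is used, each output array has one column per selected vertex. To plot or compare results on the full fsaverage surface, those columns can be scattered back into a full-length array. This sketch assumes that output columns follow increasing vertex-index order within the selection vector, which is not stated above; `to_full_surface` is a hypothetical helper.

```python
import numpy as np

def to_full_surface(selected_resp, vertex_mask, fill_value=np.nan):
    """Scatter [batch, n_selected] responses onto all vertices of one hemisphere.

    vertex_mask: binary vector over all hemisphere vertices (163,842 on fsaverage)
    Assumes column j of selected_resp belongs to the j-th selected vertex in
    increasing vertex-index order.
    """
    selected_resp = np.asarray(selected_resp, dtype=float)
    mask = np.asarray(vertex_mask).astype(bool)
    if mask.sum() != selected_resp.shape[1]:
        raise ValueError("number of selected vertices does not match the response columns")
    full = np.full((selected_resp.shape[0], mask.size), fill_value, dtype=float)
    full[:, mask] = selected_resp  # unselected vertices keep fill_value
    return full
```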
Parameters
Parameters used in get_encoding_model
This function loads the encoding model.
model_id
Type: str
Required: Yes
Description: Unique identifier of the model to load.
Valid Values: fmri-nsd_fsaverage-huze
Example: “fmri-nsd_fsaverage-huze”
subject
Type: int
Required: Yes
Description: Subject ID from the NSD dataset (1-8).
Valid Values: 1, 2, 3, 4, 5, 6, 7, 8
Example: 1
selection
Type: dict
Required: No
Description: Specifies which outputs to include in the model responses. If not provided, fMRI responses are generated for all LH and RH fMRI vertices.
Properties:
roi
Type: str
Description: The region of interest (ROI) for which the in silico fMRI responses (of both hemispheres) are generated.
Valid values: “V1d”, “V1v”, “V2d”, “V2v”, “V3d”, “V3v”, “hV4”, “OFA”, “FFA-1”, “FFA-2”, “mTL-faces”, “aTL-faces”, “OWFA”, “VWFA-1”, “VWFA-2”, “mfs-words”, “mTL-words”, “OPA”, “PPA”, “RSC”, “EBA”, “FBA-1”, “FBA-2”, “mTL-bodies”, “early”, “midventral”, “midlateral”, “midparietal”, “parietal”, “lateral”, “ventral”, “nsdgeneral”
lh_vertices
Type: numpy.ndarray
Description: Binary vector with ones indicating the left hemisphere (LH) vertices for which the in silico fMRI responses are generated. This vector must have exactly the same length as the number of LH fsaverage vertices (163,842). The vertices in this binary vector are only selected if the “roi” key is not provided or has value None.
rh_vertices
Type: numpy.ndarray
Description: Binary vector with ones indicating the right hemisphere (RH) vertices for which the in silico fMRI responses are generated. This vector must have exactly the same length as the number of RH fsaverage vertices (163,842). The vertices in this binary vector are only selected if the “roi” key is not provided or has value None.
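As an example of the vertex-vector route, the ncsnr metadata can be thresholded to keep only reliable vertices. The threshold value is an arbitrary illustration, `reliable_vertex_selection` is a hypothetical helper, and the usage note assumes the metadata dictionary exposes the ncsnr arrays under `metadata["fmri"]`.

```python
import numpy as np

def reliable_vertex_selection(lh_ncsnr, rh_ncsnr, threshold=0.2):
    """Build a `selection` dict that keeps vertices with ncsnr above `threshold`.

    The threshold is illustrative only. No "roi" key is set, so the binary
    vertex vectors are the ones used for the selection.
    """
    return {
        "lh_vertices": (np.asarray(lh_ncsnr) > threshold).astype(int),
        "rh_vertices": (np.asarray(rh_ncsnr) > threshold).astype(int),
    }
```

The result could then be passed as `selection=reliable_vertex_selection(metadata["fmri"]["lh_ncsnr"], metadata["fmri"]["rh_ncsnr"])` in the call to `berg.get_encoding_model("fmri-nsd_fsaverage-huze", subject=1, ...)`.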
device
Type: str
Required: No
Description: Device to run the model on. “auto” uses CUDA if available, otherwise CPU.
Valid Values: “cpu”, “cuda”, “auto”
Example: “auto”
Parameters used in encode
This function generates in silico neural responses using the encoding model previously loaded.
model
Type: BaseModelInterface
Required: Yes
Description: An instantiated and loaded encoding model.
stimulus
Type: numpy.ndarray
Required: Yes
Description: A batch of RGB images to be encoded. Images should be in integer format with values in the range [0, 255] and have square dimensions (e.g., 224×224).
Example: “An array of shape [100, 3, 224, 224] representing 100 RGB images.”
return_metadata
Type: bool
Required: No
Description: Whether to return the encoding model’s metadata together with the in silico neural responses.
Example: True
show_progress
Type: bool
Required: No
Description: Whether to show a progress bar during encoding (useful for large batches).
Example: True
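For very large stimulus sets, encoding can be split into chunks so that only part of the images is processed at once, with the LH and RH outputs concatenated afterwards. In this sketch, `encode_in_chunks` is hypothetical and `encode_fn` stands for any callable returning an (LH, RH) tuple, for example a lambda wrapping `berg.encode(model, x)`; the chunk size is arbitrary.

```python
import numpy as np

def encode_in_chunks(encode_fn, stimulus, chunk_size=100):
    """Encode a large stimulus array chunk by chunk and concatenate the outputs.

    encode_fn: callable mapping a [n, 3, H, W] array to an (lh, rh) tuple of
               [n, lh_vertices] and [n, rh_vertices] arrays
    """
    lh_parts, rh_parts = [], []
    for start in range(0, len(stimulus), chunk_size):
        lh, rh = encode_fn(stimulus[start:start + chunk_size])
        lh_parts.append(lh)
        rh_parts.append(rh)
    # Stack chunks back along the batch dimension
    return np.concatenate(lh_parts, axis=0), np.concatenate(rh_parts, axis=0)
```

For example: `lh, rh = encode_in_chunks(lambda x: berg.encode(model, x), stimulus, chunk_size=50)`.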
Parameters used in get_model_metadata
This function loads the encoding model’s metadata without having to load the model itself.
model_id
Type: str
Required: Yes
Description: Unique identifier of the model whose metadata to load.
Valid Values: fmri-nsd_fsaverage-huze
Example: “fmri-nsd_fsaverage-huze”
subject
Type: int
Required: Yes
Description: Subject ID from the NSD dataset (1-8).
Valid Values: 1, 2, 3, 4, 5, 6, 7, 8
Example: 1
Performance
Accuracy Plots (AWS directory):
brain-encoding-response-generator/encoding_models/modality-fmri/train_dataset-nsd_fsaverage/model-huze/encoding_models_accuracy
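The per-vertex correlation scores in the metadata (e.g., `lh_correlation_nsdcore`) are presumably Pearson correlations between predicted and measured responses across test images. A vectorized sketch of that computation follows; the exact procedure behind the stored scores (for example, how repeated trials are averaged) is not documented here, so this is an approximation, and `vertexwise_correlation` is a hypothetical name.

```python
import numpy as np

def vertexwise_correlation(pred, true):
    """Pearson r between predicted and measured responses for each vertex.

    pred, true: [n_images, n_vertices] arrays; returns [n_vertices]
    (NaN for vertices with zero variance).
    """
    p = np.asarray(pred, dtype=float)
    t = np.asarray(true, dtype=float)
    p = p - p.mean(axis=0)
    t = t - t.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return (p * t).sum(axis=0) / denom
```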
Example Usage
```python
import numpy as np
from berg import BERG

# Initialize BERG
berg = BERG(berg_dir="path/to/brain-encoding-response-generator")

# Load the model
model = berg.get_encoding_model(
    "fmri-nsd_fsaverage-huze",
    subject=1,
)

# Prepare the stimulus images
# Image shape should be [batch_size, 3 RGB channels, height, width], with square
# images and integer values in [0, 255] (randint's upper bound is exclusive)
stimulus = np.random.randint(0, 256, (100, 3, 256, 256))

# Generate the in silico neural responses using the encoding model previously loaded
responses = berg.encode(
    model,
    stimulus,
    show_progress=True
)
# The in silico fMRI responses will be a tuple of numpy.ndarray of shape:
# ([batch_size, lh_vertices], [batch_size, rh_vertices])
# where:
# - lh_vertices is the number of selected left hemisphere (LH) vertices for which the in silico
#   fMRI responses are generated.
# - rh_vertices is the number of selected right hemisphere (RH) vertices for which the in silico
#   fMRI responses are generated.

# Generate in silico neural responses with metadata
responses, metadata = berg.encode(
    model,
    stimulus,
    return_metadata=True
)

# Load the encoding model's metadata without having to load the model itself
metadata = berg.get_model_metadata(
    "fmri-nsd_fsaverage-huze",
    subject=1
)
```
References
Model video: https://youtu.be/Qh49zQQCW1g
Model slides: https://penno365-my.sharepoint.com/:p:/g/personal/huze_upenn_edu/EVDLndCXy21LpKEelu_MVkMBK9dbFIhlI6VEQzOl4j6eLA?e=eED63x
Model building code: https://huggingface.co/huzey/nsd_model/tree/main
NSD paper (Allen et al., 2022): https://doi.org/10.1038/s41593-021-00962-x
NSD-synthetic paper (Gifford et al., 2025): https://doi.org/10.48550/arXiv.2503.06286
COCO dataset (Lin et al., 2014): https://cocodataset.org/#home
DINOv2: https://huggingface.co/docs/transformers/en/model_doc/dinov2