fmri-nsd_fsaverage-huze

Model Summary

Modality	fMRI
Training Dataset	Natural Scenes Dataset (NSD) (fsaverage surface space)
Species	Human
Stimuli	Images
Model Type	Vision transformer (DINOv2)
Creator	Huzheng Yang

Description

The encoding model is based on a vision transformer (DINOv2) fine-tuned with LoRA. During training, the encoding model learned which DINOv2 layers, and which spatial locations within each layer's features, best predict each fMRI vertex. Although this feature selection was learned separately for each vertex, nearby vertices were constrained to select similar features. Finally, the activity of each vertex is predicted linearly from the selected DINOv2 features.
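
The per-vertex selection described above can be illustrated with a minimal NumPy sketch. This is not the author's implementation: the softmax parameterisation, the array shapes, and all names (`predict_vertices`, `layer_logits`, `spatial_logits`) are assumptions made for illustration, and the constraint that nearby vertices select similar features is omitted.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def predict_vertices(features, layer_logits, spatial_logits, weights, bias):
    """Hypothetical sketch of per-vertex feature selection plus a linear readout.

    features:       (n_layers, n_positions, n_channels) backbone features of one image
    layer_logits:   (n_vertices, n_layers)    learned preference over layers
    spatial_logits: (n_vertices, n_positions) learned preference over locations
    weights:        (n_vertices, n_channels)  linear readout weights
    bias:           (n_vertices,)             linear readout bias
    """
    layer_w = softmax(layer_logits, axis=1)
    spatial_w = softmax(spatial_logits, axis=1)
    # selected[v, c] = sum over layers l and positions p of
    #                  layer_w[v, l] * spatial_w[v, p] * features[l, p, c]
    selected = np.einsum("vl,vp,lpc->vc", layer_w, spatial_w, features)
    # Linear prediction of each vertex from its own selected features
    return np.sum(selected * weights, axis=1) + bias

# Toy dimensions, chosen only for the demonstration
rng = np.random.default_rng(0)
n_layers, n_positions, n_channels, n_vertices = 4, 9, 16, 5
features = rng.standard_normal((n_layers, n_positions, n_channels))
pred = predict_vertices(
    features,
    rng.standard_normal((n_vertices, n_layers)),
    rng.standard_normal((n_vertices, n_positions)),
    rng.standard_normal((n_vertices, n_channels)),
    np.zeros(n_vertices),
)
print(pred.shape)  # (5,)
```

In this sketch a vertex whose logits strongly favour one layer and one location effectively reads out from a single feature vector, while flatter logits pool over several layers and locations.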

The encoding model training pipeline consisted of two steps. The goal of step 1 was to generate a noiseless version of the fMRI responses, later used in step 2. This was achieved by training a single encoding model that predicted the fMRI responses of all 8 NSD subjects, using as input the stimulus images, experimental design information (i.e., the stimulus presentation order), and behavioral data. This model was then used to generate in silico fMRI responses for all NSD images. In step 2, a separate encoding model was trained for each NSD subject, using as target data both the subject's original fMRI responses and the in silico fMRI responses generated in step 1, and as input only the stimulus images.

The encoding models were trained on the Natural Scenes Dataset (NSD) (Allen et al., 2022), which comprises 7T fMRI responses of 8 subjects to 73,000 natural scenes from the COCO dataset (Lin et al., 2014).

Preprocessing. The encoding models are trained on NSD’s data prepared in FreeSurfer’s fsaverage space, from the “betas_fithrf_GLMdenoise_RR” preprocessing version. Note that the NSD data were z-scored at each scan session, and as a consequence the in silico fMRI responses generated by the encoding models also live in z-scored space.
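
As an illustration of what per-session z-scoring means, here is a minimal sketch. It assumes the responses are arranged as a trials-by-vertices array with one session label per trial, and that z-scoring is applied to each vertex across the trials of a session; the function name and layout are hypothetical and not taken from the NSD or BERG code.

```python
import numpy as np

def zscore_per_session(betas, session_ids):
    """Z-score each vertex across the trials of each scan session.

    betas:       (n_trials, n_vertices) single-trial responses
    session_ids: (n_trials,) session label of each trial
    """
    betas = np.asarray(betas, dtype=np.float64)
    session_ids = np.asarray(session_ids)
    out = np.empty_like(betas)
    for session in np.unique(session_ids):
        idx = session_ids == session
        mean = betas[idx].mean(axis=0)
        std = betas[idx].std(axis=0)
        std[std == 0] = 1.0  # leave constant vertices at zero instead of dividing by zero
        out[idx] = (betas[idx] - mean) / std
    return out

# Toy data: 2 sessions of 50 trials each, 10 vertices, with different offsets
rng = np.random.default_rng(0)
sessions = np.repeat([1, 2], 50)
raw = rng.standard_normal((100, 10)) * 3.0 + sessions[:, None]
z = zscore_per_session(raw, sessions)
```

Because the models are trained on responses normalised this way, their outputs are in standard-deviation units rather than in percent signal change.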

Model training partition. fMRI responses for up to 9,000 non-shared images (i.e., the images uniquely seen by each subject during the NSD experiment).

Model validation partition. fMRI responses for up to 485 of the 1,000 shared images (i.e., the 485 shared images that at least one subject saw fewer than three times during the NSD experiment).

Model testing partition. fMRI responses for 515 of the 1,000 shared images (i.e., the 515 images that every subject saw exactly three times during the NSD experiment). The models are additionally tested out-of-distribution on NSD-synthetic, a component of NSD consisting of fMRI responses from the same 8 NSD subjects to 286 NSD-synthetic images.
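
The validation/test split of the shared images follows directly from the repeat counts. The sketch below assumes a hypothetical subjects-by-images matrix of presentation counts (`repeat_counts`); the name and layout are illustrative and do not correspond to an NSD or BERG file.

```python
import numpy as np

def split_shared_images(repeat_counts, required_repeats=3):
    """Split shared images into validation and test indices.

    repeat_counts: (n_subjects, n_shared_images) number of times each subject
                   saw each shared image.
    Test images are those every subject saw exactly `required_repeats` times;
    all remaining shared images go to validation.
    """
    repeat_counts = np.asarray(repeat_counts)
    seen_by_all = np.all(repeat_counts == required_repeats, axis=0)
    test_idx = np.flatnonzero(seen_by_all)
    val_idx = np.flatnonzero(~seen_by_all)
    return val_idx, test_idx

# Toy example: 2 subjects, 4 shared images
toy_counts = np.array([
    [3, 3, 2, 3],
    [3, 1, 3, 3],
])
val_idx, test_idx = split_shared_images(toy_counts)
print(val_idx, test_idx)  # [1 2] [0 3]
```

Under this rule a subject may have seen some validation images fewer than three times, or not at all, which is why the validation partition holds "up to" 485 images per subject.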

Metadata

fmri

lh_ncsnr : (163842,) - Left hemisphere noise ceiling signal-to-noise ratio per vertex (computed on NSD-core)

rh_ncsnr : (163842,) - Right hemisphere noise ceiling signal-to-noise ratio per vertex (computed on NSD-core)

lh_ncsnr_nsdsynthetic : (163842,) - Left hemisphere noise ceiling signal-to-noise ratio per vertex (computed on NSD-synthetic)

rh_ncsnr_nsdsynthetic : (163842,) - Right hemisphere noise ceiling signal-to-noise ratio per vertex (computed on NSD-synthetic)

lh_fsaverage_roisdict - Left hemisphere ROI definitions on fsaverage surface
V1v : (710,) - Visual area 1 ventral

V1d : (828,) - Visual area 1 dorsal

V2v : (632,) - Visual area 2 ventral

V2d : (692,) - Visual area 2 dorsal

V3v : (567,) - Visual area 3 ventral

V3d : (669,) - Visual area 3 dorsal

hV4 : (531,) - Human V4 complex

EBA : (3231,) - Extrastriate body area

FBA-1 : (574,) - Fusiform body area 1

FBA-2 : (0,) - Fusiform body area 2

mTL-bodies : (0,) - Medial temporal lobe body-selective region

OFA : (432,) - Occipital face area

FFA-1 : (552,) - Fusiform face area 1

FFA-2 : (0,) - Fusiform face area 2

mTL-faces : (0,) - Medial temporal lobe face-selective region

aTL-faces : (329,) - Anterior temporal lobe face-selective region

OPA : (2021,) - Occipital place area

PPA : (1859,) - Parahippocampal place area

RSC : (1298,) - Retrosplenial complex

OWFA : (317,) - Occipital word form area

VWFA-1 : (1395,) - Visual word form area 1

VWFA-2 : (474,) - Visual word form area 2

mfs-words : (490,) - Mid-fusiform sulcus word-selective region

mTL-words : (475,) - Medial temporal lobe word-selective region

early : (5758,) - Early visual cortex (V1-V3)

midventral : (867,) - Mid-level ventral stream

midlateral : (1091,) - Mid-level lateral stream

midparietal : (1079,) - Mid-level parietal regions

ventral : (9680,) - Ventral visual stream

lateral : (10253,) - Lateral visual stream

parietal : (5176,) - Parietal regions

nsdgeneral : (18461,) - NSD general visual cortex mask

rh_fsaverage_roisdict - Right hemisphere ROI definitions on fsaverage surface
V1v : (444,) - Visual area 1 ventral

V1d : (991,) - Visual area 1 dorsal

V2v : (887,) - Visual area 2 ventral

V2d : (725,) - Visual area 2 dorsal

V3v : (682,) - Visual area 3 ventral

V3d : (535,) - Visual area 3 dorsal

hV4 : (765,) - Human V4 complex

EBA : (4421,) - Extrastriate body area

FBA-1 : (206,) - Fusiform body area 1

FBA-2 : (1234,) - Fusiform body area 2

mTL-bodies : (0,) - Medial temporal lobe body-selective region

OFA : (305,) - Occipital face area

FFA-1 : (330,) - Fusiform face area 1

FFA-2 : (1003,) - Fusiform face area 2

mTL-faces : (0,) - Medial temporal lobe face-selective region

aTL-faces : (283,) - Anterior temporal lobe face-selective region

OPA : (2849,) - Occipital place area

PPA : (1250,) - Parahippocampal place area

RSC : (1136,) - Retrosplenial complex

OWFA : (590,) - Occipital word form area

VWFA-1 : (397,) - Visual word form area 1

VWFA-2 : (649,) - Visual word form area 2

mfs-words : (0,) - Mid-fusiform sulcus word-selective region

mTL-words : (0,) - Medial temporal lobe word-selective region

early : (5634,) - Early visual cortex (V1-V3)

midventral : (1050,) - Mid-level ventral stream

midlateral : (1191,) - Mid-level lateral stream

midparietal : (1181,) - Mid-level parietal regions

ventral : (9393,) - Ventral visual stream

lateral : (10535,) - Lateral visual stream

parietal : (4818,) - Parietal regions

nsdgeneral : (19523,) - NSD general visual cortex mask

encoding_models

train_img_num : (9000,) - Image indices used for training

val_img_num : (485,) - Image indices used for validation

test_img_num : (515,) - Image indices used for testing

lh_correlation_nsdcore : (163842,) - Left hemisphere correlation scores (NSD core)

rh_correlation_nsdcore : (163842,) - Right hemisphere correlation scores (NSD core)

lh_r2_nsdcore : (163842,) - Left hemisphere R² scores (NSD core)

rh_r2_nsdcore : (163842,) - Right hemisphere R² scores (NSD core)

lh_noise_ceiling_nsdcore : (163842,) - Left hemisphere noise ceiling (NSD core)

rh_noise_ceiling_nsdcore : (163842,) - Right hemisphere noise ceiling (NSD core)

lh_explained_variance_nsdcore : (163842,) - Left hemisphere % explained variance (NSD core)

rh_explained_variance_nsdcore : (163842,) - Right hemisphere % explained variance (NSD core)

lh_correlation_nsdsynthetic : (163842,) - Left hemisphere correlation scores (NSD synthetic)

rh_correlation_nsdsynthetic : (163842,) - Right hemisphere correlation scores (NSD synthetic)

lh_r2_nsdsynthetic : (163842,) - Left hemisphere R² scores (NSD synthetic)

rh_r2_nsdsynthetic : (163842,) - Right hemisphere R² scores (NSD synthetic)

lh_noise_ceiling_nsdsynthetic : (163842,) - Left hemisphere noise ceiling (NSD synthetic)

rh_noise_ceiling_nsdsynthetic : (163842,) - Right hemisphere noise ceiling (NSD synthetic)

lh_explained_variance_nsdsynthetic : (163842,) - Left hemisphere % explained variance (NSD synthetic)

rh_explained_variance_nsdsynthetic : (163842,) - Right hemisphere % explained variance (NSD synthetic)
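
For orientation, the sketch below shows one common way these quantities relate. It assumes the noise ceiling is derived from the ncsnr with the formula given in the NSD documentation, NC = 100 · ncsnr² / (ncsnr² + 1/n), where n is the number of trials averaged, and that explained variance is R² expressed as a percentage of the noise ceiling. Whether the stored metadata was computed exactly this way is not stated here, so treat the function names and conventions as assumptions.

```python
import numpy as np

def noise_ceiling_percent(ncsnr, n_trials):
    """Noise ceiling (% of variance) from the noise ceiling SNR.

    n_trials is the number of repeated trials averaged per image.
    """
    ncsnr = np.asarray(ncsnr, dtype=float)
    return 100.0 * ncsnr**2 / (ncsnr**2 + 1.0 / n_trials)

def explained_variance_percent(r2, noise_ceiling_pct):
    """R² as a percentage of the noise ceiling; NaN where the ceiling is 0."""
    r2 = np.asarray(r2, dtype=float)
    nc = np.asarray(noise_ceiling_pct, dtype=float)
    out = np.full(r2.shape, np.nan)
    np.divide(100.0 * r2, nc / 100.0, out=out, where=nc > 0)
    return out

# Toy values for three vertices, with responses averaged over 3 trials
ncsnr = np.array([0.0, 1.0, 2.0])
nc = noise_ceiling_percent(ncsnr, n_trials=3)
ev = explained_variance_percent(np.array([0.0, 0.3, 0.6]), nc)
```

Vertices with a noise ceiling of zero have no explainable variance, so the sketch returns NaN for them rather than dividing by zero.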

Input

Type	`numpy.ndarray`
Shape	`['batch_size', 3, 'height', 'width']`
Description	The input should be a batch of RGB images.
Constraints	Image values should be integers in range [0, 255]. Image dimensions (height, width) should be equal (square). Minimum recommended image size: 224×224 pixels.
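
Images loaded with common libraries are usually laid out as [batch_size, height, width, 3] and are not necessarily square. The helper below is a minimal sketch of how such a batch could be brought into the layout described above; `to_model_input` is a hypothetical name, not part of BERG, and the center crop is only one of several reasonable ways to obtain square images.

```python
import numpy as np

def to_model_input(images_hwc):
    """Convert [batch, height, width, 3] integer images to [batch, 3, side, side].

    Images are center-cropped to a square whose side is min(height, width).
    """
    images = np.asarray(images_hwc)
    if images.ndim != 4 or images.shape[-1] != 3:
        raise ValueError("expected an array of shape [batch, height, width, 3]")
    if not np.issubdtype(images.dtype, np.integer):
        raise TypeError("expected integer pixel values")
    if images.min() < 0 or images.max() > 255:
        raise ValueError("pixel values must lie in [0, 255]")
    _, height, width, _ = images.shape
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    cropped = images[:, top:top + side, left:left + side, :]
    # Move the channel axis from last to second position
    return np.ascontiguousarray(cropped.transpose(0, 3, 1, 2))

# Toy batch: 2 non-square images of 300x256 pixels
rng = np.random.default_rng(0)
batch_hwc = rng.integers(0, 256, size=(2, 300, 256, 3))
batch = to_model_input(batch_hwc)
print(batch.shape)  # (2, 3, 256, 256)
```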

Output

Type	`tuple of numpy.ndarray`
Shape	`([batch_size, lh_vertices], [batch_size, rh_vertices])`
Description	The output is a tuple containing the left hemisphere (LH) and right hemisphere (RH) in silico fMRI responses for the batch images.
Dimensions	batch_size: Number of stimuli in the batch. lh_vertices: Number of selected LH vertices for which the in silico fMRI responses are generated. rh_vertices: Number of selected RH vertices for which the in silico fMRI responses are generated.

Parameters

Parameters used in `get_encoding_model`

This function loads the encoding model.

model_id	Type: str Required: Yes Description: Unique identifier of the model to load. Valid Values: fmri-nsd_fsaverage-huze Example: “fmri-nsd_fsaverage-huze”
subject	Type: int Required: Yes Description: Subject ID from the NSD dataset (1-8). Valid Values: 1, 2, 3, 4, 5, 6, 7, 8 Example: 1
selection	Type: dict Required: No Description: Specifies which outputs to include in the model responses. If not provided, fMRI responses are generated for all LH and RH fMRI vertices. Properties: roi Type: str Description: The region-of-interest (ROI) for which the in silico fMRI responses (of both hemispheres) are generated. Valid values: “V1d”, “V1v”, “V2d”, “V2v”, “V3d”, “V3v”, “hV4”, “OFA”, “FFA-1”, “FFA-2”, “mTL-faces”, “aTL-faces”, “OWFA”, “VWFA-1”, “VWFA-2”, “mfs-words”, “mTL-words”, “OPA”, “PPA”, “RSC”, “EBA”, “FBA-1”, “FBA-2”, “mTL-bodies”, “early”, “midventral”, “midlateral”, “midparietal”, “parietal”, “lateral”, “ventral”, “nsdgeneral” lh_vertices Type: numpy.ndarray Description: Binary vector with ones indicating the left hemisphere (LH) vertices for which the in silico fMRI responses are generated. This vector must have exactly the same length as the number of LH fsaverage vertices (163,842). The vertices from this vector are only selected if the “roi” key is not provided, or has value None. rh_vertices Type: numpy.ndarray Description: Binary vector with ones indicating the right hemisphere (RH) vertices for which the in silico fMRI responses are generated. This vector must have exactly the same length as the number of RH fsaverage vertices (163,842). The vertices from this vector are only selected if the “roi” key is not provided, or has value None.
device	Type: str Required: No Description: Device to run the model on. ‘auto’ will use CUDA if available, otherwise CPU. Valid Values: “cpu”, “cuda”, “auto” Example: “auto”
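
A short sketch of how a `selection` dictionary could be assembled from the properties listed above. The helper `vertex_mask` and the integer dtype of the mask are assumptions for illustration; only the keys `roi`, `lh_vertices`, and `rh_vertices` come from the parameter description.

```python
import numpy as np

N_FSAVERAGE_VERTICES = 163842  # vertices per hemisphere in fsaverage space

def vertex_mask(indices, n_vertices=N_FSAVERAGE_VERTICES):
    """Binary vector of length n_vertices with ones at the given vertex indices."""
    mask = np.zeros(n_vertices, dtype=int)
    mask[np.asarray(indices, dtype=int)] = 1
    return mask

# Option 1: select a named ROI (applies to both hemispheres)
selection_roi = {"roi": "V1v"}

# Option 2: select arbitrary vertices per hemisphere; per the description,
# these are only used when "roi" is absent or None
selection_custom = {
    "lh_vertices": vertex_mask([0, 10, 20]),
    "rh_vertices": vertex_mask(range(100)),
}

print(selection_custom["lh_vertices"].sum(), selection_custom["rh_vertices"].sum())  # 3 100
```

Either dictionary would then be passed as the `selection` argument of `get_encoding_model`.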

Parameters used in `encode`

This function generates in silico neural responses using the encoding model previously loaded.

model	Type: BaseModelInterface Required: Yes Description: An instantiated and loaded encoding model.
stimulus	Type: numpy.ndarray Required: Yes Description: A batch of RGB images to be encoded. Images should be in integer format with values in the range [0, 255], and have square dimensions (e.g., 224×224). Example: “An array of shape [100, 3, 224, 224] representing 100 RGB images.”
return_metadata	Type: bool Required: No Description: Whether to return the encoding model’s metadata together with the in silico neural responses. Example: True
show_progress	Type: bool Required: No Description: Whether to show a progress bar during encoding (for large batches). Example: True

Parameters used in `get_model_metadata`

This function loads the encoding model’s metadata without having to load the model itself.

model_id	Type: str Required: Yes Description: Unique identifier of the model to load. Valid Values: fmri-nsd_fsaverage-huze Example: “fmri-nsd_fsaverage-huze”
subject	Type: int Required: Yes Description: Subject ID from the NSD dataset (1-8). Valid Values: 1, 2, 3, 4, 5, 6, 7, 8 Example: 1

Performance

Accuracy Plots (AWS directory):

brain-encoding-response-generator/encoding_models/modality-fmri/train_dataset-nsd_fsaverage/model-huze/encoding_models_accuracy

Example Usage

import numpy as np
from berg import BERG

# Initialize BERG
berg = BERG(berg_dir="path/to/brain-encoding-response-generator")

# Load the model
model = berg.get_encoding_model(
    "fmri-nsd_fsaverage-huze",
    subject=1,
)

# Prepare the stimulus images
# Image shape should be [batch_size, 3 RGB channels, height, width]
# np.random.randint excludes the upper bound, so 256 yields integers in [0, 255]
stimulus = np.random.randint(0, 256, (100, 3, 256, 256))

# Generate the in silico neural responses using the encoding model previously loaded
responses = berg.encode(
    model,
    stimulus,
    show_progress=True
)

# The in silico fMRI responses will be a tuple of numpy.ndarray of shape:
# ([batch_size, lh_vertices], [batch_size, rh_vertices])
# where:
# - lh_vertices is the number of selected left hemisphere (LH) vertices for which the in silico
#   fMRI responses are generated.
# - rh_vertices is the number of selected right hemisphere (RH) vertices for which the in silico
#   fMRI responses are generated.

# Generate in silico neural responses with metadata
responses, metadata = berg.encode(
    model,
    stimulus,
    return_metadata=True
)

# Load the encoding model's metadata without having to load the model itself
metadata = berg.get_model_metadata(
    "fmri-nsd_fsaverage-huze",
    subject=1
)
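
Once the metadata is loaded, a typical next step is to summarise prediction accuracy within an ROI. The sketch below assumes the metadata is a nested dictionary mirroring the Metadata section (`metadata["fmri"]["lh_fsaverage_roisdict"]["V1v"]`, `metadata["encoding_models"]["lh_correlation_nsdcore"]`) and that each ROI entry holds vertex indices; both points are assumptions, so the example runs on a small synthetic dictionary rather than on real BERG output.

```python
import numpy as np

def roi_mean_score(metadata, roi, hemisphere="lh", score="correlation_nsdcore"):
    """Mean of a per-vertex score within one ROI of one hemisphere.

    Returns NaN for ROIs with no vertices (several ROIs listed in the
    Metadata section have shape (0,)).
    """
    roi_vertices = np.asarray(
        metadata["fmri"][f"{hemisphere}_fsaverage_roisdict"][roi], dtype=int
    )
    if roi_vertices.size == 0:
        return float("nan")
    scores = np.asarray(metadata["encoding_models"][f"{hemisphere}_{score}"])
    return float(np.nanmean(scores[roi_vertices]))

# Synthetic stand-in for the metadata (3 vertices instead of 163,842)
toy_metadata = {
    "fmri": {
        "lh_fsaverage_roisdict": {
            "V1v": np.array([0, 2]),
            "FBA-2": np.array([], dtype=int),
        }
    },
    "encoding_models": {
        "lh_correlation_nsdcore": np.array([0.2, 0.9, 0.4]),
    },
}

v1v_score = roi_mean_score(toy_metadata, "V1v")
empty_score = roi_mean_score(toy_metadata, "FBA-2")
```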

References

Model video: https://youtu.be/Qh49zQQCW1g
Model slides: https://penno365-my.sharepoint.com/:p:/g/personal/huze_upenn_edu/EVDLndCXy21LpKEelu_MVkMBK9dbFIhlI6VEQzOl4j6eLA?e=eED63x
Model building code: https://huggingface.co/huzey/nsd_model/tree/main
NSD paper (Allen et al., 2022): https://doi.org/10.1038/s41593-021-00962-x
NSD-synthetic paper (Gifford et al., 2025): https://doi.org/10.48550/arXiv.2503.06286
COCO dataset (Lin et al., 2014): https://cocodataset.org/#home
DINOv2: https://huggingface.co/docs/transformers/en/model_doc/dinov2
LoRA: https://github.com/microsoft/LoRA