fmri-things_fmri_1-vit_b_32
Model Summary
| Modality | fMRI |
|---|---|
| Training Dataset | THINGS fMRI1 |
| Species | Human |
| Stimuli | Images |
| Model Type | Vision transformer (ViT-B/32) |
| Creator | Domenic Bersch |
Description
This encoding model is a linear mapping, fitted by linear regression, from the image features of a vision transformer (Dosovitskiy et al., 2020) onto whole-brain functional magnetic resonance imaging (fMRI) responses from the THINGS-fMRI dataset (Hebart et al., eLife 2023). Features are taken from all 12 transformer layers, using the full set of patch tokens per layer to represent each stimulus image. For each image, the features are concatenated across all spatial tokens and reduced to 250 principal components via principal component analysis (PCA). These reduced features serve as predictors of the fMRI responses.
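The feature-reduction step described above can be sketched with NumPy alone. This is a minimal illustration, not the model building code: the helper name `pca_reduce`, the random toy features, and the use of 5 components in place of 250 are all assumptions made for the example.

```python
import numpy as np

def pca_reduce(train_feats, test_feats, n_components):
    """Fit PCA on training features and project both splits (hypothetical helper)."""
    mean = train_feats.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(train_feats - mean, full_matrices=False)
    components = vt[:n_components]
    return (train_feats - mean) @ components.T, (test_feats - mean) @ components.T

rng = np.random.default_rng(0)
# Toy stand-in for flattened transformer features: (n_images, n_tokens * n_dims)
train_feats = rng.standard_normal((40, 60))
test_feats = rng.standard_normal((10, 60))
train_pcs, test_pcs = pca_reduce(train_feats, test_feats, n_components=5)
print(train_pcs.shape, test_pcs.shape)
```

The projection is fitted on the training features only and then applied to the test features, so no test information leaks into the reduced representation.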
Neural data. Encoding models were trained on the preprocessed data provided in THINGS fMRI1. fMRI data were recorded from three human participants (sub-01–sub-03) viewing ~8,740 naturalistic object images spanning 1,854 object categories from the THINGS database. Recordings were acquired at 1.6 mm isotropic resolution and preprocessed with standard fMRI pipelines, including motion correction, slice-timing correction, and spatial normalization.
Model training partition. Single-trial responses to approximately 8,640 unique naturalistic images were used for training.
Model testing partition. 100 test images, each repeated 12 times, were used for evaluation; the target responses correspond to the average fMRI activity across repetitions.
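Averaging the repeated test trials amounts to grouping trials by stimulus. The following is a minimal sketch, assuming the test responses are stored as a (n_trials, n_voxels) array alongside one stimulus filename per trial; the helper name, filenames, and toy data are illustrative and not taken from the dataset.

```python
import numpy as np

def average_repeats(responses, stimuli):
    """Average trials that share a stimulus name (hypothetical helper)."""
    names, inverse = np.unique(stimuli, return_inverse=True)
    sums = np.zeros((len(names), responses.shape[1]))
    np.add.at(sums, inverse, responses)  # accumulate trials per stimulus
    counts = np.bincount(inverse, minlength=len(names))
    return names, sums / counts[:, None]

# Toy data: 3 images x 4 repeats, 5 voxels (the real test set has 100 images x 12 repeats)
stimuli = np.repeat(np.array(["a.jpg", "b.jpg", "c.jpg"]), 4)
responses = np.arange(12 * 5, dtype=float).reshape(12, 5)
names, targets = average_repeats(responses, stimuli)
print(targets.shape)
```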
Training procedure. The model was trained in 32 chunks (~6,604 voxels each) for memory efficiency. Independent linear regression models were fitted for each voxel, predicting voxel responses from the PCA-reduced feature vectors. The resulting model weights provide a voxel-wise mapping from visual features to fMRI activity.
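The chunked, voxel-wise fit can be sketched as follows. This is an illustration under stated assumptions rather than the original training code: it uses plain least squares with an intercept (the text above says only that linear regression was used), and the function name, toy array sizes, and chunk count are hypothetical.

```python
import numpy as np

def fit_voxelwise_in_chunks(features, responses, n_chunks):
    """Ordinary least squares per voxel, solved one voxel chunk at a time."""
    # Append a constant column so each voxel gets its own intercept
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    weights = np.empty((design.shape[1], responses.shape[1]))
    for voxel_ids in np.array_split(np.arange(responses.shape[1]), n_chunks):
        # lstsq solves every voxel in the chunk at once; the columns are independent
        chunk = responses[:, voxel_ids]
        weights[:, voxel_ids] = np.linalg.lstsq(design, chunk, rcond=None)[0]
    return weights

rng = np.random.default_rng(0)
features = rng.standard_normal((200, 10))  # toy PCA features (the real model uses 250)
true_w = rng.standard_normal((10, 64))     # 64 toy voxels
responses = features @ true_w + 0.01 * rng.standard_normal((200, 64))
weights = fit_voxelwise_in_chunks(features, responses, n_chunks=4)
predicted = np.hstack([features, np.ones((200, 1))]) @ weights
print(weights.shape, predicted.shape)
```

Because each voxel's regression is independent of the others, splitting the voxels into chunks changes only peak memory use, not the fitted weights.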
Noise ceiling. The noise ceiling was computed from split-half reliability of voxel responses across the 12 repeated presentations of each test image. Two metrics are provided: (1) single-trial noise ceiling based on individual trial reliability, and (2) test-set noise ceiling based on averaged test responses. These represent the theoretical upper bound of prediction accuracy for each voxel.
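A split-half reliability estimate of this kind can be sketched as below. The split into odd and even repeats and the Spearman-Brown correction are assumptions chosen for illustration; the text above does not specify how the halves were formed or which correction was applied, and the data here are synthetic.

```python
import numpy as np

def splithalf_reliability(repeats):
    """repeats: (n_images, n_repeats, n_voxels) -> (uncorrected, corrected) per voxel."""
    half_a = repeats[:, 0::2].mean(axis=1)  # repeats 0, 2, 4, ...
    half_b = repeats[:, 1::2].mean(axis=1)  # repeats 1, 3, 5, ...
    a = half_a - half_a.mean(axis=0)
    b = half_b - half_b.mean(axis=0)
    # Pearson correlation across images, computed separately for each voxel
    r = (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    # Spearman-Brown: reliability of the full average given the half-data reliability
    return r, 2 * r / (1 + r)

rng = np.random.default_rng(0)
signal = rng.standard_normal((100, 1, 8))             # 100 images, 8 toy voxels
repeats = signal + rng.standard_normal((100, 12, 8))  # 12 noisy repeats per image
uncorrected, corrected = splithalf_reliability(repeats)
print(uncorrected.round(2), corrected.round(2))
```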
Output. Each trained model predicts whole-brain fMRI responses for all 211,339 voxels (or user-specified subsets via ROI selection) for each input image.
Metadata
fmri
voxel_coords : (211339, 3) - Voxel coordinates in volume space (x, y, z indices)
n_voxels : int - Total number of voxels (211339)
subject_id : int - Subject identifier (e.g., 1)
encoding_model
train_stimuli : (8640,) - Stimulus filenames for training trials
train_concepts : (8640,) - Concept labels for training trials
test_stimuli : (1200,) - Stimulus filenames for test trials
test_concepts : (1200,) - Concept labels for test trials
noise_ceiling_singletrial : (211339,) - Max explainable variance per voxel based on single-trial repeat reliability
noise_ceiling_testset : (211339,) - Max explainable variance per voxel based on averaged test-set repeats
splithalf_corrected : (211339,) - Split-half reliability corrected to estimate full-data consistency
splithalf_uncorrected : (211339,) - Raw split-half voxel reliability without correction
correlation_results : (211339,) - Encoding model prediction accuracy (Pearson’s r) for each voxel, computed on the test data
prf
prf_eccentricity : (211339,) - Distance of receptive field center from fixation (deg)
prf_polarangle : (211339,) - Angular position of receptive field center (0–360°)
prf_rsquared : (211339,) - Variance explained by pRF model (fit quality)
prf_size : (211339,) - Estimated receptive field size (deg)
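Eccentricity and polar angle together locate each receptive field center in the visual field. The sketch below converts them to Cartesian coordinates, assuming angles are measured counterclockwise from the positive horizontal axis; this convention is not stated in the model card and should be checked against the dataset documentation. The function name and input values are illustrative.

```python
import numpy as np

def prf_center_xy(eccentricity, polar_angle_deg):
    """Convert pRF eccentricity (deg) and polar angle (0-360 deg) to x/y in degrees."""
    theta = np.deg2rad(polar_angle_deg)
    return eccentricity * np.cos(theta), eccentricity * np.sin(theta)

ecc = np.array([1.0, 2.0, 4.0])        # made-up eccentricities
angle = np.array([0.0, 90.0, 180.0])   # made-up polar angles
x, y = prf_center_xy(ecc, angle)
print(np.round(x, 3), np.round(y, 3))
```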
roi
V1, V2, V3, hV4, VO1, VO2, LO1_prf, LO2_prf, TO1, TO2, V3b, V3a, lFFA, rFFA, lOFA, rOFA, lEBA, rEBA, lPPA, rPPA, lRSC, rRSC, lTOS, rTOS, lLOC, rLOC, IT, lSTS, rSTS : variable length - Each ROI entry contains the voxel indices for that functional region
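The ROI indices can be combined with the per-voxel arrays to summarize a region. The sketch below assumes the metadata is a nested dictionary keyed by the group names listed above (`encoding_model`, `roi`); that layout is an assumption, and the example runs on a small mock dictionary rather than real metadata.

```python
import numpy as np

def roi_mean_accuracy(metadata, roi_names):
    """Mean test-set Pearson r per ROI (assumes the nested layout named in the lead-in)."""
    r = np.asarray(metadata["encoding_model"]["correlation_results"])
    return {name: float(r[np.asarray(metadata["roi"][name])].mean()) for name in roi_names}

# Mock metadata with 10 voxels; the real arrays have 211,339 entries
mock_metadata = {
    "encoding_model": {"correlation_results": np.linspace(0.0, 0.9, 10)},
    "roi": {"V1": np.array([0, 1, 2]), "IT": np.array([7, 8, 9])},
}
summary = roi_mean_accuracy(mock_metadata, ["V1", "IT"])
print(summary)
```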
Input
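| Type | numpy.ndarray |
|---|---|
| Shape | [batch_size, 3, height, width] |
| Description | The input should be a batch of RGB images. |
| Constraints | Integer pixel values in the range [0, 255]; square images (e.g., 224×224). |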
Output
| Type | numpy.ndarray |
|---|---|
| Shape | [batch_size, n_voxels] |
| Description | The output is a 2D array containing in silico fMRI responses. |
| Dimensions | batch_size: Number of stimuli in the batch; n_voxels: Number of voxels (up to 211,339, based on ROI selection). |
Parameters
Parameters used in get_encoding_model
This function loads the encoding model.
model_id |
Type: str
Required: Yes
Description: Unique identifier of the model to load.
Valid Values: fmri-things_fmri_1-vit_b_32
Example: “fmri-things_fmri_1-vit_b_32”
|
subject |
Type: int
Required: Yes
Description: Subject ID from the THINGS fMRI dataset.
Valid Values: 1, 2, 3
Example: 1
|
selection |
Type: dict
Required: No
Description: Specifies which outputs to include in the model responses.
Can include specific ROIs and/or voxel indices. If not provided,
fMRI responses are generated for all voxels.
Properties:
roi
Type: list[str]
Description: List of region-of-interest (ROI) labels to include. Each ROI
represents a functionally defined brain region:
• Early visual: V1, V2, V3, hV4, V3a, V3b
• Ventral stream: VO1, VO2, LO1_prf, LO2_prf, TO1, TO2
• High-level visual: IT (inferior temporal cortex)
• Category-selective: lFFA/rFFA (faces), lOFA/rOFA (faces),
lEBA/rEBA (bodies), lPPA/rPPA (places), lRSC/rRSC (scenes),
lTOS/rTOS (scenes), lLOC/rLOC (objects)
• Temporal: lSTS/rSTS (superior temporal sulcus)
If multiple ROIs are listed, their voxels are concatenated.
Valid values: “V1”, “V2”, “V3”, “hV4”, “VO1”, “VO2”, “LO1_prf”, “LO2_prf”, “TO1”, “TO2”, “V3b”, “V3a”, “lFFA”, “rFFA”, “lOFA”, “rOFA”, “lEBA”, “rEBA”, “lPPA”, “rPPA”, “lRSC”, “rRSC”, “lTOS”, “rTOS”, “lLOC”, “rLOC”, “IT”, “lSTS”, “rSTS”
Example: [‘V1’, ‘V2’, ‘IT’]
voxel_index
Type: numpy.ndarray
Description: Binary mask vector indicating which voxels to include.
Must have exactly the same length as the number of available voxels (211,339).
Each position set to 1 indicates that the voxel at that position should be included.
Example: [0, 0, ..., 1, 1, 0]
|
device |
Type: str
Required: No
Description: Device to run the model on. ‘auto’ will use CUDA if available, otherwise CPU.
Valid Values: “cpu”, “cuda”, “auto”
Example: “auto”
|
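A `voxel_index` vector of the required length can be built from a list of voxel indices. The helper below is a hypothetical convenience, not part of BERG, and the indices are arbitrary; only the vector length (211,339) comes from this model card.

```python
import numpy as np

N_VOXELS = 211339  # total voxel count stated in this model card

def indices_to_mask(voxel_ids, n_voxels=N_VOXELS):
    """Turn a list of voxel indices into a 0/1 vector of length n_voxels."""
    mask = np.zeros(n_voxels, dtype=int)
    mask[np.asarray(voxel_ids, dtype=int)] = 1
    return mask

voxel_index = indices_to_mask([3, 10, 211338])  # arbitrary example indices
selection = {"voxel_index": voxel_index}        # dict passed as the selection argument
print(voxel_index.shape, int(voxel_index.sum()))
```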
Parameters used in encode
This function generates in silico neural responses using the encoding model previously loaded.
model |
Type: BaseModelInterface
Required: Yes
Description: An instantiated and loaded encoding model.
|
stimulus |
Type: numpy.ndarray
Required: Yes
Description: A batch of RGB images to be encoded. Images should be in integer format with values in the range [0, 255] and should have square dimensions (e.g., 224×224).
Example: “An array of shape [100, 3, 224, 224] representing 100 RGB images.”
|
return_metadata |
Type: bool
Required: No
Description: Whether to return the encoding model’s metadata together with the in silico neural responses.
Example: True
|
show_progress |
Type: bool
Required: No
Description: Whether to show a progress bar during encoding (for large batches).
Example: True
|
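Images loaded from disk are usually height × width × channel arrays, so they need to be cropped and transposed to match the stimulus format described above. The sketch below assumes the images are already loaded as equally sized uint8 arrays; the helper name is hypothetical, and whether BERG resizes images internally is not stated here.

```python
import numpy as np

def to_stimulus_batch(images):
    """Center-crop HxWx3 uint8 images to squares and stack as [batch, 3, side, side]."""
    batch = []
    for img in images:
        h, w = img.shape[:2]
        side = min(h, w)
        top, left = (h - side) // 2, (w - side) // 2
        square = img[top:top + side, left:left + side]
        batch.append(np.transpose(square, (2, 0, 1)))  # HWC -> CHW
    return np.stack(batch).astype(np.uint8)

rng = np.random.default_rng(0)
# Two random 240x320 RGB images standing in for loaded photographs
images = [rng.integers(0, 256, (240, 320, 3), dtype=np.uint8) for _ in range(2)]
stimulus = to_stimulus_batch(images)
print(stimulus.shape, stimulus.dtype)
```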
Parameters used in get_model_metadata
This function loads the encoding model’s metadata without having to load the model itself.
model_id |
Type: str
Required: Yes
Description: Unique identifier of the model to load.
Valid Values: fmri-things_fmri_1-vit_b_32
Example: “fmri-things_fmri_1-vit_b_32”
|
subject |
Type: int
Required: Yes
Description: Subject ID from the THINGS fMRI dataset.
Valid Values: 1, 2, 3
Example: 1
|
Performance
Accuracy Plots (AWS directory):
brain-encoding-response-generator/encoding_models/modality-fmri/train_dataset-things_fmri_1/model-vit_b_32/encoding_models_accuracy
Example Usage
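import numpy as np
from berg import BERG
# Initialize BERG
berg = BERG(berg_dir="path/to/brain-encoding-response-generator")
# Build a binary voxel mask over all 211,339 voxels (1 = include the voxel);
# the first 1,000 voxels are selected here purely for illustration
voxel_index = np.zeros(211339, dtype=int)
voxel_index[:1000] = 1
# Load the model
model = berg.get_encoding_model(
    "fmri-things_fmri_1-vit_b_32",
    subject=1,
    selection={
        "roi": ["V1", "V2", "IT"],
        "voxel_index": voxel_index
    }
)
# Prepare the stimulus images
# Image shape should be [batch_size, 3 RGB channels, height, width]
# The upper bound of randint is exclusive, so 256 yields values in [0, 255]
stimulus = np.random.randint(0, 256, (100, 3, 256, 256))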
# Generates the in silico neural responses using the encoding model previously loaded
responses = berg.encode(
model,
stimulus,
show_progress=True
)
# The in silico fMRI responses will be a numpy.ndarray of shape:
# ['batch_size', 'n_voxels']
# where:
# - n_voxels: Number of voxels (up to 211,339, based on ROI selection).
# Generate in silico neural responses with metadata
responses, metadata = berg.encode(
model,
stimulus,
return_metadata=True
)
# Load the encoding model's metadata without having to load the model itself
metadata = berg.get_model_metadata(
"fmri-things_fmri_1-vit_b_32",
subject=1
)
References
Model building code: https://github.com/gifale95/BERG/tree/main/berg_creation_code/02_train_encoding_models/train_dataset-things_fmri_1/model-vit_b_32
THINGS MEG & fMRI Paper (Hebart et al., 2023): https://doi.org/10.7554/eLife.82580
THINGS MEG & fMRI Data (Hebart et al., 2023): https://plus.figshare.com/collections/_/6161151
THINGS initiative (Hebart et al., 2019): https://things-initiative.org/
ViT-B/32 (Dosovitskiy et al., 2020): https://arxiv.org/abs/2010.11929