fmri-things_fmri_1-vit_b_32

Model Summary

Modality

fMRI

Training Dataset

THINGS fMRI1

Species

Human

Stimuli

Images

Model Type

Vision transformer (ViT-B/32)

Creator

Domenic Bersch

Description

This encoding model is a linear mapping, fitted by linear regression, from the image features of a vision transformer (Dosovitskiy et al., 2020) onto whole-brain functional magnetic resonance imaging (fMRI) responses from the THINGS-fMRI dataset (Hebart et al., eLife 2023). Features are extracted from all 12 transformer layers, using the full set of patch tokens per layer to represent each stimulus image. For each stimulus image, the features are concatenated across all spatial tokens and reduced to 250 principal components via principal component analysis (PCA). These reduced features serve as the predictors of the fMRI responses.
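The pipeline described above (PCA reduction followed by a linear map onto voxels) can be sketched with NumPy alone. This is a minimal illustration on synthetic data, not the code used to train the model: the array sizes are shrunk, the feature matrix is random rather than taken from a ViT, and the use of ordinary least squares with an intercept is an assumption, since the source only says "linear regression".

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins with assumed, much smaller sizes than the real data
n_train, n_feat, n_comp, n_vox = 300, 512, 25, 40
features = rng.standard_normal((n_train, n_feat))  # flattened token features
fmri = rng.standard_normal((n_train, n_vox))       # voxel responses

# PCA via SVD on the mean-centered training features
feat_mean = features.mean(axis=0)
_, _, vt = np.linalg.svd(features - feat_mean, full_matrices=False)
components = vt[:n_comp]                           # (n_comp, n_feat)
reduced = (features - feat_mean) @ components.T    # (n_train, n_comp)

# Least-squares fit with an intercept: one weight column per voxel
design = np.column_stack([reduced, np.ones(n_train)])
weights, *_ = np.linalg.lstsq(design, fmri, rcond=None)  # (n_comp + 1, n_vox)

def predict(new_features):
    """Project new features with the training PCA, then apply the weights."""
    z = (new_features - feat_mean) @ components.T
    return np.column_stack([z, np.ones(len(z))]) @ weights

pred = predict(features[:5])                       # (5, n_vox)
```

The PCA mean and components are estimated on the training features only and reused for new images, so that predictions for unseen stimuli live in the same reduced space.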

Neural data. Encoding models were trained on the preprocessed data provided in THINGS fMRI1. fMRI data were recorded from three human participants (sub-01 to sub-03) viewing 1,854 object categories from the THINGS database (~8,740 naturalistic object images). Recordings were acquired at 1.6 mm isotropic resolution and preprocessed with standard fMRI pipelines, including motion correction, slice-timing correction, and spatial normalization.

Model training partition. Single-trial responses to approximately 8,640 unique naturalistic images were used for training.

Model testing partition. 100 test images, each repeated 12 times, were used for evaluation; the target responses correspond to the average fMRI activity across repetitions.
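As an illustration of how averaged test targets could be formed from the 1,200 test trials (100 images × 12 repetitions), the sketch below groups trials by stimulus name and averages them. The filenames, the shuffled trial order, and the reduced voxel count are made up for the example; the actual trial ordering in the dataset is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_reps, n_vox = 100, 12, 50  # n_vox reduced for illustration

# Hypothetical trial labels and single-trial responses in shuffled order
image_ids = np.array([f"img_{i:03d}.jpg" for i in range(n_images)])
test_stimuli = rng.permutation(np.repeat(image_ids, n_reps))       # (1200,)
test_responses = rng.standard_normal((test_stimuli.size, n_vox))   # (1200, n_vox)

# Map each trial to its image, then average all trials of the same image
unique_ids, inverse = np.unique(test_stimuli, return_inverse=True)
sums = np.zeros((unique_ids.size, n_vox))
np.add.at(sums, inverse, test_responses)   # unbuffered accumulation per image
counts = np.bincount(inverse)              # repetitions per image
test_targets = sums / counts[:, None]      # (100, n_vox)
```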

Training procedure. The model was trained in 32 chunks (~6,604 voxels each) for memory efficiency. Independent linear regression models were fitted for each voxel, predicting voxel responses from the PCA-reduced feature vectors. The resulting model weights provide a voxel-wise mapping from visual features to fMRI activity.
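Because each voxel gets its own independent regression, the voxels can be fitted in chunks without changing the result. The sketch below shows one way this could look; `fit_in_chunks` is a hypothetical helper, ordinary least squares with an intercept is assumed, and the split via `np.array_split` is only one possible way to arrive at 32 chunks of roughly 6,604 voxels.

```python
import numpy as np

def fit_in_chunks(predictors, responses, n_chunks):
    """Fit least-squares weights one voxel chunk at a time.

    predictors: (n_trials, n_pred), responses: (n_trials, n_voxels).
    Returns weights of shape (n_pred + 1, n_voxels); the last row is the intercept.
    """
    design = np.column_stack([predictors, np.ones(len(predictors))])
    pinv = np.linalg.pinv(design)  # shared by all chunks
    weights = np.empty((design.shape[1], responses.shape[1]))
    for idx in np.array_split(np.arange(responses.shape[1]), n_chunks):
        # Only this chunk's responses need to be held in memory at once
        weights[:, idx] = pinv @ responses[:, idx]
    return weights

# Chunk sizes for 211,339 voxels split into 32 chunks
chunk_sizes = [len(c) for c in np.array_split(np.arange(211339), 32)]
```

With this split, every chunk holds either 6,604 or 6,605 voxels, which matches the approximate chunk size quoted above.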

Noise ceiling. The noise ceiling was computed from split-half reliability of voxel responses across the 12 repeated presentations of each test image. Two metrics are provided: (1) single-trial noise ceiling based on individual trial reliability, and (2) test-set noise ceiling based on averaged test responses. These represent the theoretical upper bound of prediction accuracy for each voxel.
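For intuition, a split-half reliability estimate can be sketched as follows. The odd/even split of repetitions and the Spearman-Brown correction are assumptions made for illustration; the source does not state which split or correction formula was used, and the data here are synthetic.

```python
import numpy as np

def pearson_columns(a, b):
    """Pearson correlation between matching columns of a and b."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a**2).sum(axis=0) * (b**2).sum(axis=0))

def split_half_reliability(repeats):
    """repeats: (n_images, n_reps, n_voxels). Returns (uncorrected, corrected)."""
    half_a = repeats[:, 0::2].mean(axis=1)  # even-numbered repetitions
    half_b = repeats[:, 1::2].mean(axis=1)  # odd-numbered repetitions
    r_half = pearson_columns(half_a, half_b)
    r_full = 2 * r_half / (1 + r_half)      # Spearman-Brown correction
    return r_half, r_full

# Synthetic example: a shared signal per image plus independent trial noise
rng = np.random.default_rng(0)
signal = rng.standard_normal((100, 1, 30))
repeats = signal + rng.standard_normal((100, 12, 30))
r_half, r_full = split_half_reliability(repeats)
```

The corrected value estimates the reliability of the average over all repetitions, which is why it is at least as large as the raw split-half correlation whenever that correlation is positive.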

Output. Each trained model predicts whole-brain fMRI responses for all 211,339 voxels (or user-specified subsets via ROI selection) for each input image.

Metadata

fmri

voxel_coords : (211339, 3) - Voxel coordinates in volume space (x, y, z indices)

n_voxels : int - Total number of voxels (211339)

subject_id : int - Subject identifier (e.g., 1)
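The voxel coordinates make it possible to place a vector of per-voxel values (for example, one row of model output) back into a 3D array. The sketch below is a minimal example under two assumptions: the coordinates are zero-based integer indices, and the volume shape can be inferred from the largest index, since the true volume dimensions are not listed here. `to_volume` is a hypothetical helper.

```python
import numpy as np

def to_volume(values, voxel_coords, fill=np.nan):
    """Scatter per-voxel values into a 3D array indexed by (x, y, z)."""
    voxel_coords = np.asarray(voxel_coords, dtype=int)
    shape = tuple(int(s) for s in voxel_coords.max(axis=0) + 1)
    volume = np.full(shape, fill, dtype=float)
    volume[voxel_coords[:, 0], voxel_coords[:, 1], voxel_coords[:, 2]] = values
    return volume

# Toy example with three voxels
coords = np.array([[0, 0, 0], [1, 2, 0], [2, 1, 3]])
vol = to_volume(np.array([0.5, -1.0, 2.0]), coords)
```

Positions without a voxel keep the fill value, so they can be masked out when plotting.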

encoding_model

train_stimuli : (8640,) - Stimulus filenames for training trials

train_concepts : (8640,) - Concept labels for training trials

test_stimuli : (1200,) - Stimulus filenames for test trials

test_concepts : (1200,) - Concept labels for test trials

noise_ceiling_singletrial : (211339,) - Max explainable variance per voxel based on single-trial repeat reliability

noise_ceiling_testset : (211339,) - Max explainable variance per voxel based on averaged test-set repeats

splithalf_corrected : (211339,) - Split-half reliability corrected to estimate full-data consistency

splithalf_uncorrected : (211339,) - Raw split-half voxel reliability without correction

correlation_results : (211339,) - Encoding model prediction accuracy (Pearson’s r) for each voxel (computed on the test data)

prf

prf_eccentricity : (211339,) - Distance of receptive field center from fixation (deg)

prf_polarangle : (211339,) - Angular position of receptive field center (0–360°)

prf_rsquared : (211339,) - Variance explained by pRF model (fit quality)

prf_size : (211339,) - Estimated receptive field size (deg)
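As an example of how the pRF fields might be combined, the sketch below converts eccentricity and polar angle into Cartesian visual-field coordinates and selects voxels by pRF fit quality. The angle convention (counterclockwise from the right horizontal meridian) and the scale of the threshold are assumptions, since neither is specified here; both function names are hypothetical.

```python
import numpy as np

def prf_to_cartesian(eccentricity, polar_angle_deg):
    """Convert pRF centers from polar (deg, degrees of angle) to (x, y) in deg."""
    theta = np.deg2rad(polar_angle_deg)
    return eccentricity * np.cos(theta), eccentricity * np.sin(theta)

def well_fit_voxels(rsquared, threshold):
    """Indices of voxels whose pRF fit quality reaches the threshold."""
    return np.flatnonzero(np.asarray(rsquared) >= threshold)

# Toy values: two voxels at 2 deg / 90 degrees and 1 deg / 180 degrees
x, y = prf_to_cartesian(np.array([2.0, 1.0]), np.array([90.0, 180.0]))
keep = well_fit_voxels([0.1, 0.5, 0.9], threshold=0.5)
```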

roi

V1, V2, V3, hV4, VO1, VO2, LO1_prf, LO2_prf, TO1, TO2, V3b, V3a, lFFA, rFFA, lOFA, rOFA, lEBA, rEBA, lPPA, rPPA, lRSC, rRSC, lTOS, rTOS, lLOC, rLOC, IT, lSTS, rSTS : variable length - Each ROI entry contains voxel indices (variable length) for that functional region
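ROI index arrays of this kind can be used to slice whole-brain responses after the fact. The sketch below assumes the ROI entries index into the voxel axis of the response array and that they are available as a plain dictionary; `roi_indices` and `roi_responses` are hypothetical names, and the toy indices are made up.

```python
import numpy as np

def roi_responses(responses, roi_indices, rois):
    """Select and concatenate the response columns belonging to the given ROIs."""
    cols = np.concatenate([np.asarray(roi_indices[name], dtype=int) for name in rois])
    return responses[:, cols]

# Toy example: 2 stimuli, 10 voxels, two small ROIs
roi_indices = {"V1": np.array([0, 3, 5]), "IT": np.array([7, 8])}
responses = np.arange(20, dtype=float).reshape(2, 10)
sub = roi_responses(responses, roi_indices, ["V1", "IT"])
```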

Input

Type

numpy.ndarray

Shape

['batch_size', 3, 'height', 'width']

Description

The input should be a batch of RGB images.

Constraints

  • Image values should be integers in range [0, 255].

  • Image dimensions (height, width) should be equal (square).

  • Minimum recommended image size: 224×224 pixels.
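Images loaded with common libraries are usually laid out as (height, width, channels), so a small conversion step is often needed before encoding. The helper below is a hypothetical sketch that checks the constraints listed above and returns the channel-first layout; storing the result as `uint8` is an assumption, since only "integers in range [0, 255]" is required.

```python
import numpy as np

def prepare_stimulus(images_hwc):
    """Convert (batch, H, W, 3) integer images to a (batch, 3, H, W) array."""
    images = np.asarray(images_hwc)
    if images.ndim != 4 or images.shape[-1] != 3:
        raise ValueError("expected shape (batch, height, width, 3)")
    if images.shape[1] != images.shape[2]:
        raise ValueError("images must be square")
    if not np.issubdtype(images.dtype, np.integer):
        raise ValueError("pixel values must be integers")
    if images.min() < 0 or images.max() > 255:
        raise ValueError("pixel values must lie in [0, 255]")
    # Move the channel axis to position 1 and store compactly
    return np.ascontiguousarray(images.transpose(0, 3, 1, 2)).astype(np.uint8)

stimulus = prepare_stimulus(np.zeros((2, 224, 224, 3), dtype=np.int64))
```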

Output

Type

numpy.ndarray

Shape

['batch_size', 'n_voxels']

Description

The output is a 2D array containing in silico fMRI responses.

Dimensions

batch_size: Number of stimuli in the batch.
n_voxels: Number of voxels (up to 211,339, based on ROI selection).

Parameters

Parameters used in get_encoding_model

This function loads the encoding model.

model_id

Type: str
Required: Yes
Description: Unique identifier of the model to load.
Valid Values: fmri-things_fmri_1-vit_b_32
Example: “fmri-things_fmri_1-vit_b_32”

subject

Type: int
Required: Yes
Description: Subject ID from the THINGS fMRI dataset.
Valid Values: 1, 2, 3
Example: 1

selection

Type: dict
Required: No
Description: Specifies which outputs to include in the model responses.
Can include specific ROIs and/or voxel indices. If not provided,
fMRI responses are generated for all voxels.

Properties:

roi
Type: list[str]
Description: List of region-of-interest (ROI) labels to include. Each ROI
represents a functionally defined brain region:
• Early visual: V1, V2, V3, hV4, V3a, V3b
• Ventral stream: VO1, VO2, LO1_prf, LO2_prf, TO1, TO2
• High-level visual: IT (inferior temporal cortex)
• Category-selective: lFFA/rFFA (faces), lOFA/rOFA (faces),
lEBA/rEBA (bodies), lPPA/rPPA (places), lRSC/rRSC (scenes),
lTOS/rTOS (scenes), lLOC/rLOC (objects)
• Temporal: lSTS/rSTS (superior temporal sulcus)
If multiple ROIs are listed, their voxels are concatenated.
Valid values: “V1”, “V2”, “V3”, “hV4”, “VO1”, “VO2”, “LO1_prf”, “LO2_prf”, “TO1”, “TO2”, “V3b”, “V3a”, “lFFA”, “rFFA”, “lOFA”, “rOFA”, “lEBA”, “rEBA”, “lPPA”, “rPPA”, “lRSC”, “rRSC”, “lTOS”, “rTOS”, “lLOC”, “rLOC”, “IT”, “lSTS”, “rSTS”
Example: [‘V1’, ‘V2’, ‘IT’]

voxel_index
Type: numpy.ndarray
Description: Binary mask indicating which voxels to include.
Must have exactly the same length as the number of available voxels (211,339).
Each position set to 1 indicates that the voxel at that position should be included.
Example: [0, 0, ‘…’, 1, 1, 0]
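A mask of the required length is easiest to build from a list of voxel indices. The following is a minimal sketch; `indices_to_mask` is a hypothetical helper, the example indices are arbitrary, and the `uint8` dtype is an assumption.

```python
import numpy as np

N_VOXELS = 211_339  # number of available voxels stated in this document

def indices_to_mask(indices, n_voxels=N_VOXELS):
    """Return a binary vector with 1 at each requested voxel index."""
    mask = np.zeros(n_voxels, dtype=np.uint8)
    mask[np.asarray(indices, dtype=int)] = 1
    return mask

mask = indices_to_mask([10, 11, 4096])
selection = {"voxel_index": mask}
```

The resulting dictionary has the form expected by the selection parameter described above.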

device

Type: str
Required: No
Description: Device to run the model on. ‘auto’ will use CUDA if available, otherwise CPU.
Valid Values: “cpu”, “cuda”, “auto”
Example: “auto”

Parameters used in encode

This function generates in silico neural responses using the encoding model previously loaded.

model

Type: BaseModelInterface
Required: Yes
Description: An instantiated and loaded encoding model.

stimulus

Type: numpy.ndarray
Required: Yes
Description: A batch of RGB images to be encoded. Images should be in integer format with values in the range [0, 255], and square dimensions (e.g. 224×224).
Example: “An array of shape [100, 3, 224, 224] representing 100 RGB images.”

return_metadata

Type: bool
Required: No
Description: Whether to return the encoding model’s metadata together with the in silico neural responses.
Example: True

show_progress

Type: bool
Required: No
Description: Whether to show a progress bar during encoding (for large batches).
Example: True

Parameters used in get_model_metadata

This function loads the encoding model’s metadata without having to load the model itself.

model_id

Type: str
Required: Yes
Description: Unique identifier of the model to load.
Valid Values: fmri-things_fmri_1-vit_b_32
Example: “fmri-things_fmri_1-vit_b_32”

subject

Type: int
Required: Yes
Description: Subject ID from the THINGS fMRI dataset.
Valid Values: 1, 2, 3
Example: 1

Performance

Accuracy Plots (AWS directory):

  • brain-encoding-response-generator/encoding_models/modality-fmri/train_dataset-things_fmri_1/model-vit_b_32/encoding_models_accuracy

Example Usage

import numpy as np
from berg import BERG

# Initialize BERG
berg = BERG(berg_dir="path/to/brain-encoding-response-generator")

# Load the model
model = berg.get_encoding_model(
    "fmri-things_fmri_1-vit_b_32",
    subject=1,
    selection={
        "roi": ["V1", "V2", "IT"],
        # Optionally add "voxel_index": a binary NumPy array of length 211,339
        # with 1 at each voxel to include.
    }
)

# Prepare the stimulus images
# Image shape should be [batch_size, 3 RGB channels, height, width]
stimulus = np.random.randint(0, 256, (100, 3, 256, 256))  # upper bound is exclusive

# Generates the in silico neural responses using the encoding model previously loaded
responses = berg.encode(
    model,
    stimulus,
    show_progress=True
)

# The in silico fMRI responses will be a numpy.ndarray of shape:
# ['batch_size', 'n_voxels']
# where:
# - n_voxels: Number of voxels (up to 211,339, based on ROI selection).

# Generate in silico neural responses with metadata
responses, metadata = berg.encode(
    model,
    stimulus,
    return_metadata=True
)

# Load the encoding model's metadata without having to load the model itself
metadata = berg.get_model_metadata(
    "fmri-things_fmri_1-vit_b_32",
    subject=1
)

References