utah_array-tvsd-vit_b_32
Model Summary
| Modality | Utah arrays |
|---|---|
| Training Dataset | THINGS Ventral Stream Spiking Dataset (TVSD) |
| Species | Macaque |
| Stimuli | Images |
| Model Type | Vision transformer (ViT-B/32) |
| Creator | Domenic Bersch |
Description
This encoding model linearly maps vision transformer image features (Dosovitskiy et al., 2020) onto intracortical spiking activity using linear regression. Features are extracted from all 12 transformer layers of ViT-B/32, using all 50 tokens per layer (49 image patch tokens plus the class token). Before being mapped onto neural responses, the image features are reduced to 250 principal components using principal component analysis (PCA). The encoding models were trained on the THINGS Ventral Stream Spiking Dataset (TVSD) (Papale et al., Neuron 2025), which comprises simultaneous intracortical recordings from 1,024 electrodes across macaque ventral stream areas (V1, V4, IT) in response to natural images from the THINGS database (Hebart et al., 2019). The encoding models are trained either on the full training data or on four independent random splits of the training data.
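The feature-to-response mapping described above (image features, then PCA, then linear regression) can be sketched as follows. This is a minimal NumPy illustration on toy-sized random data, not the BERG training code. It assumes the ViT-B/32 features have already been extracted and flattened into one row per image (12 layers × 50 tokens × 768 dimensions in the real model), uses SVD for PCA and ordinary least squares for the regression, and the helper names (`fit_pca`, `pca_transform`, `fit_linear_encoder`, `predict`) are hypothetical.

```python
import numpy as np


def fit_pca(features, n_components):
    """Fit PCA on training features of shape (n_images, n_features) via SVD."""
    mean = features.mean(axis=0)
    # Rows of vt are orthonormal principal axes, sorted by explained variance
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(features, mean, components):
    """Project features onto the principal axes fitted on the training set."""
    return (features - mean) @ components.T


def add_intercept(x):
    """Append a column of ones so the regression can learn an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_linear_encoder(scores, responses):
    """Ordinary least squares from PCA scores (n_images, n_pcs) to
    flattened responses (n_images, n_electrodes * n_timepoints)."""
    weights, *_ = np.linalg.lstsq(add_intercept(scores), responses, rcond=None)
    return weights


def predict(scores, weights):
    """Linear prediction of flattened responses from PCA scores."""
    return add_intercept(scores) @ weights


# Toy sizes; the described model uses 250 PCs, 1,024 electrodes, 300 timepoints
rng = np.random.default_rng(0)
n_train, n_features, n_pcs, n_elec, n_time = 500, 64, 10, 8, 30
features = rng.standard_normal((n_train, n_features))
mean, components = fit_pca(features, n_pcs)
scores = pca_transform(features, mean, components)

# Synthetic responses that are an exact linear function of the scores
true_weights = rng.standard_normal((n_pcs, n_elec * n_time))
responses = scores @ true_weights + 0.5
weights = fit_linear_encoder(scores, responses)
pred = predict(scores, weights).reshape(n_train, n_elec, n_time)
```

Fitting the PCA on training images only and reusing its mean and components for new images keeps the test set out of the basis. Whether the released training code uses this exact solver is not stated here.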
Neural data. Encoding models were trained on the preprocessed data provided with the TVSD. Raw broadband signals (30 kHz) were band-pass filtered to extract high-frequency spiking activity, and multi-unit activity (MUA) was obtained using threshold-based spike detection and smoothing, following the official TVSD pipeline. Responses were baseline-corrected and normalized per session, with area-specific time windows aligned to peak latencies (V1: 25–125 ms, V4: 50–150 ms, IT: 75–175 ms). The data were epoched from -100 ms to +199 ms relative to stimulus onset, yielding 300 time points (one per millisecond). Further preprocessing details are described in the TVSD paper.
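Because the epoch spans -100 ms to +199 ms in 300 samples, each time point corresponds to 1 ms, so the area-specific windows translate directly into index ranges on the time axis. The sketch below assumes the windows are inclusive at both ends; `window_mask` is a hypothetical helper, not part of BERG.

```python
import numpy as np

# Epoch from -100 ms to +199 ms relative to stimulus onset: 300 samples, 1 ms apart
times = np.arange(-100, 200)

# Area-specific response windows in ms (assumed inclusive at both ends)
windows = {"V1": (25, 125), "V4": (50, 150), "IT": (75, 175)}


def window_mask(times, start_ms, end_ms):
    """Boolean mask selecting the timepoints inside [start_ms, end_ms]."""
    return (times >= start_ms) & (times <= end_ms)


for roi, (start, end) in windows.items():
    idx = np.flatnonzero(window_mask(times, start, end))
    print(f"{roi}: indices {idx[0]}-{idx[-1]} ({idx.size} timepoints)")
# V1: indices 125-225 (101 timepoints)
# V4: indices 150-250 (101 timepoints)
# IT: indices 175-275 (101 timepoints)
```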
Model training partition. Single-trial spiking responses to 22,248 unique images from the THINGS database, each presented once during passive fixation, were used for training. One set of encoding models is trained on the full training data. Another set is trained on four independent random splits of the training data (5,562 trials each), yielding four different in silico spiking response predictions (i.e., repetitions) per image. A unique PCA random seed is derived for each combination of monkey and training split, ensuring independent PCA bases across encoding models.
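A minimal sketch of dividing the 22,248 training trials into four disjoint random splits of 5,562 trials each (22,248 / 4 = 5,562). The shuffling scheme, the seed values, and the `pca_seed` rule are hypothetical illustrations: the source states only that a unique PCA seed is derived per monkey and split, not how.

```python
import numpy as np

MONKEYS = ("N", "F")


def make_random_splits(n_trials, n_splits, seed):
    """Shuffle trial indices and cut them into equally sized, disjoint splits."""
    rng = np.random.default_rng(seed)
    # np.split raises an error if n_trials is not divisible by n_splits
    return np.split(rng.permutation(n_trials), n_splits)


def pca_seed(monkey, split):
    """Hypothetical rule giving each (monkey, split) pair its own seed.
    split = 0 stands for the full training data, 1-4 for the single splits."""
    return 100 * MONKEYS.index(monkey) + split


splits = make_random_splits(22248, 4, seed=0)
print([len(s) for s in splits])  # [5562, 5562, 5562, 5562]
```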
Model testing partition. Spiking responses to 100 unique images, each presented 30 times (3,000 trials in total).
Training procedure. Independent encoding models were trained for each monkey (monkeyN and monkeyF).
Noise ceiling. The noise ceiling was computed from the 30 repeated presentations of each test image, following the analytical procedure described in the Natural Scenes Dataset (NSD) paper (Allen et al., 2022).
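The NSD procedure estimates a noise-ceiling signal-to-noise ratio (ncsnr) from repeated presentations and converts it into a noise ceiling, NC = 100 · ncsnr² / (ncsnr² + 1/n), where n is the number of trials averaged in the data that predictions are compared against. The sketch below follows that recipe. It z-scores each unit across all test trials rather than per session, and using n = 30 (comparison against responses averaged over all 30 repeats) is an assumption. Values may therefore differ from the `ncsnr` and `noise_ceiling` metadata fields. `ncsnr_and_noise_ceiling` is a hypothetical helper.

```python
import numpy as np


def ncsnr_and_noise_ceiling(responses, n_avg):
    """NSD-style ncsnr and noise ceiling (%) from repeated presentations.

    responses: array of shape (n_images, n_repeats, ...); any trailing axes
    (e.g. electrodes, timepoints) are treated independently.
    n_avg: number of trials averaged in the data predictions are compared to.
    """
    # z-score each unit across all trials so the total variance is 1
    mu = responses.mean(axis=(0, 1), keepdims=True)
    sd = responses.std(axis=(0, 1), keepdims=True)
    z = (responses - mu) / sd
    # Noise variance: across-repeat variance, averaged over images
    noise_var = z.var(axis=1, ddof=1).mean(axis=0)
    # Signal variance: whatever variance remains, clipped at zero
    signal_var = np.clip(1.0 - noise_var, 0.0, None)
    ncsnr = np.sqrt(signal_var / noise_var)
    noise_ceiling = 100.0 * ncsnr**2 / (ncsnr**2 + 1.0 / n_avg)
    return ncsnr, noise_ceiling


# Synthetic test set: 100 images x 30 repeats x 3 electrodes with increasing noise
rng = np.random.default_rng(0)
signal = rng.standard_normal((100, 1, 3))
noise_sd = np.array([0.5, 2.0, 5.0])
trials = signal + noise_sd * rng.standard_normal((100, 30, 3))
ncsnr, nc = ncsnr_and_noise_ceiling(trials, n_avg=30)
```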
Output. Each encoding model predicts time-resolved spiking responses for all 1,024 electrodes (or a user-specified subset) across 300 time points for each input image.
Metadata
utah_array
- times: (300,) - Time points (-100 ms to 199 ms)
- electrode_order: (1024,) - Electrode mapping order (0-based)
- monkey_id: str - Monkey identifier
- n_electrodes: int - Number of electrodes (1024)
roi
- roi_assignments: (1024,) - ROI assignment per electrode (0=V1, 1=V4, 2=IT)
- roi_labels: (3,) - ROI label names ['V1', 'V4', 'IT']
encoding_model
- all_training_splits: Training data and encoding accuracy results for the encoding models trained on all training splits
  - train_img_ids: (22248,) - Training stimulus IDs
  - train_stimuli: (22248,) - Training image filenames
  - train_concepts: (22248,) - Training object categories
  - train_days: (22248,) - Recording days for training
  - train_sequence_pos: (22248,) - Position in 4-image sequence
  - correlation_results: (1024, 300) - Prediction accuracy (Pearson's r)
  - percent_noise_ceiling: (1024, 300) - Noise-ceiling-normalized prediction accuracy (% of noise ceiling)
- single_training_split_{N}: Training data and encoding accuracy results for the encoding models trained on training split N (N=1, 2, 3, 4)
  - train_img_ids: (5562,) - Training stimulus IDs
  - train_stimuli: (5562,) - Training image filenames
  - train_concepts: (5562,) - Training object categories
  - train_days: (5562,) - Recording days for training
  - train_sequence_pos: (5562,) - Position in 4-image sequence
  - correlation_results: (1024, 300) - Prediction accuracy (Pearson's r)
  - percent_noise_ceiling: (1024, 300) - Noise-ceiling-normalized prediction accuracy (% of noise ceiling)
- test_img_ids: (3000,) - Test stimulus IDs (individual trials)
- test_stimuli: (3000,) - Test image filenames (individual trials)
- test_concepts: (3000,) - Test object categories (individual trials)
- test_days: (3000,) - Recording days for test
- test_sequence_pos: (3000,) - Position in sequence for test
- SNR: (4, 1024) - Signal-to-noise ratio per day per electrode
- SNR_max: (1024,) - Best SNR across all days per electrode
- ncsnr: (1024, 300) - Neural cross-validated signal-to-noise ratio per electrode/timepoint
- noise_ceiling: (1024, 300) - Noise ceiling per electrode/timepoint
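The metadata fields can be combined to summarize encoding accuracy per area. The sketch below assumes you have already pulled `correlation_results`, `roi_assignments`, and `roi_labels` out of the metadata dictionary (their exact nesting is assumed from the listing above and not shown). The hypothetical helper `mean_accuracy_per_roi` averages Pearson's r over the electrodes of each ROI and, optionally, over a subset of time points such as an area's response window.

```python
import numpy as np


def mean_accuracy_per_roi(correlation_results, roi_assignments, roi_labels,
                          time_mask=None):
    """Average Pearson's r over the electrodes of each ROI.

    correlation_results: (n_electrodes, n_timepoints) array
    roi_assignments: (n_electrodes,) integer codes (0=V1, 1=V4, 2=IT)
    roi_labels: ROI names indexed by those codes, e.g. ['V1', 'V4', 'IT']
    time_mask: optional boolean (n_timepoints,) mask, e.g. an area's window
    """
    if time_mask is None:
        time_mask = np.ones(correlation_results.shape[1], dtype=bool)
    summary = {}
    for code, label in enumerate(roi_labels):
        in_roi = np.asarray(roi_assignments) == code
        values = correlation_results[in_roi][:, time_mask]
        summary[str(label)] = float(np.nanmean(values))
    return summary


# Toy example: 6 electrodes (2 per ROI) and 4 timepoints
corr = np.repeat([[0.1], [0.1], [0.2], [0.2], [0.3], [0.3]], 4, axis=1)
print(mean_accuracy_per_roi(corr, np.array([0, 0, 1, 1, 2, 2]), ["V1", "V4", "IT"]))
```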
Input
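Type: numpy.ndarray
Shape: [batch_size, 3, height, width]
Description: The input should be a batch of RGB images.
Constraints: Integer values in the range [0, 255]; square images (height = width, e.g., 224×224).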
Output
Type: numpy.ndarray
Shape: [batch_size, n_electrodes, n_timepoints] or [batch_size, repeats, n_electrodes, n_timepoints]
Description: The output is a 3D or 4D array containing in silico Utah array responses.
Its dimensionality depends on the train_splits parameter:
- When train_splits="all": shape is [batch_size, n_electrodes, n_timepoints]
- When train_splits="single": shape is [batch_size, repeats, n_electrodes, n_timepoints]
The n_electrodes dimension corresponds to the number of selected electrodes, which depends on the selected ROIs and on the monkey.
The last dimension corresponds to the timepoints (300 unless a subset is selected).
Monkey N electrode count:
- V1: 448
- V4: 256
- IT: 256
Monkey F electrode count:
- V1: 512
- V4: 192
- IT: 320
Dimensions:
- batch_size: Number of stimuli in the batch
- repeats: Number of simulated repetitions of the same stimulus (always 4; only present when using the encoding models trained on single training data splits)
- n_electrodes: Number of electrodes in the selection
- n_timepoints: Number of timepoints in the selection (300 by default)
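Two small helpers illustrate the output conventions above. `expected_n_electrodes` adds up the per-ROI electrode counts listed for each monkey, assuming only an ROI selection is applied (no `electrodes` mask), and `average_over_repeats` collapses the repeats axis of a `train_splits="single"` output to match the `"all"` layout. Both helpers are hypothetical and not part of BERG.

```python
import numpy as np

# Electrode counts per ROI, as listed above for each monkey
ELECTRODE_COUNTS = {
    "N": {"V1": 448, "V4": 256, "IT": 256},
    "F": {"V1": 512, "V4": 192, "IT": 320},
}


def expected_n_electrodes(monkey, rois):
    """Size of the n_electrodes axis for an ROI-only selection."""
    return sum(ELECTRODE_COUNTS[monkey][roi] for roi in rois)


def average_over_repeats(responses):
    """Return responses as [batch_size, n_electrodes, n_timepoints].

    4D outputs (train_splits="single") are averaged over the repeats axis;
    3D outputs (train_splits="all") are returned unchanged.
    """
    if responses.ndim == 4:
        return responses.mean(axis=1)
    if responses.ndim == 3:
        return responses
    raise ValueError(f"expected a 3D or 4D array, got {responses.ndim}D")


print(expected_n_electrodes("N", ["V1", "IT"]))  # 704
```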
Parameters
Parameters used in get_encoding_model
This function loads the encoding model.
model_id
Type: str
Required: Yes
Description: Unique identifier of the model to load.
Valid Values: "utah_array-tvsd-vit_b_32"
Example: "utah_array-tvsd-vit_b_32"
subject
Type: str
Required: Yes
Description: Monkey ID
Valid Values: "N", "F"
Example: "N"
train_splits
Type: str
Required: No
Description: Specifies the training data on which the encoding model was trained.
- "all": Use the encoding model trained on the full training data.
- "single": Use the encoding models trained on four independent random splits of the training data, generating four different in silico spiking response predictions (i.e., repetitions) per image.
Valid Values: "all", "single"
Example: "single"
selection
Type: dict
Required: No
Description: Specifies which outputs to include in the model responses.
Can include specific ROIs, electrodes, and/or timepoints. If not provided,
Utah array responses are generated for all electrodes and timepoints.
Properties:
roi
Type: list[str]
Description: List of ROIs to include in the output
Valid values: "V1", "V4", "IT"
Example: ["V1", "IT"]
electrodes
Type: numpy.ndarray
Description: Binary mask vector indicating which electrodes to include.
Must have exactly the same length as the number of available electrodes (1024).
Each position set to 1 indicates that the corresponding electrode should be included.
Example: [0, 0, ..., 1, 1, 0]
timepoints
Type: numpy.ndarray
Description: Binary mask vector indicating which timepoints to include.
Must have exactly the same length as the number of available timepoints (300).
Each position set to 1 indicates that the corresponding timepoint should be included.
Example: [0, 0, ..., 1, 1, 0]
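The `electrodes` and `timepoints` masks can be built with NumPy. In this sketch, selecting the first 64 electrodes is purely illustrative, and keeping time points from 0 ms onward relies on the time axis running from -100 ms to 199 ms in 1 ms steps. How an `roi` entry combines with an `electrodes` mask is not specified here, so the sketch leaves `roi` out.

```python
import numpy as np

N_ELECTRODES, N_TIMEPOINTS = 1024, 300
times = np.arange(-100, 200)  # one entry per timepoint, -100 ms to 199 ms

# Binary electrode mask; selecting the first 64 electrodes is only an example
electrodes = np.zeros(N_ELECTRODES, dtype=int)
electrodes[:64] = 1

# Binary timepoint mask: keep timepoints from stimulus onset (0 ms) onward
timepoints = (times >= 0).astype(int)

selection = {"electrodes": electrodes, "timepoints": timepoints}

# Both masks must match the full electrode and timepoint counts
assert selection["electrodes"].shape == (N_ELECTRODES,)
assert selection["timepoints"].shape == (N_TIMEPOINTS,)
```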
device
Type: str
Required: No
Description: Device to run the model on. "auto" will use CUDA if available, otherwise CPU.
Valid Values: "cpu", "cuda", "auto"
Example: "auto"
Parameters used in encode
This function generates in silico neural responses using the encoding model previously loaded.
model
Type: BaseModelInterface
Required: Yes
Description: An instantiated and loaded encoding model.
stimulus
Type: numpy.ndarray
Required: Yes
Description: A batch of RGB images to be encoded. Images should be in integer format with values in the range [0, 255] and have square dimensions (e.g., 224×224).
Example: An array of shape [100, 3, 224, 224] representing 100 RGB images.
return_metadata
Type: bool
Required: No
Description: Whether to return the encoding model's metadata together with the in silico neural responses.
Example: True
show_progress
Type: bool
Required: No
Description: Whether to show a progress bar during encoding (for large batches).
Example: True
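The `stimulus` argument expects channels-first, square, integer images. The hypothetical helper below converts equally sized H×W×3 `uint8` arrays (for example, obtained with `numpy.asarray` on RGB images) into that layout by center-cropping to a square and moving the channel axis to the front. It does not resize, and BERG may apply its own resizing and normalization internally.

```python
import numpy as np


def to_stimulus_batch(images):
    """Stack equally sized HxWx3 RGB images into a [batch, 3, S, S] uint8 array.

    Each image is center-cropped to a square of side S = min(H, W); no
    resizing is done.
    """
    batch = np.stack([np.asarray(img) for img in images])  # (N, H, W, C)
    if batch.ndim != 4 or batch.shape[-1] != 3:
        raise ValueError("expected a list of HxWx3 RGB images")
    _, h, w, _ = batch.shape
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    square = batch[:, top:top + side, left:left + side, :]
    # Keep integer values in [0, 255] and move channels first
    chw = np.clip(square, 0, 255).astype(np.uint8).transpose(0, 3, 1, 2)
    return np.ascontiguousarray(chw)


images = [np.full((240, 320, 3), i, dtype=np.uint8) for i in range(5)]
stimulus = to_stimulus_batch(images)
print(stimulus.shape)  # (5, 3, 240, 240)
```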
Parameters used in get_model_metadata
This function loads the encoding model’s metadata without having to load the model itself.
model_id
Type: str
Required: Yes
Description: Unique identifier of the model to load.
Valid Values: "utah_array-tvsd-vit_b_32"
Example: "utah_array-tvsd-vit_b_32"
subject
Type: str
Required: Yes
Description: Monkey ID
Valid Values: "N", "F"
Example: "N"
Performance
Accuracy Plots (AWS directory):
brain-encoding-response-generator/encoding_models/modality-utah_array/train_dataset-tvsd/model-vit_b_32/encoding_models_accuracy
Example Usage
```python
import numpy as np
from berg import BERG

# Initialize BERG
berg = BERG(berg_dir="path/to/brain-encoding-response-generator")

# Binary selection masks (illustrative values; adapt them to your analysis)
electrodes = np.ones(1024, dtype=int)   # keep all 1,024 electrodes
timepoints = np.zeros(300, dtype=int)
timepoints[100:] = 1                    # keep 0 to 199 ms after stimulus onset

# Load the model
model = berg.get_encoding_model(
    "utah_array-tvsd-vit_b_32",
    subject="N",
    train_splits="single",
    selection={
        "roi": ["V1", "IT"],
        "electrodes": electrodes,
        "timepoints": timepoints
    }
)

# Prepare the stimulus images
# Image shape should be [batch_size, 3 RGB channels, height, width],
# with integer values in [0, 255] and square images
stimulus = np.random.randint(0, 256, (100, 3, 256, 256), dtype=np.uint8)

# Generate the in silico neural responses using the encoding model previously loaded
responses = berg.encode(
    model,
    stimulus,
    show_progress=True
)
# The in silico Utah array responses will be a numpy.ndarray of shape:
# [batch_size, n_electrodes, n_timepoints] or [batch_size, repeats, n_electrodes, n_timepoints]
# where:
# - repeats: Number of simulated repetitions of the same stimulus (always 4; only applies when using the encoding models trained on single training data splits)
# - n_electrodes: Number of electrodes in the selection
# - n_timepoints: Number of timepoints in the selection

# Generate in silico neural responses with metadata
responses, metadata = berg.encode(
    model,
    stimulus,
    return_metadata=True
)

# Load the encoding model's metadata without having to load the model itself
metadata = berg.get_model_metadata(
    "utah_array-tvsd-vit_b_32",
    subject="N"
)
```
References
Model building code: https://github.com/gifale95/BERG/tree/main/berg_creation_code/02_train_encoding_models/train_dataset-tvsd/model-vit_b_32
TVSD Paper (Papale et al., 2025): https://www.sciencedirect.com/science/article/pii/S089662732400881X
TVSD Data (Papale et al., 2025): https://gin.g-node.org/paolo_papale/TVSD
ViT-B/32 (Dosovitskiy et al., 2020): https://arxiv.org/abs/2010.11929