============
fmri-bmd-s3d
============

Model Summary
-------------

.. list-table::
   :widths: 30 70
   :stub-columns: 1

   * - Modality
     - fMRI
   * - Training Dataset
     - BOLD Moments Dataset (BMD) (MNI152 volume space)
   * - Species
     - Human
   * - Stimuli
     - 3 second videos
   * - Model Type
     - 3D CNN model (s3d)
   * - Creator
     - Alessandro Gifford

Description
-----------

These encoding models consist of a linear mapping (through linear regression) of video features from a
3D CNN (S3D; Xie et al., 2017) onto fMRI responses. Prior to the mapping, the video features were
reduced to 100 principal components using principal component analysis.
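
As a rough illustration of the mapping described above, the following sketch reduces synthetic features to 100 principal components and fits an ordinary least squares regression onto synthetic fMRI responses. It uses NumPy only. The array sizes, the random data, and all variable names are assumptions made for the example, and the actual model building code (linked in the References) may differ, for instance in how the features are standardized.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-ins (assumptions): feature and voxel counts are arbitrary.
    n_train, n_test, n_features, n_voxels = 200, 20, 512, 50
    n_components = 100  # number of principal components stated in the text

    features_train = rng.standard_normal((n_train, n_features))
    features_test = rng.standard_normal((n_test, n_features))
    fmri_train = rng.standard_normal((n_train, n_voxels))

    # PCA fitted on the training features only (mean centering followed by SVD).
    feature_mean = features_train.mean(axis=0)
    _, _, vt = np.linalg.svd(features_train - feature_mean, full_matrices=False)
    components = vt[:n_components]  # shape (100, n_features)

    pcs_train = (features_train - feature_mean) @ components.T
    pcs_test = (features_test - feature_mean) @ components.T

    # Ordinary least squares with an intercept, solved jointly for all voxels.
    design_train = np.column_stack([pcs_train, np.ones(n_train)])
    weights, *_ = np.linalg.lstsq(design_train, fmri_train, rcond=None)

    # Predict responses for held-out videos.
    design_test = np.column_stack([pcs_test, np.ones(n_test)])
    predicted_fmri = design_test @ weights  # shape (n_test, n_voxels)

Because each column of ``weights`` is fitted independently, solving for all voxels at once is equivalent to fitting one regression per voxel.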

The encoding models were trained on the BOLD Moments Dataset (BMD) (Lahner et al., 2024), which consists of fMRI
responses of 10 subjects to 1102 3-second naturalistic videos from the Memento10k dataset (Newman et al., 2020).
One encoding model was trained for each BMD subject and for each fMRI voxel.

**Preprocessing.** The encoding models were trained on BMD data prepared in MNI152 volume space, from the
“versionB” preprocessing version. Note that the BMD data were *z*-scored within each scan session; as
a consequence, the in silico fMRI responses generated by the encoding models also live in *z*-scored space.
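
To make the notion of session-wise *z*-scoring concrete, here is a minimal NumPy sketch on synthetic data. The number of sessions, trials, and voxels is made up, and the exact normalization used when preparing BMD (for example, which trials enter the mean and standard deviation) is not specified here, so treat this as an illustration rather than the dataset's procedure.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-ins (assumptions): 3 sessions, 40 trials each, 5 voxels.
    n_sessions, n_trials, n_voxels = 3, 40, 5
    responses = rng.standard_normal((n_sessions, n_trials, n_voxels)) * 4.0 + 10.0

    # z-score each voxel across the trials of one session at a time.
    session_mean = responses.mean(axis=1, keepdims=True)
    session_std = responses.std(axis=1, keepdims=True)
    zscored = (responses - session_mean) / session_std

After this step every voxel has zero mean and unit standard deviation within each session, which is the scale on which the in silico responses should be interpreted.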

**Model training partition.** fMRI responses for 1000 videos.

**Model testing partition.** fMRI responses for 102 videos.

**ROIs.** Each ROI in the metadata consists of a tuple with 3 items: (1) The ROI voxel indices in 3-dimensional brain
volume space; (2) The ROI voxel indices in 1-dimensional flattened brain volume space; (3) the NIfTI images of the
ROI voxel indices.
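
The relationship between the first two tuple items can be sketched with NumPy's index conversion functions. This example assumes that the flattened indices refer to the full ``(62, 77, 61)`` volume flattened in C (row-major) order; that ordering, and the three hypothetical voxel coordinates, are assumptions and should be checked against the actual metadata.

.. code-block:: python

    import numpy as np

    volume_shape = (62, 77, 61)  # shape of the brain masks listed in the metadata

    # Hypothetical ROI made of three voxels, given as one index array per axis.
    roi_idx_3d = (
        np.array([10, 10, 11]),
        np.array([20, 21, 20]),
        np.array([30, 30, 31]),
    )

    # 3D indices -> flat indices, assuming C-order (row-major) flattening.
    roi_idx_flat = np.ravel_multi_index(roi_idx_3d, volume_shape)

    # Flat indices -> 3D indices (the inverse operation).
    recovered_3d = np.unravel_index(roi_idx_flat, volume_shape)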

Metadata
--------

**fmri**

    **group_mask** : ``(62, 77, 61)`` - Whole brain group mask (voxels defined for all subjects)

    **group_mask_header** : ``object`` - Whole brain group mask header

    **group_mask_affine** : ``(4, 4)`` - Whole brain group mask affine

    **sub_mask** : ``(62, 77, 61)`` - Whole brain subject mask

    **sub_mask_header** : ``object`` - Whole brain subject mask header

    **sub_mask_affine** : ``(4, 4)`` - Whole brain subject mask affine

    **rois** : ``dict`` - ROI voxel indices (l = left hemisphere, r = right hemisphere)
        **lV1v** : ``tuple`` - Visual area 1 ventral (LH)

        **rV1v** : ``tuple`` - Visual area 1 ventral (RH)

        **lV1d** : ``tuple`` - Visual area 1 dorsal (LH)

        **rV1d** : ``tuple`` - Visual area 1 dorsal (RH)

        **lV2v** : ``tuple`` - Visual area 2 ventral (LH)

        **rV2v** : ``tuple`` - Visual area 2 ventral (RH)

        **lV2d** : ``tuple`` - Visual area 2 dorsal (LH)

        **rV2d** : ``tuple`` - Visual area 2 dorsal (RH)

        **lV3v** : ``tuple`` - Visual area 3 ventral (LH)

        **rV3v** : ``tuple`` - Visual area 3 ventral (RH)

        **lV3d** : ``tuple`` - Visual area 3 dorsal (LH)

        **rV3d** : ``tuple`` - Visual area 3 dorsal (RH)

        **lV3ab** : ``tuple`` - Visual areas V3A and V3B (LH)

        **rV3ab** : ``tuple`` - Visual areas V3A and V3B (RH)

        **lhV4** : ``tuple`` - Human V4 complex (LH)

        **rhV4** : ``tuple`` - Human V4 complex (RH)

        **lFFA** : ``tuple`` - Fusiform face area (LH)

        **rFFA** : ``tuple`` - Fusiform face area (RH)

        **lOFA** : ``tuple`` - Occipital face area (LH)

        **rOFA** : ``tuple`` - Occipital face area (RH)

        **lEBA** : ``tuple`` - Extrastriate body area (LH)

        **rEBA** : ``tuple`` - Extrastriate body area (RH)

        **lLOC** : ``tuple`` - Lateral occipital complex (LH)

        **rLOC** : ``tuple`` - Lateral occipital complex (RH)

        **lPPA** : ``tuple`` - Parahippocampal place area (LH)

        **rPPA** : ``tuple`` - Parahippocampal place area (RH)

        **lRSC** : ``tuple`` - Retrosplenial cortex (LH)

        **rRSC** : ``tuple`` - Retrosplenial cortex (RH)

        **lSTS** : ``tuple`` - Superior temporal sulcus (LH)

        **rSTS** : ``tuple`` - Superior temporal sulcus (RH)

        **lTOS** : ``tuple`` - Transverse occipital sulcus (LH)

        **rTOS** : ``tuple`` - Transverse occipital sulcus (RH)

        **lMT** : ``tuple`` - Middle temporal area (LH)

        **rMT** : ``tuple`` - Middle temporal area (RH)

        **l7AL** : ``tuple`` - Dorsal intraparietal area 7AL (LH)

        **r7AL** : ``tuple`` - Dorsal intraparietal area 7AL (RH)

        **lIPS0** : ``tuple`` - Intraparietal area IPS0 (LH)

        **rIPS0** : ``tuple`` - Intraparietal area IPS0 (RH)

        **lIPS1-2-3** : ``tuple`` - Intraparietal areas IPS1, IPS2, and IPS3 (LH)

        **rIPS1-2-3** : ``tuple`` - Intraparietal areas IPS1, IPS2, and IPS3 (RH)

        **lPFt** : ``tuple`` - Inferior parietal area PFt (LH)

        **rPFt** : ``tuple`` - Inferior parietal area PFt (RH)

        **lBA2** : ``tuple`` - Brodmann area 2 (LH)

        **rBA2** : ``tuple`` - Brodmann area 2 (RH)

        **lPFop** : ``tuple`` - Inferior parietal area PFop (LH)

        **rPFop** : ``tuple`` - Inferior parietal area PFop (RH)

        **BMDgeneral** : ``tuple`` - BMD general visual cortex mask

**encoding_models**

    **noiseceiling_task_train_n_1** : ``(N,)`` - Voxelwise noise ceiling computed on single train data repeats

    **noiseceiling_task_train_n_3** : ``(N,)`` - Voxelwise noise ceiling computed on all train data repeats

    **noiseceiling_task_test_n_1** : ``(N,)`` - Voxelwise noise ceiling computed on single test data repeats

    **noiseceiling_task_test_n_10** : ``(N,)`` - Voxelwise noise ceiling computed on all test data repeats

    **correlation** : ``(N,)`` - Correlation prediction accuracy

    **r2** : ``(N,)`` - Explained variance (R² prediction accuracy)

    **explained_variance** : ``(N,)`` - Noise-ceiling-normalized explained variance

Input
-----

.. list-table::
   :widths: 20 80
   :stub-columns: 1

   * - Type
     - ``numpy.ndarray``
   * - Shape
     - ``['batch_size', 'video_frames', '3_channels', 'height', 'width']``
   * - Description
     - The input should be a batch of RGB video frames. While the model accepts input videos of any duration, we recommend using ~3-second videos to match the duration of the videos used to train the encoding models. Each video should have at least 14 frames.
   * - Constraints
     - * Image values should be integers in range [0, 255].
       * Image dimensions (height, width) should be equal (square).
       * Minimum recommended video size: 256×256 pixels.
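
The constraints above can be met with a small preprocessing step. The sketch below assumes the frames arrive as a ``(frames, height, width, 3)`` ``uint8`` array, which is a common layout for decoded video but is an assumption here. The helper name ``to_model_input`` is hypothetical and not part of BERG, and resizing is left out.

.. code-block:: python

    import numpy as np

    def to_model_input(frames_thwc: np.ndarray) -> np.ndarray:
        """Center-crop (T, H, W, 3) frames to a square and return (1, T, 3, S, S)."""
        n_frames, height, width, channels = frames_thwc.shape
        if channels != 3:
            raise ValueError("expected RGB frames with 3 channels")
        if n_frames < 14:
            raise ValueError("the model expects at least 14 frames")
        side = min(height, width)
        top = (height - side) // 2
        left = (width - side) // 2
        square = frames_thwc[:, top:top + side, left:left + side, :]
        # Reorder to channels-first and add the batch dimension.
        return np.ascontiguousarray(square.transpose(0, 3, 1, 2))[np.newaxis]

    # Synthetic 16-frame clip at 270x480 resolution (stand-in for decoded frames).
    rng = np.random.default_rng(0)
    clip = rng.integers(0, 256, size=(16, 270, 480, 3), dtype=np.uint8)
    stimulus = to_model_input(clip)  # shape (1, 16, 3, 270, 270)

Several clips with the same number of frames and the same size can be stacked along the first axis with ``np.concatenate`` to form a larger batch.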

Output
------

.. list-table::
   :widths: 20 80
   :stub-columns: 1

   * - Type
     - ``numpy.ndarray``
   * - Shape
     - ``['batch_size', 'n_voxels']``
   * - Description
     - The output is a 2D array containing in silico fMRI responses.

**Dimensions:**

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Name
     - Description
   * - batch_size
     - Number of stimulus videos in the batch.
   * - n_voxels
     - Number of selected voxels for which the in silico fMRI responses are generated.
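
For visualization it is often useful to place the 2D output back into brain volume space. The sketch below shows one way to do this with a boolean mask. It uses a small synthetic mask instead of the real ``(62, 77, 61)`` subject mask, and it assumes that the voxel order of the output follows the C-order traversal of the mask, which should be verified against the metadata before relying on it.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-ins (assumptions): a small boolean mask and random responses.
    mask = np.zeros((6, 7, 6), dtype=bool)
    mask[2:4, 2:5, 1:4] = True  # 2 * 3 * 3 = 18 in-mask voxels
    n_voxels = int(mask.sum())
    responses = rng.standard_normal((4, n_voxels))  # (batch_size, n_voxels)

    # Write each response vector into the in-mask positions of an empty volume.
    volumes = np.full((responses.shape[0],) + mask.shape, np.nan)
    volumes[:, mask] = responses  # shape (batch_size, 6, 7, 6)

Voxels outside the mask stay ``NaN``, which keeps them distinguishable from in-mask voxels whose response happens to be zero.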

Parameters
----------

Parameters used in ``get_encoding_model``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This function loads the encoding model.

.. list-table::
   :widths: 20 80
   :header-rows: 0

   * - **model_id**
     - | **Type:** str
       | **Required:** Yes
       | **Description:** Unique identifier of the model to load.
       | **Valid Values:** fmri-bmd-s3d
       | **Example:** "fmri-bmd-s3d"
   * - **subject**
     - | **Type:** int
       | **Required:** Yes
       | **Description:** Subject ID from the BMD dataset (1-10).
       | **Valid Values:** 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
       | **Example:** 1
   * - **selection**
     - | **Type:** dict
       | **Required:** No
       | **Description:** Specifies which outputs to include in the model responses. If not provided, fMRI responses are generated for all whole-brain voxels.
       | 
       | **Properties:**
       | 
       | **roi**
       |     **Type:** str
       |     **Description:** The region-of-interest (ROI) for which the in silico fMRI responses are generated.
       |     **Valid values:** "lV1v", "rV1v", "lV1d", "rV1d", "lV2v", "rV2v", "lV2d", "rV2d", "lV3v", "rV3v", "lV3d", "rV3d", "lV3ab", "rV3ab", "lhV4", "rhV4", "lFFA", "rFFA", "lOFA", "rOFA", "lEBA", "rEBA", "lLOC", "rLOC", "lPPA", "rPPA", "lRSC", "rRSC", "rSTS", "lSTS", "lTOS", "rTOS", "lMT", "rMT", "l7AL", "r7AL", "lIPS0", "rIPS0", "rIPS1-2-3", "lIPS1-2-3", "lPFt", "rPFt", "lBA2", "rBA2", "lPFop", "rPFop", "BMDgeneral"
       | 
       | **voxels**
       |     **Type:** numpy.ndarray
       |     **Description:** Binary one-hot encoded vector with ones indicating the voxels for which the in
       |     silico fMRI responses are generated. This vector must have exactly the same
       |     length as the number of voxels, which varies for each subject:
       |     - Subject 1:  108,219 voxels
       |     - Subject 2:  108,603 voxels
       |     - Subject 3:  108,366 voxels
       |     - Subject 4:  108,283 voxels
       |     - Subject 5:  108,201 voxels
       |     - Subject 6:  108,449 voxels
       |     - Subject 7:  108,126 voxels
       |     - Subject 8:  108,407 voxels
       |     - Subject 9:  108,250 voxels
       |     - Subject 10: 107,987 voxels
       |     The voxels from the one-hot encoded vector are only selected if the "roi" key
       |     is not provided, or has value None.
   * - **device**
     - | **Type:** str
       | **Required:** No
       | **Description:** Device to run the model on. 'auto' will use CUDA if available, otherwise CPU.
       | **Valid Values:** "cpu", "cuda", "auto"
       | **Example:** "auto"
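
The two forms of the ``selection`` dictionary described above can be built as follows. This is a sketch based only on the parameter table: the integer dtype of the binary vector and the choice of the first 500 voxels are assumptions for illustration.

.. code-block:: python

    import numpy as np

    n_voxels_subject_1 = 108_219  # voxel count listed above for subject 1

    # Option 1: select a predefined ROI by name.
    roi_selection = {"roi": "lV1v"}

    # Option 2: select arbitrary voxels with a binary vector (here the first 500).
    voxel_mask = np.zeros(n_voxels_subject_1, dtype=int)
    voxel_mask[:500] = 1
    voxel_selection = {"voxels": voxel_mask}

    # Either dictionary would then be passed as the ``selection`` argument, e.g.:
    # model = berg.get_encoding_model("fmri-bmd-s3d", subject=1, selection=roi_selection)

Since the voxel vector is only used when no ROI is given, the two options are kept in separate dictionaries here.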

Parameters used in ``encode``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This function generates in silico neural responses using the encoding model previously loaded.

.. list-table::
   :widths: 20 80
   :header-rows: 0

   * - **model**
     - | **Type:** BaseModelInterface
       | **Required:** Yes
       | **Description:** An instantiated and loaded encoding model.
   * - **stimulus**
     - | **Type:** numpy.ndarray
       | **Required:** Yes
       | **Description:** A batch of RGB videos to be encoded. Videos should be in integer format with values in the range [0, 255], of square dimensions (e.g. 256×256), and should have at least 14 frames.
       | **Example:** "An array of shape [100, 90, 3, 256, 256] representing 100 RGB videos, with 90 frames each."
   * - **return_metadata**
     - | **Type:** bool
       | **Required:** No
       | **Description:** Whether to return the encoding model's metadata together with the in silico neural responses.
       | **Example:** True
   * - **show_progress**
     - | **Type:** bool
       | **Required:** No
       | **Description:** Whether to show a progress bar during encoding (for large batches).
       | **Example:** True

Parameters used in ``get_model_metadata``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This function loads the encoding model's metadata without having to load the model itself.

.. list-table::
   :widths: 20 80
   :header-rows: 0

   * - **model_id**
     - | **Type:** str
       | **Required:** Yes
       | **Description:** Unique identifier of the model to load.
       | **Valid Values:** fmri-bmd-s3d
       | **Example:** "fmri-bmd-s3d"
   * - **subject**
     - | **Type:** int
       | **Required:** Yes
       | **Description:** Subject ID from the BMD dataset (1-10).
       | **Valid Values:** 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
       | **Example:** 1

Performance
-----------

**Accuracy Plots (AWS directory):**

* ``brain-encoding-response-generator/encoding_models/modality-fmri/train_dataset-bmd/model-s3d/encoding_models_accuracy``

Example Usage
-------------


.. code-block:: python

    import numpy as np
    from berg import BERG
    
    # Initialize BERG
    berg = BERG(berg_dir="path/to/brain-encoding-response-generator")
    
    # Load the model
    model = berg.get_encoding_model(
        "fmri-bmd-s3d",
        subject=1,
    )
    
    # Prepare the stimulus videos
    # Video shape should be [batch_size, video_frames, 3 RGB channels, height, width]
    # The upper bound of randint is exclusive, so 256 yields values in [0, 255]
    stimulus = np.random.randint(0, 256, (100, 90, 3, 256, 256))
    
    # Generates the in silico neural responses using the encoding model previously loaded
    responses = berg.encode(
        model,
        stimulus,
        show_progress=True
    )
    
    # The in silico fMRI responses will be a numpy.ndarray of shape:
    # ['batch_size', 'n_voxels']
    # where:
    # - n_voxels: Number of selected voxels for which the in silico fMRI responses are generated.
    
    # Generate in silico neural responses with metadata
    responses, metadata = berg.encode(
        model,
        stimulus,
        return_metadata=True
    )
    
    # Load the encoding model's metadata without having to load the model itself
    metadata = berg.get_model_metadata(
        "fmri-bmd-s3d",
        subject=1
    )
    

References
----------

* Model building code: https://github.com/gifale95/BERG/tree/main/berg_creation_code
* BMD paper (Lahner et al., 2024): https://doi.org/10.1038/s41467-024-50310-3
* Memento10k dataset (Newman et al., 2020): https://link.springer.com/chapter/10.1007/978-3-030-58517-4_14
* s3d (Xie et al., 2017): https://doi.org/10.48550/arXiv.1712.04851