
Decoding Attention from the Visual Cortex: fMRI-Based Prediction of Human Saliency Maps

Salvatore Calcagno, Marco Finocchiaro, Giovanni Bellitto, Concetto Spampinato, Federica Proietto Salanitri


Overview

Official PyTorch implementation of the paper "Decoding Attention from the Visual Cortex: fMRI-Based Prediction of Human Saliency Maps".

Method

Fig. 1 Overview of the proposed decoding architecture for reconstructing saliency maps from fMRI signals. The model maps voxel-wise brain activity to a low-resolution spatial representation via a linear projection. This is followed by a deep convolutional decoder that progressively upsamples the signal through residual blocks with channel-wise attention (SE modules), producing a high-resolution saliency map aligned with the stimulus image.
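
A minimal PyTorch sketch of this two-stage design is shown below. It is not the repository's exact implementation: the layer widths, the 8x8 latent grid, and the block definitions are illustrative assumptions.

# Minimal sketch of the two-stage decoder: a linear voxel-to-grid projection,
# then residual upsampling blocks with squeeze-and-excitation (SE) attention.
# All sizes and module names here are illustrative assumptions.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze-and-excitation: re-weight channels using global average context.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x).view(x.size(0), -1, 1, 1)

class ResidualSEBlock(nn.Module):
    # Two 3x3 convolutions with a skip connection, followed by SE attention.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.se = SEBlock(out_ch)

    def forward(self, x):
        return torch.relu(self.se(self.body(x)) + self.skip(x))

class SaliencyDecoderSketch(nn.Module):
    # fMRI voxels -> low-resolution latent grid -> progressively upsampled saliency map.
    def __init__(self, n_voxels, base_ch=128):
        super().__init__()
        self.base_ch = base_ch
        self.proj = nn.Linear(n_voxels, base_ch * 8 * 8)  # voxel-wise linear projection
        blocks, ch = [], base_ch
        for _ in range(5):  # 8x8 -> 256x256 through five x2 upsampling stages
            nxt = max(ch // 2, 16)
            blocks += [nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                       ResidualSEBlock(ch, nxt)]
            ch = nxt
        self.decoder = nn.Sequential(*blocks, nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, voxels):  # voxels: (batch, n_voxels)
        z = self.proj(voxels).view(-1, self.base_ch, 8, 8)
        return self.decoder(z)  # (batch, 1, 256, 256) saliency map

maps = SaliencyDecoderSketch(n_voxels=4500)(torch.randn(4, 4500))  # example forward pass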

Fig. 2 Qualitative decoder predictions across visual ROIs. Each column shows the stimulus image overlaid with the saliency map predicted when restricting fMRI input to a specific ROI. Early visual areas (e.g., V1–V3) yield sharper and more structured predictions that resemble ground-truth saliency, whereas higher-order regions (e.g., FFA, PPA) tend to produce diffuse, centrally biased activations. This illustrates the dominant contribution of early visual cortex to fine-grained saliency decoding.

Paper Abstract

Modeling visual attention from brain activity offers a powerful route to understanding how spatial salience is encoded in the human visual system. While deep learning models can accurately predict fixations from image content, it remains unclear whether similar saliency maps can be reconstructed directly from neural signals. In this study, we investigate the feasibility of decoding high-resolution spatial attention maps from 3T fMRI data. This study is the first to demonstrate that high-resolution, behaviorally-validated saliency maps can be decoded directly from 3T fMRI signals. We propose a two-stage decoder that transforms multivariate voxel responses from region-specific visual areas into spatial saliency distributions, using DeepGaze II maps as proxy supervision. Evaluation is conducted against new eye-tracking data collected on a held-out set of natural images. Results show that decoded maps significantly correlate with human fixations, particularly when using activity from early visual areas (V1–V4), which contribute most strongly to reconstruction accuracy. Higher-level areas yield above-chance performance but weaker predictions. These findings suggest that spatial attention is robustly represented in early visual cortex and support the use of fMRI-based decoding as a tool for probing the neural basis of salience in naturalistic viewing.

Code Usage

Environment

uv venv
uv pip install -r requirements.txt

Training Data

The training data are available in the original GitHub repository by Horikawa and Kamitani:
https://github.com/KamitaniLab/GenericObjectDecoding

From this repository, download both the preprocessed fMRI data and the stimulus images.

Dataset Used in Our Experiment

In our experiment, we used perception data from all five subjects, including both training and test sets (X below denotes the subject number, 1-5):

  • SubjectX_ImageNetTraining
  • SubjectX_ImageNetTest

Directory Organization

The dataset should be organized as follows inside data/GOD_Dataset:

-- fmri_files
    -- training
        Subject1_ImageNetTraining.h5
        Subject2_ImageNetTraining.h5
        ...
    -- test
        Subject1_ImageNetTest.h5
        Subject2_ImageNetTest.h5
        ...
-- images
    -- training
        n01518878_5958.JPEG
        n01518878_12345.JPEG
        ...
    -- test
        n01443537_22563.JPEG
        n01443537_67890.JPEG
        ...
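
Once the files are in place, the layout can be sanity-checked and the contents of one HDF5 file inspected with a short script like the one below (the internal key structure of the GOD files is not assumed here; the script only lists it).

# Sketch: count files per split and print the hierarchy of one fMRI HDF5 file.
from pathlib import Path
import h5py

root = Path("data/GOD_Dataset")
for split in ("training", "test"):
    n_fmri = len(list((root / "fmri_files" / split).glob("*.h5")))
    n_imgs = len(list((root / "images" / split).glob("*.JPEG")))
    print(f"{split}: {n_fmri} fMRI files, {n_imgs} images")

with h5py.File(root / "fmri_files" / "training" / "Subject1_ImageNetTraining.h5", "r") as f:
    f.visit(print)  # list groups/datasets without assuming their names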

Prepare maps

To train and test the decoders, the ground-truth maps must first be prepared (DeepGaze II maps for training, in-house eye-tracking maps for testing). Ensure you have replicated the data folder structure described above, then run:

python scripts/prepare_maps.py --dataset_root data/GOD_Dataset --centerbias data/centerbias_mit1003.npy

Train

chmod +x train.sh
./train.sh

This trains the SaliencyDecoder and saves checkpoints to trained_fmri_decoders/saliency_decoder/S{subject}/{roi}/model_...pth.

Code location: the training logic (argument parsing, dataloader preparation, checkpointing, etc.) is implemented in the CLI script scripts/train.py.
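
To reuse a trained decoder outside train.sh/inference.sh, a checkpoint can be loaded along the lines sketched below; the SaliencyDecoder import path, its constructor arguments, and whether the checkpoint stores a bare state_dict or a wrapper dict are assumptions to be verified against scripts/train.py.

# Sketch: load the most recent checkpoint for subject 1, ROI V1 (paths assumed from train.sh).
import glob
import torch

ckpt_path = sorted(glob.glob("trained_fmri_decoders/saliency_decoder/S1/V1/model_*.pth"))[-1]
checkpoint = torch.load(ckpt_path, map_location="cpu")
# The checkpoint may be a bare state_dict or wrap one under a "state_dict" key (assumption).
state_dict = checkpoint.get("state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint

# from <repo module> import SaliencyDecoder   # import path and constructor args: see scripts/train.py
# model = SaliencyDecoder(...)
# model.load_state_dict(state_dict)
# model.eval()
print(f"Loaded {ckpt_path} ({len(state_dict)} tensors)")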

Inference

chmod +x inference.sh
./inference.sh

Predictions are written to predictions/<decoder>/S<subject>/<roi>/ and metrics CSVs to score/<decoder>/S<subject>/<roi>_avg_outputs.csv, with ROI summaries printed to stdout.
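
To aggregate these CSVs across subjects and ROIs, you can for example average their numeric columns with pandas; the exact column names come from scripts/inference.py and are not assumed in this sketch.

# Sketch: gather every ROI metrics CSV for the saliency decoder and average numeric columns.
import glob
import pandas as pd

frames = []
for path in sorted(glob.glob("score/saliency_decoder/S*/*_avg_outputs.csv")):
    df = pd.read_csv(path)
    df["source"] = path  # keep track of subject/ROI via the file path
    frames.append(df)

summary = pd.concat(frames, ignore_index=True).groupby("source").mean(numeric_only=True)
print(summary)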

Cite

@article{calcagno_finocchiaro_2026156,
  title    = {Decoding attention from the visual cortex: fMRI-based prediction of human saliency maps},
  journal  = {Pattern Recognition Letters},
  volume   = {199},
  pages    = {156-162},
  year     = {2026},
  issn     = {0167-8655},
  doi      = {10.1016/j.patrec.2025.11.019},
  url      = {https://www.sciencedirect.com/science/article/pii/S0167865525003757},
  author   = {Salvatore Calcagno and Marco Finocchiaro and Giovanni Bellitto and Concetto Spampinato and Federica {Proietto Salanitri}},
  keywords = {BCI, Neural coding, Saliency prediction}
}

Other useful scripts

Compute metrics for center bias

Generate the center-bias image by executing the notebook notebooks/generate_center_bias_gaussian.ipynb.

Then compute the metrics with:

python scripts/inference.py \
    --fmri_dir "data/GOD_Dataset" \
    --gt_dir "data/GOD_Dataset/images/test_saliency_ours" \
    --precomputed_maps_dir "predictions/center_bias"
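
The notebook builds an isotropic Gaussian centered on the image; a minimal NumPy equivalent is sketched below, where the map size, the sigma value, and the output path are illustrative assumptions rather than the notebook's actual settings.

# Sketch: a 2D Gaussian center-bias map, normalized to [0, 1] and saved to disk.
import os
import numpy as np

h, w, sigma = 256, 256, 0.25  # sigma expressed as a fraction of the image size (assumption)
ys, xs = np.mgrid[0:h, 0:w]
cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
d2 = ((ys - cy) / (sigma * h)) ** 2 + ((xs - cx) / (sigma * w)) ** 2
center_bias = np.exp(-0.5 * d2)
center_bias /= center_bias.max()

os.makedirs("predictions/center_bias", exist_ok=True)            # hypothetical output location
np.save("predictions/center_bias/center_bias.npy", center_bias)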
