Links
arXiv: https://arxiv.org/abs/2509.09015
Abstract
Recent advances in fMRI-based visual decoding have enabled compelling reconstructions of perceived images. However, most approaches rely on subject-specific training, limiting scalability and practical deployment. VoxelFormer is a lightweight transformer architecture that enables multi-subject training for visual decoding from fMRI. VoxelFormer integrates a Token Merging Transformer (ToMer) for efficient voxel compression and a query-driven Q-Former that produces fixed-size neural representations aligned with the CLIP image embedding space.
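For intuition, here is a minimal sketch of the pipeline the abstract describes, written in PyTorch. Everything in it is an assumption made for illustration: the module names (TokenMerging, QFormer, VoxelDecoder), the token and embedding dimensions, and the naive adjacent-token averaging standing in for ToMer's merging are all placeholders, and the sketch omits training (e.g., the CLIP-alignment objective) entirely. It is not the authors' implementation.

import torch
import torch.nn as nn

class TokenMerging(nn.Module):
    # Placeholder for ToMer: halves the token count by averaging adjacent tokens.
    def forward(self, x):                       # x: (batch, n_tokens, dim)
        if x.shape[1] % 2:                      # pad to an even token count
            x = torch.cat([x, x[:, -1:]], dim=1)
        return 0.5 * (x[:, 0::2] + x[:, 1::2])

class QFormer(nn.Module):
    # A fixed set of learned queries cross-attends to the voxel tokens, then
    # projects to the CLIP image embedding dimension.
    def __init__(self, dim=256, n_queries=77, n_heads=8, clip_dim=768):
        super().__init__()
        self.queries = nn.Parameter(0.02 * torch.randn(n_queries, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.proj = nn.Linear(dim, clip_dim)

    def forward(self, tokens):                  # tokens: (batch, n_tokens, dim)
        q = self.queries.unsqueeze(0).expand(tokens.shape[0], -1, -1)
        out, _ = self.attn(q, tokens, tokens)
        return self.proj(out)                   # (batch, n_queries, clip_dim)

class VoxelDecoder(nn.Module):
    # Voxels -> tokens -> merged tokens -> fixed-size CLIP-aligned representation.
    def __init__(self, dim=256):
        super().__init__()
        self.embed = nn.Linear(1, dim)          # one token per voxel (illustrative)
        self.merge = TokenMerging()
        self.qformer = QFormer(dim=dim)

    def forward(self, voxels):                  # voxels: (batch, n_voxels)
        x = self.embed(voxels.unsqueeze(-1))
        return self.qformer(self.merge(x))

model = VoxelDecoder()
fmri = torch.randn(2, 1024)                     # 2 scans, 1024 voxels (made up)
print(model(fmri).shape)                        # torch.Size([2, 77, 768])

The point of the fixed query set is that the output size does not depend on the subject-specific voxel count, which is what makes training a single decoder across subjects straightforward.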
My role in this project
I proposed the problem formulation and the architecture, and guided the students through the project.
Citation
@ARTICLE{Le2025-vd,
title = "{VoxelFormer}: Parameter-efficient multi-subject visual
decoding from {fMRI}",
author = "Le, Chenqian and Zhao, Yilin and Emami, Nikasadat and Yadav,
Kushagra and Liu, Xujin ``Chris'' and Chen, Xupeng and Wang,
Yao",
journal = "arXiv [cs.CV]",
month = sep,
year = 2025,
archivePrefix = "arXiv",
primaryClass = "cs.CV",
eprint = "2509.09015"
}