# Decoding Papers
A selection of brain decoding papers spanning language, images, video, and speech — across fMRI, MEG/EEG, ECoG, and intracortical recordings. Columns capture modality, what was decoded, dataset scale, model, and reported metrics.
| Paper | Modality | Decoded output | Participants | Data per subject | Model | Reported metrics |
|---|---|---|---|---|---|---|
| Semantic reconstruction of continuous language from non-invasive brain recordings | fMRI | continuous language (semantic gist of stories) | 3 | ~16 h / subject | encoding model + LM + beam search | semantic similarity, timepoint accuracy (~65–82%) |
| Generative language reconstruction from brain recordings | fMRI | continuous language (direct generation) | 5 + 8 + 28 (three datasets) | multi-hour per subject | LLaMA-2 conditioned on brain signals | BLEU / ROUGE / semantic similarity (paper-dependent) |
| Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors | fMRI | images (natural scenes) | 4 (NSD subjects) | ~40 sessions / subject | CLIP + diffusion prior | >93% retrieval top-1, CLIP similarity |
| High-resolution image reconstruction with latent diffusion models from human brain activity | fMRI | images | 4 | ~30–40 sessions | Stable Diffusion latent regression | ~75–83% identification accuracy |
| Brain decoding: toward real-time reconstruction of visual perception | MEG | images | 4 (THINGS-MEG) | ~4 sessions / subject | contrastive MEG encoder + latent diffusion | 7× retrieval improvement over linear, perceptual similarity |
| Reconstructing visual experiences from brain activity evoked by natural movies | fMRI | video clips | 3 | multi-session | motion-energy + Bayesian prior | ~95% identification |
| Deep image reconstruction from human brain activity | fMRI | images + imagery | 3 | multi-session (~2 h/day over months) | VGG feature inversion | human-rated identification ~80%+ |
| Neuroprosthesis for decoding speech in a paralyzed person | ECoG | speech → text | 1 | ~22 h | neural classifier + LM | ~15 wpm, ~25% WER |
| Generalizable spelling using a speech neuroprosthesis | ECoG | spelling (text) | 1 | multi-session | deep net + LM | ~6% CER |
| A high-performance speech neuroprosthesis | intracortical | speech → text | 1 | multi-session | RNN decoder + LM rescoring | 9.1% WER (50 vocab), 23.8% (125k) |
| Decoding speech perception from non-invasive brain recordings | MEG/EEG | perceived speech segments | ~175 | large aggregated datasets | contrastive + wav2vec | 44% top-1 MEG / 19% top-1 EEG (1,594 segments) |
| Brain-to-Text Decoding: A Non-invasive Approach via Typing | MEG/EEG | typed sentences | 35 | multi-session | CNN + transformer + character LM | 19–32% CER (MEG), 67% CER (EEG) |
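Several rows report word/character error rate (speech-to-text papers) or top-1 retrieval accuracy (image/speech retrieval papers). For reference, here is a minimal sketch of these metrics using their standard definitions — edit distance for WER/CER, nearest-neighbor cosine matching for retrieval. This is illustrative only, not code from any of the papers above:

```python
import numpy as np

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance(ref[:i], hyp[:j]) for current row i
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))      # substitution
            prev = cur
    return dp[n]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

def top1_retrieval_accuracy(brain_emb, stim_emb):
    """Fraction of trials where the nearest (cosine) stimulus embedding is the
    matching one; row i of each (trials, dim) array is the same trial."""
    a = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    b = stim_emb / np.linalg.norm(stim_emb, axis=1, keepdims=True)
    sims = a @ b.T  # (trials, candidates) cosine similarity matrix
    return float(np.mean(sims.argmax(axis=1) == np.arange(len(a))))
```

For example, `wer("the cat sat", "the bat sat")` gives one substitution over three reference words, i.e. ~33% WER; the papers above differ mainly in vocabulary size and candidate-set size, which these functions take as given.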