# Decoding Papers
A selection of brain decoding papers spanning language, images, video, and speech — across fMRI, MEG/EEG, ECoG, and intracortical recordings. Columns capture modality, what was decoded, dataset scale, model, and reported metrics.
| Paper | Modality | Decoded output | Participants | Data per subject | Model | Reported metrics |
|---|---|---|---|---|---|---|
| Semantic reconstruction of continuous language from non-invasive brain recordings | fMRI | continuous language (semantic gist of stories) | 3 | ~16 h / subject | encoding model + LM + beam search | semantic similarity, timepoint accuracy (~65–82%) |
| Generative language reconstruction from brain recordings | fMRI | continuous language (direct generation) | 5 + 8 + 28 (three datasets) | multi-hour per subject | LLaMA-2 conditioned on brain signals | BLEU / ROUGE / semantic similarity (paper-dependent) |
| Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors | fMRI | images (natural scenes) | 4 (NSD subjects) | ~40 sessions / subject | CLIP + diffusion prior | >93% retrieval top-1, CLIP similarity |
| High-resolution image reconstruction with latent diffusion models from human brain activity | fMRI | images | 4 | ~30–40 sessions | Stable Diffusion latent regression | ~75–83% identification accuracy |
| Brain decoding: toward real-time reconstruction of visual perception | MEG | images | 4 (THINGS-MEG) | ~4 sessions / subject | contrastive MEG encoder + latent diffusion | 7× retrieval improvement over linear, perceptual similarity |
| Reconstructing visual experiences from brain activity evoked by natural movies | fMRI | video clips | 3 | multi-session | motion-energy + Bayesian prior | ~95% identification |
| Deep image reconstruction from human brain activity | fMRI | images + imagery | 3 | multi-session (~2 h/day over months) | VGG feature inversion | human-rated identification ~80%+ |
| Neuroprosthesis for decoding speech in a paralyzed person | ECoG | speech → text | 1 | ~22 h | neural classifier + LM | ~15 wpm, ~25% WER |
| Generalizable spelling using a speech neuroprosthesis | ECoG | spelling (text) | 1 | multi-session | deep net + LM | ~6% CER |
| A high-performance speech neuroprosthesis | intracortical | speech → text | 1 | multi-session | RNN decoder + LM rescoring | 9.1% WER (50 vocab), 23.8% (125k) |
| Decoding speech perception from non-invasive brain recordings | MEG/EEG | perceived speech segments | ~175 | large aggregated datasets | contrastive + wav2vec | 44% top-1 MEG / 19% top-1 EEG (1,594 segments) |
| Brain-to-Text Decoding: A Non-invasive Approach via Typing | MEG/EEG | typed sentences | 35 | multi-session | CNN + transformer + character LM | 19–32% CER (MEG), 67% CER (EEG) |
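Several rows report word/character error rate (speech-to-text papers) or top-1 retrieval accuracy (image/speech retrieval papers). For reference, here is a minimal sketch of these metrics using their standard definitions — edit distance for WER/CER, nearest-neighbor cosine matching for retrieval. This is illustrative only, not code from any of the papers above:

```python
import numpy as np

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance(ref[:i], hyp[:j]) for current row i
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))      # substitution
            prev = cur
    return dp[n]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

def top1_retrieval_accuracy(brain_emb, stim_emb):
    """Fraction of trials where the nearest (cosine) stimulus embedding is the
    matching one; row i of each (trials, dim) array is the same trial."""
    a = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    b = stim_emb / np.linalg.norm(stim_emb, axis=1, keepdims=True)
    sims = a @ b.T  # (trials, candidates) cosine similarity matrix
    return float(np.mean(sims.argmax(axis=1) == np.arange(len(a))))
```

For example, `wer("the cat sat", "the bat sat")` gives one substitution over three reference words, i.e. ~33% WER; the papers above differ mainly in vocabulary size and candidate-set size, which these functions take as given.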