DynaMind: Reconstructing Dynamic Visual Scenes from EEG by Aligning Temporal Dynamics and Multimodal Semantics to Guided Diffusion

Mar 1, 2026 · Junxiang Liu, Junming Lin, Jie Zhou, Wei Xiong, Jiangtong Li#, Jie Li#, Jie Zhuang, Hongfei Ji
Abstract
Reconstructing dynamic visual scenes from electroencephalography (EEG) signals remains a significant challenge. Existing methods often yield temporally disjointed and semantically inaccurate reconstructions: they align temporal dynamics poorly and do not integrate cognitive priors. In neuroscience, the dual-stream theory describes the physiological basis for the generation and transmission of visual neural signals, offering a valuable prior to guide reconstruction. Guided by this theory, we introduce DynaMind, a model that reconstructs video by jointly modeling neural dynamics and semantic features through three core modules: a Regional-aware Semantic Mapper (RSM), a Temporal-aware Dynamic Aligner (TDA), and a Dual-Guidance Video Reconstructor (DGVR). The RSM models neural pathways to capture detailed semantic information, using regional-aware encoders interconnected via channel-wise multiplicative gating. The TDA enforces temporal dynamic consistency between EEG and video, and the DGVR generates videos with superior fidelity, temporal coherence, and semantic accuracy compared to prior EEG2Video approaches.
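As a rough illustration of the channel-wise multiplicative gating mentioned in the abstract, the sketch below scales each channel of one feature vector by a sigmoid gate computed from the matching channel of another. It is not the RSM implementation, which the abstract does not detail. The function `channel_gate`, the `scale` and `bias` parameters, and the `ventral`/`dorsal` variable names (borrowed from the two streams of dual-stream theory) are all assumptions made for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(target, source, scale, bias):
    """Scale each channel of `target` by a gate derived from `source`.

    Hypothetical helper: gate_c = sigmoid(scale_c * source_c + bias_c),
    output_c = target_c * gate_c. One gate per channel, no cross-channel mixing.
    """
    if not (len(target) == len(source) == len(scale) == len(bias)):
        raise ValueError("all inputs must have one entry per channel")
    gates = [sigmoid(w * s + b) for s, w, b in zip(source, scale, bias)]
    return [t * g for t, g in zip(target, gates)]


# Toy features from two assumed regional encoders (three channels each).
ventral = [0.5, -1.0, 2.0]
dorsal = [0.0, 3.0, -3.0]

gated = channel_gate(ventral, dorsal, scale=[1.0, 1.0, 1.0], bias=[0.0, 0.0, 0.0])
print(gated)
```

In this toy setup a strongly positive source channel leaves the target channel almost unchanged, while a strongly negative one suppresses it toward zero. In a trained model the scale and bias values would typically be learned rather than fixed.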
Type
Publication
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)