DynaMind: Reconstructing Dynamic Visual Scenes from EEG by Aligning Temporal Dynamics and Multimodal Semantics to Guided Diffusion
Mar 1, 2026
Junxiang Liu
Junming Lin
Jie Zhou
Wei Xiong
Jiangtong Li#
Jie Li#
Jie Zhuang
Hongfei Ji
Abstract
Reconstructing dynamic visual scenes from electroencephalography (EEG) signals presents a significant challenge. Existing methods often yield temporally disjointed and inaccurate visual semantic reconstructions, struggling with poor dynamic timing alignment and lacking the integration of cognitive priors. In neuroscience, the dual-stream theory describes the physiological basis for the generation and transmission of visual neural signals, offering a valuable prior to guide the reconstruction process. To address these challenges, we follow the guidance of dual-stream theory and introduce DynaMind, a model that reconstructs video by jointly modeling neural dynamics and semantic features using three core modules: a Regional-aware Semantic Mapper (RSM), a Temporal-aware Dynamic Aligner (TDA), and a Dual-Guidance Video Reconstructor (DGVR). The RSM models neural pathways to capture detailed semantic information, using regional-aware encoders interconnected via channel-wise multiplicative gating. Meanwhile, the TDA enforces temporal dynamic consistency between EEG and video, and the DGVR generates videos with superior fidelity, temporal coherence, and semantic accuracy compared to prior EEG2Video approaches.
Type
Publication
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)