DynaMind: Reconstructing Dynamic Visual Scenes from EEG by Aligning Temporal Dynamics and Multimodal Semantics to Guided Diffusion

Mar 1, 2026

Junxiang Liu

Junming Lin

Jie Zhou

Wei Xiong

Jiangtong Li#

Jie Li#

Jie Zhuang

Hongfei Ji

Abstract

Reconstructing dynamic visual scenes from electroencephalography (EEG) signals remains a significant challenge. Existing methods often produce reconstructions that are temporally disjointed and semantically inaccurate: they align poorly with the temporal dynamics of the stimulus and do not integrate cognitive priors. In neuroscience, the dual-stream theory describes the physiological basis for the generation and transmission of visual neural signals, offering a valuable prior to guide the reconstruction process. To address these challenges, we follow the dual-stream theory and introduce DynaMind, a model that reconstructs video by jointly modeling neural dynamics and semantic features using three core modules: a Regional-aware Semantic Mapper (RSM), a Temporal-aware Dynamic Aligner (TDA), and a Dual-Guidance Video Reconstructor (DGVR). The RSM models neural pathways to capture detailed semantic information, using regional-aware encoders interconnected via channel-wise multiplicative gating. The TDA enforces temporal dynamic consistency between EEG and video, and the DGVR generates videos with superior fidelity, temporal coherence, and semantic accuracy compared to prior EEG2Video approaches.
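
As a rough illustration of what "channel-wise multiplicative gating" between two encoders can look like, the sketch below scales each channel of one feature vector by a sigmoid gate computed from the matching channel of another. This is a generic gating pattern, not the RSM as implemented in the paper: the function name `channel_gate`, the per-channel `weights` and `biases`, and the choice of a sigmoid gate are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(target, source, weights, biases):
    """Scale each channel of `target` by a gate derived from `source`.

    target, source: per-channel feature values from two (hypothetical)
        regional encoders; both lists must have the same length.
    weights, biases: hypothetical per-channel gate parameters.
    """
    if not (len(target) == len(source) == len(weights) == len(biases)):
        raise ValueError("all inputs must have one entry per channel")
    # One gate per channel, each in (0, 1).
    gates = [sigmoid(w * s + b) for s, w, b in zip(source, weights, biases)]
    # Multiplicative interaction: the gate passes or suppresses each channel.
    return [t * g for t, g in zip(target, gates)]


# Toy example with three channels (values are made up).
target = [1.0, 2.0, -3.0]
source = [0.0, 10.0, -10.0]
out = channel_gate(target, source, weights=[1.0, 1.0, 1.0], biases=[0.0, 0.0, 0.0])
```

In this toy run the first channel is halved (its gate is exactly 0.5), the second passes almost unchanged, and the third is suppressed to nearly zero. In a trained model the gate parameters would be learned rather than fixed.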

Type

Conference paper

Publication

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)

Last updated on May 21, 2026

EEG Decoding, Brain-Computer Interface, Diffusion Model
