Jiangtong Li (李江彤)

Postdoctoral Associate

About Me

Jiangtong Li (李江彤) is currently a postdoctoral associate at the School of Computer Science and Technology, Tongji University (同济大学), Shanghai, China, supervised by Prof. Changjun Jiang. His research interests include multimodal modeling, large language models, graph learning, and big data in finance.

Before that, he received his Ph.D. in computer science from Shanghai Jiao Tong University, supervised by Prof. Liqing Zhang and Prof. Li Niu, and his bachelor's degree in chemistry, also from Shanghai Jiao Tong University. He has published more than 30 papers at top international AI conferences such as ACL, ICCV, CVPR, ICML, and AAAI.

Interests

Multimodal (Large) Model
Big Data in Finance
Causal Inference
Brain-Computer Interface
Cognitive Science

Experience

Postdoctoral Associate
Tongji University · 2024–Present
PhD in Computer Science
Shanghai Jiao Tong University · 2019–2024
BSc in Chemistry
Shanghai Jiao Tong University · 2015–2019

News

2026.05 Six papers (five as Corresponding) on Multimodal Reasoning (x3), EEG Foundation Model (x2), and Graph Attack (x1) were accepted by ACL, ICML, IJCAI 2026, and KBS.

2026.04 One paper (First Author) on LMM in Finance was accepted by BDMA.

2026.03 One paper (Corresponding) on EEG Decoding was accepted by CVPR 2026.

2026.01 Two papers (one as Corresponding) on Multimodal Reasoning (x1) and Anti-Fraud (x1) were accepted by WWW 2026.

2025.12 One paper (First Author) on Multimodal Reasoning was accepted by IEEE TPAMI.

2025.11 Two papers on LLM Efficiency (x1) and Anti-Fraud (x1) were accepted by HPCA and AAAI 2026.

2025.09 One paper (First Author) on Graph Attack was accepted by NeurIPS 2025.

2025.07 I was awarded the World Artificial Intelligence Conference Youth Outstanding Paper Honorable Mention (世界人工智能大会青年优秀论文提名奖).

2025.06 My China Postdoctoral Science Foundation General Program project, for which I am the principal investigator, was approved (中国博士后基金会面上项目).

2025.05 One paper (Corresponding) on LLM4Debate was accepted by ACL 2025.

2025.05 One paper (First Author) on LMM Reasoning was accepted by ICML 2025.

2025.03 I was awarded the Excellent Doctoral Dissertation (Honorable Mention) of Shanghai Jiao Tong University (上海交通大学优秀博士论文提名).

2025.01 One paper on Long-tail Classification was accepted by ICLR 2025.

2024.12 I was selected for the Excellent PostDoc Program in Shanghai (上海市超级博士后).

2024.11 Two papers (First Author) on Interbank Risk in Finance (x1) and LLM in Finance (x1) were accepted by IEEE TNNLS and FCS.

2024.08 My National Natural Science Foundation of China Young Scientists Fund project on Multimodal Reasoning, for which I am the principal investigator, was approved (国家自然科学基金委青年科学基金).

2024.07 I was selected for the Postdoctoral Fellowship Program of the China Postdoctoral Science Foundation (博士后国家资助).

2024.07 One paper on Causal Inference was accepted by ECCV 2024.

2024.05 One paper (First Author) on Time Series was accepted by ICML 2024.

2024.04 I was named an Outstanding PhD Graduate of Shanghai (上海市优秀博士毕业生).

2024.03 I graduated from Shanghai Jiao Tong University with a PhD.

2024.02 One paper (First Author) on LMM Reasoning was accepted by CVPR 2024.

2024.01 I joined Tongji University as a postdoctoral associate.

2023.08 I was awarded the Yang Yuanqing Excellent PhD Student Scholarship.

2023.08 Two papers on Image Harmonization were accepted by ACM MM 2023.

2023.07 One paper (First Author) on Multimodal Reasoning and Causal Inference was accepted by ICCV 2023.

2022.10 I was awarded the National Scholarship for PhD Students (博士生国家奖学金).

2022.08 One paper (First Author) on Multimodal Retrieval was accepted by CVIU.

2022.03 One paper (First Author) on Multimodal Reasoning was accepted by CVPR 2022.

2021.12 One paper (First Author) on Multimodal Retrieval was accepted by AAAI 2022.

2021.10 One paper (First Author) on Multimodal Retrieval was accepted by IEEE TIP.

2021.08 One paper (First Author) on Video Segmentation was accepted by ACM MM 2021.

2020.12 One paper on Multimodal Retrieval was accepted by AAAI 2021.

Selected Publications

* Equal Contribution # Corresponding Author

Multimodal Reasoning

[TPAMI 2026] Jiangtong Li, Zhaohe Liao, Fengshun Xiao, Tianjiao Li, Qiang Zhang, Haohua Zhao, Li Niu, Guang Chen, Liqing Zhang, Changjun Jiang. Parse, Align and Aggregate: Graph-driven Compositional Reasoning for Video Question Answering. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2026). PDF Code Data

QPVA³: a planner-executor-reasoner framework that parses a question into a compositional graph, aligns video clips, and aggregates answers — with a new 3,492-tuple VideoQA benchmark and compositional consistency metrics.

[WWW 2026] Qingyang Liu, Jiangtong Li^#, Zelin Peng, Shaobo Wang, Zhaohe Liao, Shuochen Chang, Bingjie Gao, Haonan Zhao, Mu Liu, Jidong Jiang, Li Niu^#. Bridging Visual Dynamics and Reasoning Evaluation: Multimodal Large Language Models for Short Drama Quality Assessment. The ACM Web Conference (WWW 2026). PDF Code

[ACL 2026] Fuyu Wang, Jiangtong Li^#, Kun Zhu, Changjun Jiang. Can LLMs Really Judge? A Progressive Argumentation-Mining Framework for Distinguishing Understanding from Aggregation. Findings of the Association for Computational Linguistics (ACL 2026). PDF

[ICML 2026] Qingyang Liu, Bingjie Gao, Canmiao Fu, Zhipeng Huang, Chen Li, Feng Wang, Shuochen Chang, Shaobo Wang, Yali Wang, Keming Ye, Jiangtong Li^#, Li Niu^#. Breaking Dual Bottlenecks: Evolving Unified Multimodal Models into Self-Adaptive Interleaved Visual Reasoners. International Conference on Machine Learning (ICML 2026). PDF Code

A self-adaptive framework that empowers unified multimodal models to autonomously switch between direct generation, self-reflection, and multi-step planning based on instruction complexity.

[ACL 2025] Fuyu Wang, Jiangtong Li^#, Kun Zhu, Changjun Jiang. InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for Debating. Annual Meeting of the Association for Computational Linguistics (ACL 2025). PDF Code

[ICML 2025] Jiangtong Li^*, Zhaohe Liao^*, Siyu Sun, Qingyang Liu, Fengshun Xiao, Tianjiao Li, Qiang Zhang, Guang Chen, Li Niu, Changjun Jiang, Liqing Zhang. Divide and Conquer: Exploring Language-centric Tree Reasoning for Video Question-Answering. International Conference on Machine Learning (ICML 2025). PDF

[CVPR 2024] Zhaohe Liao^*, Jiangtong Li^*, Li Niu, Liqing Zhang. Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). PDF

[ICCV 2023] Jiangtong Li, Li Niu, Liqing Zhang. Knowledge Proxy Intervention for Deconfounded Video Question Answering. IEEE/CVF International Conference on Computer Vision (ICCV 2023). PDF

A causal Knowledge Proxy Intervention (KPI) framework that mitigates dataset bias in VideoQA via backdoor adjustment over learnable knowledge proxies, yielding more robust reasoning under spurious correlations.

[CVPR 2022] Jiangtong Li, Li Niu, Liqing Zhang. From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). PDF Code Data

A new VideoQA benchmark with 107K QA pairs spanning description, explanation, prediction and counterfactual questions — pushing video understanding from representation toward evidence and commonsense reasoning.

Multimodal Representation & Retrieval

[AAAI 2022] Jiangtong Li, Li Niu, Liqing Zhang. Action-Aware Embedding Enhancement for Image-Text Retrieval. AAAI Conference on Artificial Intelligence (AAAI 2022). PDF

[CVIU 2022] Jiangtong Li, Zhixin Ling, Li Niu, Liqing Zhang. Zero-Shot Sketch-Based Image Retrieval with Structure-aware Asymmetric Disentanglement. Computer Vision and Image Understanding (CVIU 2022). PDF

[TIP 2021] Jiangtong Li, Liu Liu, Li Niu, Liqing Zhang. Memorize, Associate and Match: Embedding Enhancement via Fine-Grained Alignment for Image-Text Retrieval. IEEE Transactions on Image Processing (TIP 2021). PDF

A Memory-based EMBedding Enhancement (MEMBER) approach combining the speed of embedding learning with the accuracy of fine-grained pairwise alignment for image-text retrieval.

[ACM MM 2021] Jiangtong Li^*, Wentao Wang^*, Junjie Chen, Li Niu, Jianlou Si, Chen Qian, Liqing Zhang. Video Semantic Segmentation via Sparse Temporal Transformer. ACM International Conference on Multimedia (ACM MM 2021). PDF

[COLING 2018] Zhuosheng Zhang^*, Jiangtong Li^*, Pengfei Zhu, Hai Zhao, Gongshen Liu. Modeling Multi-turn Conversation with Deep Utterance Aggregation. International Conference on Computational Linguistics (COLING 2018). PDF Code Data

Big Data in Finance

[NeurIPS 2025] Jiangtong Li^*, Dongyi Liu^*, Kun Zhu, Dawei Cheng, Changjun Jiang. Attack by Yourself: Effective and Unnoticeable Multi-Category Graph Backdoor Attacks with Subgraph Triggers Pool. Conference on Neural Information Processing Systems (NeurIPS 2025). PDF Code

A multi-category graph backdoor attack with a subgraph triggers pool that injects category-aware structural triggers — exposing GNN vulnerabilities in social, financial and traffic networks under realistic multi-target threats.

[IJCAI 2026] Dongyi Liu, Jiangtong Li^#. Cross-Paradigm Graph Backdoor Attacks with Promptable Subgraph Triggers. International Joint Conference on Artificial Intelligence (IJCAI 2026). PDF Code

[TNNLS 2024] Jiangtong Li, Ziyuan Zhou, Jingkai Zhang, Dawei Cheng, Changjun Jiang. HFTCRNet: Hierarchical Fusion Transformer for Interbank Credit Rating and Risk Assessment. IEEE Transactions on Neural Networks and Learning Systems (TNNLS 2024). PDF Code

A hierarchical fusion transformer that models interbank contagion chains for credit rating and systemic risk assessment in financial networks.

[FCS 2024] Jiangtong Li, Yang Lei, Yuxuan Bian, Dawei Cheng, Zhijun Ding, Changjun Jiang. RA-CFGPT: Chinese Financial Assistant with Retrieval-Augmented Large Language Model. Frontiers of Computer Science (FCS 2024). PDF

[ICML 2024] Yuxuan Bian^*, Xuan Ju^*, Jiangtong Li^*, Zhijian Xu, Dawei Cheng, Qiang Xu. Multi-Patch Prediction: Adapting LLMs for Time Series Representation Learning. International Conference on Machine Learning (ICML 2024). PDF Code

[BDMA 2026] Jiangtong Li^*, Yiyun Zhu^*, Dawei Cheng^#, Zhijun Ding, Changjun Jiang. CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model. Big Data Mining and Analytics (BDMA 2026). PDF Code

A Chinese multimodal financial benchmark with 9,000+ image-question pairs and a staged evaluation system assessing MLLMs on tables, charts, and structural diagrams in financial contexts.

[Preprint] Jiangtong Li, Yuxuan Bian, Guoxuan Wang, Yang Lei, Dawei Cheng, Zhijun Ding, Changjun Jiang. CFGPT: Chinese Financial Assistant with Large Language Model. arXiv preprint. PDF Code

A Chinese Financial GPT framework comprising CFData (584M docs, 141B tokens), CFLLM (based on InternLM-7B), and the CFAPP deployment framework for real-world financial applications.

[Preprint] Yang Lei^*, Jiangtong Li^*, Dawei Cheng, Zhijun Ding, Changjun Jiang. CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model. arXiv preprint. PDF Code

A benchmark evaluating LLMs on Chinese financial text processing across recognition, classification, and generation tasks with texts ranging from 50 to 1,800+ characters.

[Under Review] Kai Lan^*, Jiayong Zhu^*, Jiangtong Li^#, Dawei Cheng, Guang Chen, Changjun Jiang. FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design. Under Review. PDF

An integrated framework combining an automated scalable pipeline (89K aligned image-question pairs from 23K financial reports) with adversarial reward training to enhance multimodal financial reasoning.

[Under Review] Jiayong Zhu^*, Jiangtong Li^*#, Jinru Ding, Dawei Cheng, Jie Xu, Feng Yu. FinDocMRE: A Benchmark for Document-Level Financial Multimodal Reasoning Evaluation. Under Review. PDF

A multi-image document-level benchmark with 12,207 samples from 2,878 financial reports spanning 12 domains, evaluating LMMs on cross-page visual grounding and numerical reasoning.

Brain-Computer Interface & Cognitive Science

[CVPR 2026] Junxiang Liu, Junming Lin, Jie Zhou, Wei Xiong, Jiangtong Li^#, Jie Li^#, Jie Zhuang, Hongfei Ji. DynaMind: Reconstructing Dynamic Visual Scenes from EEG by Aligning Temporal Dynamics and Multimodal Semantics to Guided Diffusion. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026). PDF

Full List of Publications: Google Scholar, DBLP

Projects

Explainable and Verifiable Reasoning in Video QA

Developing compositional reasoning frameworks that decompose complex video questions into interpretable, verifiable inference steps over spatial-temporal structures.

Explainable Video QA via Multi-agent Adversarial Cooperation

Leveraging multi-agent debate and adversarial cooperation to produce interpretable and robust answers for complex video understanding tasks.

Multimodal Joint Modeling for Macro-Meso Financial Risk

Integrating heterogeneous financial signals across macro-economic indicators and meso-level interbank networks for early risk detection and contagion analysis.

Contact

jiangtongli [AT] tongji [DOT] edu [DOT] cn

4800 Cao'an Road, Jiading District, Shanghai, China

School of Computer Science and Technology, Tongji University