Jiangtong Li

Jiayong Zhu

FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design

Kai Lan, Jiayong Zhu, Jiangtong Li, Dawei Cheng, Guang Chen, Changjun Jiang

Under Review

Large Language Model Financial Reasoning Multimodal

FinDocMRE: A Benchmark for Document-Level Financial Multimodal Reasoning Evaluation

Jiayong Zhu, Jiangtong Li, Jinru Ding, Dawei Cheng, Jie Xu, Feng Yu

Under Review

Large Language Model Financial Benchmark Multimodal

© 2026 Jiangtong Li

