Jiangtong Li
  • Projects
    • Multimodal Finance
    • Image Harmonization
    • Causal-VidQA
  • Publications
    • Breaking Dual Bottlenecks: Evolving Unified Multimodal Models into Self-Adaptive Interleaved Visual Reasoners
    • Can LLMs Really Judge? A Progressive Argumentation-Mining Framework for Distinguishing Understanding from Aggregation
    • Cross-Paradigm Graph Backdoor Attacks with Promptable Subgraph Triggers
    • Bridging Visual Dynamics and Reasoning Evaluation: Multimodal Large Language Models for Short Drama Quality Assessment
    • DynaMind: Reconstructing Dynamic Visual Scenes from EEG by Aligning Temporal Dynamics and Multimodal Semantics to Guided Diffusion
    • Parse, Align and Aggregate: Graph-driven Compositional Reasoning for Video Question Answering
    • Attack by Yourself: Effective and Unnoticeable Multi-Category Graph Backdoor Attacks with Subgraph Triggers Pool
    • Divide and Conquer: Exploring Language-centric Tree Reasoning for Video Question-Answering
    • InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for Debating
    • HFTCRNet: Hierarchical Fusion Transformer for Interbank Credit Rating and Risk Assessment
    • RA-CFGPT: Chinese Financial Assistant with Retrieval-Augmented Large Language Model
    • Multi-Patch Prediction: Adapting LLMs for Time Series Representation Learning
    • Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering
    • Knowledge Proxy Intervention for Deconfounded Video Question Answering
    • From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering
    • Zero-Shot Sketch-Based Image Retrieval with Structure-aware Asymmetric Disentanglement
    • Action-Aware Embedding Enhancement for Image-Text Retrieval
    • Memorize, Associate and Match: Embedding Enhancement via Fine-Grained Alignment for Image-Text Retrieval
    • Video Semantic Segmentation via Sparse Temporal Transformer
    • Modeling Multi-turn Conversation with Deep Utterance Aggregation

Image Processing

Image Harmonization

Computer Vision, Image Processing

© 2026 Jiangtong Li
