👋 About Me

I am currently pursuing my Ph.D. at the Institute of Computing Technology, Chinese Academy of Sciences, advised by Prof. Zhaoqi Wang. I am also a Research Intern on the LongCat Team at Meituan. Previously, I was a Research Intern on the DreamX-World Team at Alibaba. I am deeply grateful for the opportunity to collaborate with exceptional researchers, including Prof. Shuo Li, Prof. Yujun Cai, and Prof. Yiwei Wang. Their mentorship and insights have profoundly shaped my academic journey.

My research interests include Vision-Language Models, Agentic Reinforcement Learning, Spatial Intelligence, Foundation Models, AI for Science, Embodied Agents, 3D Vision, and Safety. I have published 20+ papers at top international AI conferences such as NeurIPS, ICLR, ICML, CVPR, ICCV, and AAAI.


I am always open to collaboration, discussion, or just saying hi. Feel free to reach out! Email: yuanzhenlong21b@ict.ac.cn


🎯 Research Areas


🔥 Main News

  • 2026.02 🎉 Our work ADE-CoT was accepted by CVPR 2026.
  • 2026.01 💼 Joined LongCat Team, Meituan as a Research Intern, working on the M17 3A Base Model team.
  • 2026.01 🎉 Our work Video-STAR was accepted by ICLR 2026.
  • 2026.01 🎉 Our work AutoDrive-R² was accepted by ICLR 2026.
  • 2025.10 🎉 Our work DVP-MVS++ was accepted by TCSVT 2025.
  • 2025.08 🎉 Our work AutoDrive-R² was featured by AutoDrive Heart (自动驾驶之心).
  • 2025.05 💼 Joined DreamX-World Team, Alibaba as a Research Intern.
  • 2025.05 🎉 Our work SED-MVS was accepted by TCSVT 2025.
  • 2024.12 🎉 Our work DVP-MVS was accepted by AAAI 2025.
  • 2024.12 🎉 Our work MSP-MVS was accepted by AAAI 2025.
  • 2024.05 🎉 Our work TSAR-MVS was accepted by PR 2024.
  • 2023.12 🎉 Our work SD-MVS was accepted by AAAI 2024.


🎓 Education

🎓 Academic Background

Institute of Computing Technology, Chinese Academy of Sciences

Ph.D. in Information and Communication Engineering

📍 Beijing · Sep 2021 - Present


💼 Professional Experience

💼 Industry Experience

DreamX-World Team, Alibaba

Research Intern

📍 Beijing · Jun 2025 - Jan 2026

LongCat Team, Meituan

Research Intern, M17 3A Base Model Team

📍 Beijing · Feb 2026 - Present

  • Researched and developed LongCat-Next, with a technical report on next-generation long-context video understanding for autonomous driving
  • Contributing to research and development of the M17 3A foundation model


πŸ“ Selected Publications

For a complete list of publications, please visit my Google Scholar profile

Note: * denotes equal contribution

📄 Technical Report (1)
Tech Report 2026
LongCat-Next

LongCat-Next: Lexicalizing Modalities as Discrete Tokens
Native Multimodal Any-to-Any Generation Foundation Model
Meituan LongCat Team

[Paper] [Code]

LongCat-Next is a native multimodal model (A3B) that unifies text, vision, and audio under a single autoregressive objective via discrete tokenization, achieving strong performance across multimodal benchmarks.

🤖 Vision-Language Models & VLA (4)
ICLR 2026
Video-STAR

Video-STAR: Reinforcing Zero-shot Video Understanding with Tools
Tool-Using Agent Multi-turn RL Zero-shot Video
Yuan Z., Qu X., Qian C., Chen R., Tang J., Sun L., Chu X., Zhang D., Wang Y., Cai Y., Li S.

[Paper] [Code]

Video-STAR proposes a novel framework that reinforces zero-shot video understanding through tool-use agents with multi-turn reasoning.

ICLR 2026
AutoDrive-R²

AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
Multimodal Reasoning Autonomous Driving Vision-Language-Action
Featured by AutoDrive Heart (自动驾驶之心)
Yuan Z., Tang J., Luo J., Chen R., Qian C., Sun L., Cai Y., Zhang D., Li S.

[Paper] [Code]

AutoDrive-R² introduces a reasoning and self-reflection framework for Vision-Language-Action models in autonomous driving scenarios.

Preprint
Reasoning-VLA

Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving
Autonomous Driving Fast VLA Real-time Inference
Zhang D.*, Yuan Z.*, Chen Z., Liao C., Chen Y., Shen F., Zhou Q., Chua T.

[Paper]

Reasoning-VLA presents a fast and general VLA reasoning model optimized for real-time autonomous driving applications.

🎨 Generative Foundation Model (2)
CVPR 2026
ADE-CoT

ADE-CoT: Adaptive Diffusion Elicits Chain-of-Thought in Image Editing
Diffusion Model Chain-of-Thought Image Editing
Qu X.*, Yuan Z.*, Tang J., Chen R., Tang D., Yu M., Sun L., Bai Y., Chu X., Gou G., Xiong G., Cai Y.

[Paper]

Preprint
Recovering Degradations

Recovering Degradations with Generative Model: A Consistency-aware Distillation Network for Infrared and Visible Image Fusion
Generative Model Image Fusion Infrared-Visible
Yu H.*, Yuan Z.*, Bai Y., Li J., Liu J., Li S., Sun L., Chu X.

πŸ“ 3D Vision 6
TCSVT 2025
DVP-MVS++

DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo
Multi-View Stereo 3D Reconstruction
Yuan Z., Zhang D., Li Z., Qian C., Chen J., Chen Y., Chen K., Mao T., Li Z., Jiang H., Wang Z.

[Paper] [Code]

DVP-MVS++ advances multi-view stereo through synergistic depth-normal-edge and visibility prior modeling.

TCSVT 2025
SED-MVS

SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint
Segmentation-Driven Depth Estimation
Yuan Z., Yang Z., Cai Y., Wu K., Liu M., Zhang D., Jiang H., Li Z., Wang Z.

[Paper] [Code]

SED-MVS introduces segmentation-driven and edge-aligned deformation for robust multi-view stereo with depth restoration.

AAAI 2025
DVP-MVS

DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo
Visibility Prior 3D Vision
Yuan Z., Luo J., Shen F., Li Z., Liu C., Mao T., Wang Z.

[Paper] [Code]

AAAI 2025
MSP-MVS

MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
Segmentation Prior Multi-View
Yuan Z., Liu C., Shen F., Li Z., Luo J., Mao T., Wang Z.

[Paper] [Code]

AAAI 2024
SD-MVS

SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization
Spherical Refinement EM Optimization
Yuan Z., Cao J., Li Z., Jiang H., Wang Z.

[Paper] [Code]

PR 2024
TSAR-MVS


🎓 Professional Service

🤖 AI / Machine Learning
NeurIPS ICML ICLR AAAI
👁️ Computer Vision & Multimodal
CVPR ICCV ECCV
📰 Journals
IJCV TIP TPAMI TMM TNNLS TCSVT PR


πŸŽ™οΈ Talks & Teaching

πŸŽ™οΈ Invited Talks
TBD
Coming soon...


🎯 Hobbies & Interests

🏀 Basketball
Enjoy playing basketball in my free time; it's a great way to stay active and unwind.
🏋️ Fitness
Regular gym sessions to stay strong and maintain a healthy lifestyle.
🎵 Music
Avid music lover, especially fond of Hamilton; the soundtrack never gets old.


📬 Let's Connect

📫 Email: yuanzhenlong21b@ict.ac.cn

💼 I'm eager to connect with fellow AI researchers and enthusiasts passionate about advancing multimodal AI and embodied intelligence.

📍 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China