About Me
I am currently pursuing my Ph.D. at the Institute of Computing Technology, Chinese Academy of Sciences, advised by Prof. Zhaoqi Wang. Concurrently, I serve as a Research Intern at AMAP, Alibaba, where I work closely with Xiangxiang Chu. I am deeply grateful for the opportunity to collaborate with exceptional researchers including Prof. Shuo Li, Prof. Yujun Cai, Prof. Yiwei Wang, Prof. Zhengzhong Tu, Prof. Manling Li, and Prof. Liang Lin. Their mentorship and insights have profoundly shaped my academic journey.
My research interests include Vision-Language Models (VLMs), Large Language Models (LLMs), Embodied Agents, Multimodal AI, and 3D Vision. I have published 18+ papers at top international AI conferences such as NeurIPS, ICLR, ICCV, and AAAI.
I will graduate with my Ph.D. in June 2026 at the age of 26 and am now exploring postdoc opportunities starting Fall 2026. If you are interested in my profile, feel free to contact me via email (yuanzhenlong21b[at]ict[dot]ac[dot]cn) or WeChat (YZL20000224).
Research Interests
- Foundation Models & Pre-training
  - Vision-Language Models (VLMs) / Vision-Language Action (VLA) / Spatial Intelligence
- Model Enhancement & Post-training
  - Reasoning & Alignment / Tool-Augmented RL / NLP-Enhanced Training
- Model Interpretation
  - Mechanistic Interpretability / Factuality, Truthfulness, and Social Good
- Real-World Applications
  - Embodied Agents / AI for Science / Biomedical Engineering
Main News
- 2025.10: We propose Video-STAR, which is now available on arXiv!
- 2025.08: Our work AutoDrive-R² was reported by AutoDrive Heart (自动驾驶之心).
- 2025.08: We propose AutoDrive-R², which is now available on arXiv!
- 2025.06: We propose DVP-MVS++, which is now available on arXiv!
- 2025.05: Our work SED-MVS has been accepted by TCSVT 2025.
- 2024.12: We propose SED-MVS, which is now available on arXiv!
- 2024.12: Our work DVP-MVS has been accepted by AAAI 2025.
- 2024.12: Our work MSP-MVS has been accepted by AAAI 2025.
- 2024.08: We propose DVP-MVS, which is now available on arXiv!
- 2024.08: We propose MSP-MVS, which is now available on arXiv!
- 2024.05: Our work TSAR-MVS has been accepted by PR 2024.
- 2024.01: We propose TSAR-MVS, which is now available on arXiv!
- 2023.12: Our work SD-MVS has been accepted by AAAI 2024.
- 2023.09: We propose SD-MVS, which is now available on arXiv!
Main Publications
Multimodal LLMs Post-Training

Video-STAR: Reinforcing Zero-shot Video Understanding with Tools
Yuan, Z., Qu, X., Qian, C., Chen, R., Tang, J., Sun, L., Chu, X., Zhang, D., Wang, Y., Cai, Y., Li, S.

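AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
Yuan, Z., Tang, J., Luo, J., Chen, R., Qian, C., Sun, L., Cai, Y., Zhang, D., Li, S.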
3D Vision

DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo
Yuan, Z., Zhang, D., Li, Z., Qian, C., Chen, J., Chen, Y., Chen, K., Mao, T., Li, Z., Jiang, H., Wang, Z.
IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT) (Under Review), 2025.

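SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint
Yuan, Z., Yang, Z., Cai, Y., Wu, K., Liu, M., Zhang, D., Jiang, H., Li, Z., Wang, Z.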
IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), 2025.

DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo
Yuan, Z., Luo, J., Shen, F., Li, Z., Liu, C., Mao, T., Wang, Z.
AAAI Conference on Artificial Intelligence (AAAI), 2025.

MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
Yuan, Z., Liu, C., Shen, F., Li, Z., Luo, J., Mao, T., Wang, Z.
AAAI Conference on Artificial Intelligence (AAAI), 2025.

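SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization
Yuan, Z., Cao, J., Li, Z., Jiang, H., Wang, Z.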
AAAI Conference on Artificial Intelligence (AAAI), 2024.

TSAR-MVS: Textureless-Aware Segmentation and Correlative Refinement Guided Multi-View Stereo
Yuan, Z., Cao, J., Wang, Z., Li, Z.
Pattern Recognition (PR), 2024.
All Publications
- [Preprint] Video-STAR: Reinforcing Zero-shot Video Understanding with Tools. Z Yuan, X Qu, C Qian, et al.
- [Preprint] AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving. Z Yuan, J Tang, J Luo, et al.
- [Preprint] Pure Vision Language Action (VLA) Models: A Comprehensive Survey. D Zhang, J Sun, C Hu, X Wu, Z Yuan, et al.
- [Preprint] AT-Drive: Exploiting Adversarial Transfer for End-to-end Autonomous Driving. D Zhang, Z Yuan, K Huang, et al.
- [Preprint] ADDI: A Simplified E2E Autonomous Driving Model with Distinct Experts and Implicit Interactions. D Zhang, Z Yuan, Y Chen, et al.
- [Preprint] EMPOWER: Evolutionary Medical Prompt Optimization With Reinforcement Learning. Y Chen, Y He, J Yang, D Zhang, Z Yuan, et al.
- [Preprint] DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo. Z Yuan, D Zhang, Z Li, et al.
- [NeurIPS 2025] InstructHOI: Context-Aware Instruction for Multi-Modal Reasoning in Human-Object Interaction Detection. J Luo, W Ren, Q Zheng, Y Zhang, Z Yuan, et al.
- [IEEE TCSVT 2025] Learning multi-view stereo with geometry-aware prior. K Chen, Z Yuan, H Xiao, T Mao, et al.
- [HCII 2025] MR-IntelliAssist: A World Cognition Agent Enabling Adaptive Human-AI Symbiosis in Industry 4.0. C Liu, Z Yuan, Y Wang, Y Yin, et al.
- [IEEE TCSVT 2025] SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint. Z Yuan, Z Yang, Y Cai, et al.
- [AAAI 2025] Dual-level precision edges guided multi-view stereo with accurate planarization. K Chen, Z Yuan, T Mao, et al.
- [AAAI 2025] MapExpert: Online HD map construction with simple and efficient sparse map element expert. D Zhang, D Chen, P Zhi, Y Chen, Z Yuan, et al.
- [AAAI 2025] DVP-MVS: Synergize depth-edge and visibility prior for multi-view stereo. Z Yuan, J Luo, F Shen, et al.
- [AAAI 2025] MSP-MVS: Multi-granularity segmentation prior guided multi-view stereo. Z Yuan, C Liu, F Shen, et al.
- [Preprint] Light4GS: Lightweight compact 4D Gaussian splatting generation via context model. M Liu, Q Yang, H Huang, W Huang, Z Yuan, et al.
- [Preprint] Adaptive label correction for robust medical image segmentation with noisy labels. C Qian, K Han, J Ding, L Liu, C Lyu, Z Yuan, et al.
- [Preprint] DynCIM: Dynamic curriculum for imbalanced multimodal learning. C Qian, K Han, J Wang, Z Yuan, et al.
- [PR 2025] NeRF-based polarimetric multi-view stereo. J Cao, Z Yuan, T Mao, et al.
- [PR 2024] TSAR-MVS: Textureless-aware segmentation and correlative refinement guided multi-view stereo. Z Yuan, J Cao, Z Wang, et al.
- [AAAI 2024] SD-MVS: Segmentation-driven deformation multi-view stereo with spherical refinement and EM optimization. Z Yuan, J Cao, Z Li, et al.
Awards and Service
- 2024.12 Lenovo Enterprise Scholarship (Top 3%)
- 2025.10 ICT National Scholarships (Top 5%)
- Conference Reviewer: NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, AAAI
- Journal Reviewer: IJCV, TIP, TMM, TNNLS, TCSVT, PR