📝 Selected Publications

For a complete list of publications, please visit my Google Scholar profile.

Note: * denotes equal contribution

🤖 Vision-Language Models & VLA (3)

ICLR 2026

Video-STAR: Reinforcing Zero-shot Video Understanding with Tools
🔧 Tool-Using Agent 🔄 Multi-turn Agentic RL
Yuan Z., Qu X., Qian C., Chen R., Tang J., Sun L., Chu X., Zhang D., Wang Y., Cai Y., Li S.

Video-STAR proposes a framework that strengthens zero-shot video understanding by training a tool-using agent with multi-turn agentic reinforcement learning.

ICLR 2026

AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
⚡ Multimodal Reasoning 🚗 Autonomous Driving
Featured by AutoDrive Heart (自动驾驶之心)
Yuan Z., Tang J., Luo J., Chen R., Qian C., Sun L., Cai Y., Zhang D., Li S.

AutoDrive-R² introduces a reasoning and self-reflection framework for Vision-Language-Action models in autonomous driving scenarios.

Preprint

Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving
🚗 Autonomous Driving ⚡ Fast VLA
Zhang D.*, Yuan Z.*, Chen Z., Liao C., Chen Y., Shen F., Zhou Q., Chua T.

Reasoning-VLA presents a fast and general VLA reasoning model optimized for real-time autonomous driving applications.

✨ Diffusion Models (1)

CVPR 2026

ADE-CoT: …
✨ Diffusion Model ⚡ Chain-of-Thought
Yuan Z., et al.

📐 3D Vision (6)

TCSVT 2025

DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo
📷 Multi-View Stereo 👁️ 3D Reconstruction
Yuan Z., Zhang D., Li Z., Qian C., Chen J., Chen Y., Chen K., Mao T., Li Z., Jiang H., Wang Z.

DVP-MVS++ improves multi-view stereo by jointly modeling depth-normal-edge priors and a harmonized visibility prior.

TCSVT 2025

SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint
📷 Segmentation-Driven 👁️ Depth Estimation
Yuan Z., Yang Z., Cai Y., Wu K., Liu M., Zhang D., Jiang H., Li Z., Wang Z.

SED-MVS introduces segmentation-driven, edge-aligned deformation for robust multi-view stereo, combined with depth restoration and an occlusion constraint.

AAAI 2025

DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo
📷 Visibility Prior 👁️ 3D Vision
Yuan Z., Luo J., Shen F., Li Z., Liu C., Mao T., Wang Z.

AAAI 2025

MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
📷 Segmentation Prior 👁️ Multi-View
Yuan Z., Liu C., Shen F., Li Z., Luo J., Mao T., Wang Z.

AAAI 2024

PR 2024