πŸ“ Selected Publications

For a complete list of publications, please visit my Google Scholar profile.


Note: * denotes equal contribution

🤖 Vision-Language Models & VLA (3)
ICLR 2026
Video-STAR

Video-STAR: Reinforcing Zero-shot Video Understanding with Tools
🔧 Tool-Using Agent 🔄 Multi-turn Agentic RL
Yuan Z., Qu X., Qian C., Chen R., Tang J., Sun L., Chu X., Zhang D., Wang Y., Cai Y., Li S.

[Paper] [Code]

Video-STAR proposes a novel framework that reinforces zero-shot video understanding through tool-use agents with multi-turn reasoning.

ICLR 2026
AutoDrive-R²

AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
⚡ Multimodal Reasoning 🚗 Autonomous Driving
Featured by AutoDrive Heart (自动驾驶之心)
Yuan Z., Tang J., Luo J., Chen R., Qian C., Sun L., Cai Y., Zhang D., Li S.

[Paper] [Code]

AutoDrive-R² introduces a reasoning and self-reflection framework for Vision-Language-Action models in autonomous driving scenarios.

Preprint
Reasoning-VLA

Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving
🚗 Autonomous Driving ⚡ Fast VLA
Zhang D.*, Yuan Z.*, Chen Z., Liao C., Chen Y., Shen F., Zhou Q., Chua T.

[Paper]

Reasoning-VLA presents a fast and general VLA reasoning model optimized for real-time autonomous driving applications.

✨ Diffusion Models (1)
CVPR 2026
ADE-CoT

ADE-CoT: …
✨ Diffusion Model ⚡ Chain-of-Thought
Yuan Z., et al.

[Paper]

πŸ“ 3D Vision 6
TCSVT 2025
DVP-MVS++

DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo
📷 Multi-View Stereo 👁️ 3D Reconstruction
Yuan Z., Zhang D., Li Z., Qian C., Chen J., Chen Y., Chen K., Mao T., Li Z., Jiang H., Wang Z.

[Paper]

DVP-MVS++ advances multi-view stereo through synergistic depth-normal-edge and visibility prior modeling.

TCSVT 2025
SED-MVS

SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint
📷 Segmentation-Driven 👁️ Depth Estimation
Yuan Z., Yang Z., Cai Y., Wu K., Liu M., Zhang D., Jiang H., Li Z., Wang Z.

[Paper]

SED-MVS introduces segmentation-driven and edge-aligned deformation for robust multi-view stereo with depth restoration.

AAAI 2025
DVP-MVS

DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo
📷 Visibility Prior 👁️ 3D Vision
Yuan Z., Luo J., Shen F., Li Z., Liu C., Mao T., Wang Z.

[Paper] [Code]

AAAI 2025
MSP-MVS

MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
📷 Segmentation Prior 👁️ Multi-View
Yuan Z., Liu C., Shen F., Li Z., Luo J., Mao T., Wang Z.

[Paper] [Code]

AAAI 2024
SD-MVS

SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization
📷 Spherical Refinement 👁️ EM Optimization
Yuan Z., Cao J., Li Z., Jiang H., Wang Z.

[Paper] [Code]

PR 2024
TSAR-MVS

TSAR-MVS: Textureless-Aware Segmentation and Correlative Refinement Guided Multi-View Stereo
📷 Textureless-Aware 👁️ 3D Reconstruction
Yuan Z., Cao J., Wang Z., Li Z.

[Paper] [Code]