I am Xuanyu Zhang (张轩宇), a Ph.D. student at the School of Computer Science, Peking University, advised by Prof. Jian Zhang. Previously, I received my B.Eng. degree from the School of Electrical and Information Engineering, Tianjin University, where I was advised by Prof. Jianjun Lei, Pengfei Zhu, and Bo Peng. Please feel free to reach out via email (xuanyuzhang21@stu.pku.edu.cn).

My recent research interests include MLLMs, Visual Reinforcement Learning, AIGC, and Trustworthy AI. I have published 10+ CCF-A papers as first or co-first author at top-tier venues including CVPR, NeurIPS, ICLR, AAAI, ACM MM, and IJCV, with 1000+ total Google Scholar citations. My work spans unified visual understanding and generation, image and video quality understanding, AIGC forgery localization, multi-agent frameworks, and robust watermarking/steganography for copyright protection. For more details on my publications, please visit my Google Scholar profile.

🔥 News

  • 2026.02:  🎉🎉 Two papers are accepted at CVPR 2026.
  • 2026.01:  🎉🎉 Two papers are accepted at ICLR 2026.
  • 2025.11:  🎉🎉 One paper is accepted as AAAI 2026 Oral.

📝 Publications

(# denotes equal contribution)

ICLR 2026 Oral

Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment

Shijie Zhao#, Xuanyu Zhang#, Weiqi Li, Junlin Li, Li Zhang, Tianfan Xue, Jian Zhang
Paper | Code
We revisit the reasoning mechanism in MLLM-based IQA models and propose RALI, a lightweight CLIP-based image scorer. We verify that, through RL training, MLLMs leverage their reasoning capability to convert redundant visual representations into compact, cross-domain aligned text representations. This conversion is the source of the generalization exhibited by these reasoning-based IQA models. RALI uses only about 4% of Q-Insight’s parameters and inference time while achieving comparable accuracy.

AAAI 2026 Oral

VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning

Xuanyu Zhang, Weiqi Li, Shijie Zhao, Junlin Li, Li Zhang, Jian Zhang
Paper | Code
We propose VQ-Insight, a novel reasoning-style VLM framework for AIGC video quality assessment. Our approach features: (1) a progressive video quality learning scheme; (2) the design of multi-dimension scoring rewards, preference comparison rewards, and temporal modeling rewards.

NeurIPS 2025 Spotlight

Q-Insight: Understanding Image Quality via Visual Reinforcement Learning

Weiqi Li, Xuanyu Zhang, Shijie Zhao, Yabin Zhang, Junlin Li, Li Zhang, Jian Zhang
Paper | Code
We propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO), which demonstrates strong visual reasoning capability for image quality understanding while requiring only a limited amount of rating scores and degradation labels.

ICLR 2025

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

Zhipei Xu#, Xuanyu Zhang#, Runyi Li, Zecheng Tang, Qing Huang, Jian Zhang
Project | Code
We propose the explainable IFDL task and design FakeShield, a multi-modal framework capable of evaluating image authenticity, generating tampered region masks, and explaining its judgment based on pixel-level and image-level tampering clues.

CVPR 2025

OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking

Xuanyu Zhang, Zecheng Tang, Zhipei Xu, Runyi Li, Youmin Xu, Bin Chen, Feng Gao, Jian Zhang
Paper | Code
We propose OmniGuard, a novel augmented versatile watermarking approach that integrates proactive embedding with passive, blind extraction for robust copyright protection and tamper localization.

NeurIPS 2024

GS-Hider: Hiding Messages into 3D Gaussian Splatting

Xuanyu Zhang, Jiarui Meng, Runyi Li, Zhipei Xu, Yongbing Zhang, Jian Zhang
Paper | Code
We propose GS-Hider, the first 3DGS steganography framework, which can hide an entire 3D scene or an image in the original 3D scene and accurately decode it from the 3D Gaussians.

CVPR 2024

EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection

Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, Jian Zhang
Paper | Code
We propose a versatile deep forensic watermark for AIGC editing methods such as Stable Diffusion inpainting, ControlNet, and SDXL.

💻 Internships

  • 2025.04 - present, ByteDance, China.

🎖 Honors and Awards

  • 2025.12 Merit Student at Peking University
  • 2024.10 National Scholarship for Doctoral Students
  • 2022.10 National Scholarship for Undergraduate Students

📖 Education

  • 2022.09 - 2027.06 (expected), Ph.D., Computer Science, Peking University
  • 2018.09 - 2022.06, B.E., Computer Science (Second Degree), Tianjin University
  • 2018.09 - 2022.06, B.E., Communication Engineering, Tianjin University