Recent Publications

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Ziqi Pang*, Tianyuan Zhang*, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang

CVPR, 2025 (Oral, Top 3.3%)

[Website] [PDF] [Code]

InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions

Sirui Xu, Hung Yu Ling, Yu-Xiong Wang*, Liang-Yan Gui*

CVPR, 2025 (Highlight)

[Website] [PDF] [Code] [Video]

Floating No More: Object-Ground Reconstruction from a Single Image

Yunze Man, Yichen Sheng, Jianming Zhang, Liang-Yan Gui, Yu-Xiong Wang

CVPR, 2025

[Website] [PDF]

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Yunze Man, De-An Huang, Guilin Liu, Shiwei Sheng, Shilong Liu, Liang-Yan Gui, Jan Kautz, Yu-Xiong Wang*, Zhiding Yu*

CVPR, 2025

[Website] [Code]

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

Lang Lin*, Xueyang Yu*, Ziqi Pang*, Yu-Xiong Wang

CVPR, 2025

[Website] [Code]

InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation

Sirui Xu, Dongting Li, Yucheng Zhang, Xiyan Xu, Qi Long, Ziyin Wang, Yunzhi Lu, Shuchang Dong, Hezi Jiang, Akshat Gupta, Yu-Xiong Wang*, Liang-Yan Gui*

CVPR, 2025

[Website] [Code]

Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning

Yuxiang Lu, Shengcao Cao, Yu-Xiong Wang

ICLR, 2025

[Website] [PDF] [Code]

RTDiff: Reverse Trajectory Synthesis via Diffusion for Offline Reinforcement Learning

Qianlan Yang, Yu-Xiong Wang

ICLR, 2025

[PDF]

3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing

Jiahua Dong, Yu-Xiong Wang

ICLR, 2025

[Website] [PDF] [Code]

Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception

Ziqi Pang*, Xu Xin*, Yu-Xiong Wang

ICLR, 2025

[Website] [PDF] [Code]

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

Shuhong Zheng, Zhipeng Bao, Ruoyu Zhao, Martial Hebert, Yu-Xiong Wang

ICLR, 2025

[Website] [PDF]

Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision

Shengcao Cao, Liang-Yan Gui, Yu-Xiong Wang

arXiv, 2024

[Website] [PDF] [Code]

Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers

Kai Yan, Alex Schwing, Yu-Xiong Wang

NeurIPS, 2024 (Spotlight)

[Website] [PDF] [Code]

InstructG2I: Synthesizing Images from Multimodal Attributed Graphs

Bowen Jin, Ziqi Pang, Bingjun Guo, Yu-Xiong Wang, Jiaxuan You, Jiawei Han

NeurIPS, 2024

[Website] [PDF] [Code]

ProEdit: Simple Progression is All You Need for High-Quality 3D Scene Editing

Jun-Kun Chen, Yu-Xiong Wang

NeurIPS, 2024

[Website] [PDF]

SceneCraft: Layout-Guided 3D Scene Generation

Xiuyu Yang, Yunze Man, Jun-Kun Chen, Yu-Xiong Wang

NeurIPS, 2024

[Website] [PDF] [Code]

InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction

Sirui Xu, Ziyin Wang, Yu-Xiong Wang, Liang-Yan Gui

NeurIPS, 2024

[Website] [PDF]

Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Reasoning

Yunze Man, Shuhong Zheng, Zhipeng Bao, Martial Hebert, Liang-Yan Gui, Yu-Xiong Wang

NeurIPS, 2024

[Website] [PDF] [Code]

More Publications