Yuxiong Wang's Homepage

About Me

Make things as simple as possible, but not simpler

– Albert Einstein

Welcome to my homepage. I am an Assistant Professor of Computer Science at the University of Illinois Urbana-Champaign (UIUC).

My research lies in computer vision, machine learning, and robotics, with a specific focus on open-world perception, meta-learning, multi-modal learning, generative modeling, and agent learning.

Before joining Illinois CS, I was a postdoctoral fellow in the Robotics Institute at Carnegie Mellon University, advised by Prof. Martial Hebert. I obtained my Ph.D. in the Robotics Institute under the supervision of Prof. Martial Hebert, and I have also worked closely with Prof. Deva Ramanan and Prof. Ruslan Salakhutdinov. I received the Best Paper Honorable Mention Award at ECCV 2020 for streaming perception and was a Best Paper Award Finalist at CVPR 2019 and 2022. I was recognized as a Notable Area Chair at ICLR 2023 and as an Expert Reviewer for TMLR, and I was selected to participate in the National Academy of Engineering's (NAE) Frontiers of Engineering symposium.

Please feel free to contact me if you are interested!

  • I am always seeking self-motivated Ph.D., M.S., and undergraduate students. 
  • Our group also has openings for visiting students.

Current Research Topics

  • Meta-learning and learning to learn
  • Open-world, multi-modal, few-shot, and self-supervised learning
  • 3D vision
  • Generative modeling, predictive learning
  • Human motion and human-object interaction modeling
  • Reinforcement learning, robot/agent learning
  • In-the-wild applications in robotics, autonomous driving, agriculture, materials science, chemistry, healthcare, etc.

Dissertation


Talks

One of our primary research efforts focuses on bridging generative and discriminative learning, enabling autonomous agents to perceive, interact, and act in the open world. For representative works, please refer to my two recent talks: the first, presented at the C3.ai Generative AI Workshop, elaborates on how we ground generative modeling in 3D and 4D; the second, given at the London Machine Learning Meetup, explores techniques that leverage LLMs and generative visual models for perception and decision-making.


Students

PhD Students


Recent Publications

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Ziqi Pang*, Tianyuan Zhang*, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang

CVPR, 2025 (Oral, Top 3.3%)

[Website] [PDF] [Code]

InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions

Sirui Xu, Hung Yu Ling, Yu-Xiong Wang*, Liang-Yan Gui*

CVPR, 2025 (Highlight)

[Website] [PDF] [Code] [Video]

Floating No More: Object-Ground Reconstruction from a Single Image

Yunze Man, Yichen Sheng, Jianming Zhang, Liang-Yan Gui, Yu-Xiong Wang

CVPR, 2025

[Website] [PDF]

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Yunze Man, De-An Huang, Guilin Liu, Shiwei Sheng, Shilong Liu, Liang-Yan Gui, Jan Kautz, Yu-Xiong Wang*, Zhiding Yu*

CVPR, 2025

[Website] [Code]

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

Lang Lin*, Xueyang Yu*, Ziqi Pang*, Yu-Xiong Wang

CVPR, 2025

[Website] [Code]

InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation

Sirui Xu, Dongting Li, Yucheng Zhang, Xiyan Xu, Qi Long, Ziyin Wang, Yunzhi Lu, Shuchang Dong, Hezi Jiang, Akshat Gupta, Yu-Xiong Wang*, Liang-Yan Gui*

CVPR, 2025

[Website] [Code]

Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning

Yuxiang Lu, Shengcao Cao, Yu-Xiong Wang

ICLR, 2025

[Website] [PDF] [Code]

RTDiff: Reverse Trajectory Synthesis via Diffusion for Offline Reinforcement Learning

Qianlan Yang, Yu-Xiong Wang

ICLR, 2025

[PDF]

3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing

Jiahua Dong, Yu-Xiong Wang

ICLR, 2025

[Website] [PDF] [Code]

Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception

Ziqi Pang*, Xu Xin*, Yu-Xiong Wang

ICLR, 2025

[Website] [PDF] [Code]

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

Shuhong Zheng, Zhipeng Bao, Ruoyu Zhao, Martial Hebert, Yu-Xiong Wang

ICLR, 2025

[Website] [PDF]

Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision

Shengcao Cao, Liang-Yan Gui, Yu-Xiong Wang

arXiv, 2024

[Website] [PDF] [Code]

Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers

Kai Yan, Alex Schwing, Yu-Xiong Wang

NeurIPS, 2024 (Spotlight)

[Website] [PDF] [Code]

InstructG2I: Synthesizing Images from Multimodal Attributed Graphs

Bowen Jin, Ziqi Pang, Bingjun Guo, Yu-Xiong Wang, Jiaxuan You, Jiawei Han

NeurIPS, 2024

[Website] [PDF] [Code]

ProEdit: Simple Progression is All You Need for High-Quality 3D Scene Editing

Jun-Kun Chen, Yu-Xiong Wang

NeurIPS, 2024

[Website] [PDF]

SceneCraft: Layout-Guided 3D Scene Generation

Xiuyu Yang, Yunze Man, Jun-Kun Chen, Yu-Xiong Wang

NeurIPS, 2024

[Website] [PDF] [Code]

InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction

Sirui Xu, Ziyin Wang, Yu-Xiong Wang, Liang-Yan Gui

NeurIPS, 2024

[Website] [PDF]

Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Reasoning

Yunze Man, Shuhong Zheng, Zhipeng Bao, Martial Hebert, Liang-Yan Gui, Yu-Xiong Wang

NeurIPS, 2024

[Website] [PDF] [Code]

More Publications


Teaching