(S21-CS 598) Advanced Computer Vision: Schedule

Schedule (Tentative)

We will typically cover two papers in each class.

Date Presenter Topic Papers Slides
Jan 25 Yuxiong Wang Introduction
Jan 27 Yuxiong Wang Teaser: A Fundamental Challenge in Computer Vision Y.-X. Wang and M. Hebert. Learning to learn: Model regression networks for easy small sample learning. ECCV, 2016.

Y.-X. Wang, D. Ramanan, and M. Hebert. Learning to model the tail. NeurIPS, 2017.

Y.-X. Wang, R. Girshick, M. Hebert, and B. Hariharan. Low-shot learning from imaginary data. CVPR, 2018.

L.-Y. Gui, Y.-X. Wang, D. Ramanan, and J. M. F. Moura. Few-shot human motion prediction via meta-learning. ECCV, 2018.
Feb 1 Amnon Attali Teaser: Vision for X (Robotics) S. Levine, C. Finn, T. Darrell, and P. Abbeel. End-to-end training of deep visuomotor policies. JMLR, 2016.

S. Bansal, V. Tolani, S. Gupta, J. Malik, and C. Tomlin. Combining optimal control and learning for visual navigation in novel environments. CoRL, 2019.
Part I: From 2D to 3D Computer Vision
Feb 3 Object Detection M. Tan, R. Pang, and Q. V. Le. EfficientDet: Scalable and efficient object detection. CVPR, 2020.

H. Law and J. Deng. CornerNet: Detecting objects as paired keypoints. ECCV, 2018.
Feb 8 Image Segmentation A. Kirillov, Y. Wu, K. He, and R. Girshick. PointRend: Image segmentation as rendering. CVPR, 2020.

X. Chen, R. Girshick, K. He, and P. Dollár. TensorMask: A foundation for dense object segmentation. ICCV, 2019.
Feb 10 Human Pose Estimation Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. TPAMI, 2019.

R. A. Güler, N. Neverova, and I. Kokkinos. DensePose: Dense human pose estimation in the wild. CVPR, 2018.
Feb 15 Optical and Scene Flow Z. Teed and J. Deng. RAFT: Recurrent all-pairs field transforms for optical flow. ECCV, 2020.

G. Yang and D. Ramanan. Learning to segment rigid motions from two frames. arXiv, 2021.
Feb 17 No Class Break
Feb 22 Video Recognition C. Feichtenhofer, H. Fan, J. Malik, and K. He. SlowFast networks for video recognition. ICCV, 2019.

J. Carreira and A. Zisserman. Quo vadis, action recognition? A new model and the Kinetics dataset. CVPR, 2017.
Feb 24 Visual Context & Correspondence A. Jabri, A. Owens, and A. Efros. Space-time correspondence as a contrastive random walk. NeurIPS, 2020.

X. Wang, R. Girshick, A. Gupta, and K. He. Non-local neural networks. CVPR, 2018.
March 1 3D Point Clouds C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. CVPR, 2017.

Y. Zhou and O. Tuzel. VoxelNet: End-to-end learning for point cloud based 3D object detection. CVPR, 2018.
March 3 2.5D + 3D Image Understanding T. Zhou, M. Brown, N. Snavely, and D. G. Lowe. Unsupervised learning of depth and ego-motion from video. CVPR, 2017.

S. Zuffi, A. Kanazawa, and M. J. Black. Lions and tigers and bears: Capturing non-rigid, 3D, articulated shape from images. CVPR, 2018.
Part II: Towards Versatile Computer Vision Systems
March 8 Human Object Interaction T. Gupta, A. Schwing, and D. Hoiem. No-frills human-object interaction detection: Factorization, layout encodings, and training techniques. ICCV, 2019.

G. Gkioxari, R. Girshick, P. Dollár, and K. He. Detecting and recognizing human-object interactions. CVPR, 2018.
March 10 Relational Reasoning R. Girdhar and D. Ramanan. CATER: A diagnostic dataset for compositional actions and temporal reasoning. ICLR, 2020.

A. Santoro, D. Raposo, D. G. T. Barrett, M. Malinowski, R. Pascanu, P. Battaglia, and T. Lillicrap. A simple neural network module for relational reasoning. NeurIPS, 2017.
March 15 Generative Models X. Luo, X. Zhang, P. Yoo, R. Martin-Brualla, J. Lawrence, and S. M. Seitz. Time-travel rephotography. arXiv, 2020.

OpenAI. DALL·E: Creating images from text. 2021.
March 17 Video Prediction L.-Y. Gui, Y.-X. Wang, X. Liang, and J. M. F. Moura. Adversarial geometry-aware human motion prediction. ECCV, 2018.

T. Xue, J. Wu, K. L. Bouman, and W. T. Freeman. Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks. NeurIPS, 2016.
March 22 Vision and Language A. Agrawal, J. Lu, S. Antol, M. Mitchell, C. L. Zitnick, D. Batra, and D. Parikh. VQA: Visual question answering. ICCV, 2015.

C. Sun, A. Myers, C. Vondrick, K. Murphy, and C. Schmid. VideoBERT: A joint model for video and language representation learning. ICCV, 2019.
March 24 No Class Break
March 29 Embodied Vision & Multi-Modality Perception F. Xia, A. Zamir, Z.-Y. He, A. Sax, J. Malik, and S. Savarese. Gibson Env: Real-world perception for embodied agents. CVPR, 2018.

A. Owens and A. A. Efros. Audio-visual scene analysis with self-supervised multisensory features. ECCV, 2018.
March 31 Mengtian (Martin) Li CMU Guest Lecture
Part III: Frontiers
April 5 Transformers A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. NeurIPS, 2017.

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. ICLR, 2021.
April 7 Jiajun Wu Stanford Guest Lecture Class time will be changed to 12:15-1:30pm CT
April 12 Transformers for Object Detection and Tracking X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai. Deformable DETR: Deformable transformers for end-to-end object detection. ICLR, 2021.

T. Meinhardt, A. Kirillov, L. Leal-Taixe, and C. Feichtenhofer. TrackFormer: Multi-object tracking with transformers. arXiv, 2021.
April 14 Neural Radiance Fields (NeRF) B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. ECCV, 2020.

L. Yen-Chen, P. Florence, J. T. Barron, A. Rodriguez, P. Isola, and T.-Y. Lin. iNeRF: Inverting neural radiance fields for pose estimation. arXiv, 2020.
Part IV: Generalization & Reducing Human Supervision in Computer Vision
April 19 Domain Adaptation & Transfer Learning & Multi-task Learning I. Misra, A. Shrivastava, A. Gupta, and M. Hebert. Cross-stitch networks for multi-task learning. CVPR, 2016.

C. P. Phoo and B. Hariharan. Self-training for few-shot transfer across extreme task differences. ICLR, 2021.
April 21 Few/Zero-Shot Learning Y. Tian, Y. Wang, D. Krishnan, J. B. Tenenbaum, and P. Isola. Rethinking few-shot image classification: a good embedding is all you need? ECCV, 2020.

OpenAI. CLIP: Connecting text and images. 2021.
April 26 Few-Shot Synthesis T. R. Shaham, T. Dekel, and T. Michaeli. SinGAN: Learning a generative model from a single natural image. ICCV, 2019.

A. Yu, V. Ye, M. Tancik, and A. Kanazawa. pixelNeRF: Neural radiance fields from one or few images. arXiv, 2020.
April 28 Ishan Misra FAIR Guest Lecture
May 3 Semi-Supervised Learning & Data Augmentation H. Pham, Z. Dai, Q. Xie, M.-T. Luong, and Q. V. Le. Meta pseudo labels. arXiv, 2020.

G. Ghiasi, Y. Cui, A. Srinivas, R. Qian, T.-Y. Lin, E. D. Cubuk, Q. V. Le, and B. Zoph. Simple copy-paste is a strong data augmentation method for instance segmentation. arXiv, 2020.
May 5 Open-World Recognition & Long-Tail Recognition & Continual Learning A. Bendale and T. Boult. Towards open world recognition. CVPR, 2015.

R. Aljundi, K. Kelchtermans, and T. Tuytelaars. Task-free continual learning. CVPR, 2019.
May 13 Final Project Presentations 8:00 AM - 11:00 AM (CT)