(S21-CS 598) Advanced Computer Vision: Schedule

Schedule (Tentative)

We will typically cover two papers in each class.

Date	Presenter	Topic	Papers	Slides
Jan 25	Yuxiong Wang	Introduction
Jan 27	Yuxiong Wang	Teaser: A Fundamental Challenge in Computer Vision	Y.-X. Wang and M. Hebert. Learning to learn: Model regression networks for easy small sample learning. ECCV, 2016. Y.-X. Wang, D. Ramanan, and M. Hebert. Learning to model the tail. NeurIPS, 2017. Y.-X. Wang, R. Girshick, M. Hebert, and B. Hariharan. Low-shot learning from imaginary data. CVPR, 2018. L.-Y. Gui, Y.-X. Wang, D. Ramanan, and J. M. F. Moura. Few-shot human motion prediction via meta-learning. ECCV, 2018.
Feb 1	Amnon Attali	Teaser: Vision for X (Robotics)	S. Levine, C. Finn, T. Darrell, and P. Abbeel. End-to-end training of deep visuomotor policies. JMLR, 2016. S. Bansal, V. Tolani, S. Gupta, J. Malik, and C. Tomlin. Combining optimal control and learning for visual navigation in novel environments. CoRL, 2019.
Part I: From 2D to 3D Computer Vision
Feb 3		Object Detection	M. Tan, R. Pang, and Q. V. Le. EfficientDet: Scalable and efficient object detection. CVPR, 2020. H. Law and J. Deng. CornerNet: Detecting objects as paired keypoints. ECCV, 2018.
Feb 8		Image Segmentation	A. Kirillov, Y. Wu, K. He, and R. Girshick. PointRend: Image segmentation as rendering. CVPR, 2020. X. Chen, R. Girshick, K. He, and P. Dollár. TensorMask: A foundation for dense object segmentation. ICCV, 2019.
Feb 10		Human Pose Estimation	Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. TPAMI, 2019. R. A. Güler, N. Neverova, and I. Kokkinos. DensePose: Dense human pose estimation in the wild. CVPR, 2018.
Feb 15		Optical and Scene Flow	Z. Teed and J. Deng. RAFT: Recurrent all-pairs field transforms for optical flow. ECCV, 2020. G. Yang and D. Ramanan. Learning to segment rigid motions from two frames. arXiv, 2021.
Feb 17	No Class	Break
Feb 22		Video Recognition	C. Feichtenhofer, H. Fan, J. Malik, and K. He. SlowFast networks for video recognition. ICCV, 2019. J. Carreira and A. Zisserman. Quo vadis, action recognition? A new model and the Kinetics dataset. CVPR, 2017.
Feb 24		Visual Context & Correspondence	A. Jabri, A. Owens, and A. Efros. Space-time correspondence as a contrastive random walk. NeurIPS, 2020. X. Wang, R. Girshick, A. Gupta, and K. He. Non-local neural networks. CVPR, 2018.
March 1		3D Point Clouds	C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. CVPR, 2017. Y. Zhou and O. Tuzel. VoxelNet: End-to-end learning for point cloud based 3D object detection. CVPR, 2018.
March 3		2.5D + 3D Image Understanding	T. Zhou, M. Brown, N. Snavely, and D. G. Lowe. Unsupervised learning of depth and ego-motion from video. CVPR, 2017. S. Zuffi, A. Kanazawa, and M. J. Black. Lions and tigers and bears: Capturing non-rigid, 3D, articulated shape from images. CVPR, 2018.
Part II: Towards Versatile Computer Vision Systems
March 8		Human Object Interaction	T. Gupta, A. Schwing, and D. Hoiem. No-frills human-object interaction detection: Factorization, layout encodings, and training techniques. ICCV, 2019. G. Gkioxari, R. Girshick, P. Dollár, and K. He. Detecting and recognizing human-object interactions. CVPR, 2018.
March 10		Relational Reasoning	R. Girdhar and D. Ramanan. CATER: A diagnostic dataset for compositional actions and temporal reasoning. ICLR, 2020. A. Santoro, D. Raposo, D. G. T. Barrett, M. Malinowski, R. Pascanu, P. Battaglia, and T. Lillicrap. A simple neural network module for relational reasoning. NeurIPS, 2017.
March 15		Generative Models	X. Luo, X. Zhang, P. Yoo, R. Martin-Brualla, J. Lawrence, and S. M. Seitz. Time-travel rephotography. arXiv, 2020. OpenAI. DALL·E: Creating images from text. 2021.
March 17		Video Prediction	L.-Y. Gui, Y.-X. Wang, X. Liang, and J. M. F. Moura. Adversarial geometry-aware human motion prediction. ECCV, 2018. T. Xue, J. Wu, K. L. Bouman, and W. T. Freeman. Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks. NeurIPS, 2016.
March 22		Vision and Language	A. Agrawal, J. Lu, S. Antol, M. Mitchell, C. L. Zitnick, D. Batra, and D. Parikh. VQA: Visual question answering. ICCV, 2015. C. Sun, A. Myers, C. Vondrick, K. Murphy, and C. Schmid. VideoBERT: A joint model for video and language representation learning. ICCV, 2019.
March 24	No Class	Break
March 29		Embodied Vision & Multi-Modality Perception	F. Xia, A. Zamir, Z.-Y. He, A. Sax, J. Malik, and S. Savarese. Gibson env: Real-world perception for embodied agents. CVPR, 2018. A. Owens and A. A. Efros. Audio-visual scene analysis with self-supervised multisensory features. ECCV, 2018.
March 31	Mengtian (Martin) Li, CMU	Guest Lecture
Part III: Frontiers
April 5		Transformers	A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. NeurIPS, 2017. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. ICLR, 2021.
April 7	Jiajun Wu, Stanford	Guest Lecture	Class time will be changed to 12:15-1:30pm CST
April 12		Transformers for Object Detection and Tracking	X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai. Deformable DETR: Deformable transformers for end-to-end object detection. ICLR, 2021. T. Meinhardt, A. Kirillov, L. Leal-Taixe, and C. Feichtenhofer. TrackFormer: Multi-object tracking with transformers. arXiv, 2021.
April 14		Neural Radiance Fields (NeRF)	B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. ECCV, 2020. L. Yen-Chen, P. Florence, J. T. Barron, A. Rodriguez, P. Isola, and T.-Y. Lin. iNeRF: Inverting neural radiance fields for pose estimation. arXiv, 2020.
Part IV: Generalization & Reducing Human Supervision in Computer Vision
April 19		Domain Adaptation & Transfer Learning & Multi-task Learning	I. Misra, A. Shrivastava, A. Gupta, and M. Hebert. Cross-stitch networks for multi-task learning. CVPR, 2016. C. P. Phoo and B. Hariharan. Self-training for few-shot transfer across extreme task differences. ICLR, 2021.
April 21		Few/Zero-Shot Learning	Y. Tian, Y. Wang, D. Krishnan, J. B. Tenenbaum, and P. Isola. Rethinking few-shot image classification: A good embedding is all you need? ECCV, 2020. OpenAI. CLIP: Connecting text and images. 2021.
April 26		Few-Shot Synthesis	T. R. Shaham, T. Dekel, and T. Michaeli. SinGAN: Learning a generative model from a single natural image. ICCV, 2019. A. Yu, V. Ye, M. Tancik, and A. Kanazawa. pixelNeRF: Neural radiance fields from one or few images. arXiv, 2020.
April 28	Ishan Misra, FAIR	Guest Lecture
May 3		Semi-Supervised Learning & Data Augmentation	H. Pham, Z. Dai, Q. Xie, M.-T. Luong, and Q. V. Le. Meta pseudo labels. arXiv, 2020. G. Ghiasi, Y. Cui, A. Srinivas, R. Qian, T.-Y. Lin, E. D. Cubuk, Q. V. Le, and B. Zoph. Simple copy-paste is a strong data augmentation method for instance segmentation. arXiv, 2020.
May 5		Open-World Recognition & Long-Tail Recognition & Continual Learning	A. Bendale and T. Boult. Towards open world recognition. CVPR, 2015. R. Aljundi, K. Kelchtermans, and T. Tuytelaars. Task-free continual learning. CVPR, 2019.
May 13		Final Project Presentations	8:00 AM - 11:00 AM (CT)