This project focuses on understanding the content and meaning of static images. It covers a wide range of topics, including semantic segmentation (assigning a semantic category to every pixel in an image), image captioning (generating a sentence describing the entire image or a specific region), edge detection, and depth estimation. In doing so, we also collect large-scale datasets with detailed annotations to facilitate computer vision research. We mainly study mid-level to high-level computer vision problems, with possible connections and extensions to natural language understanding. We build models with structure, 3D, and interpretability in mind, and test them on challenging real-world images. Our long-term goal is holistic, human-like understanding of objects and scenes.
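The per-pixel labeling that defines semantic segmentation can be sketched in a few lines. This is a toy illustration, not any specific model from the publications below: the array shapes, class names, and random scores are all hypothetical stand-ins for a real network's output.

```python
import numpy as np

# Hypothetical setup: a segmentation model produces per-pixel class scores
# of shape (H, W, num_classes); the predicted label map then assigns one
# semantic category to every pixel by taking the highest-scoring class.
H, W, num_classes = 4, 6, 3  # e.g. 0 = background, 1 = person, 2 = car
rng = np.random.default_rng(0)
scores = rng.random((H, W, num_classes))  # stand-in for network logits

# One class index per pixel: shape (H, W), values in {0, ..., num_classes-1}
label_map = scores.argmax(axis=-1)

assert label_map.shape == (H, W)
assert label_map.min() >= 0 and label_map.max() < num_classes
```

In a real system the scores would come from a deep network (as in DeepLab below) and would typically be refined, e.g. with a fully connected CRF, before the final per-pixel assignment.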
 Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille, DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, PAMI 2017.
 Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan L. Yuille, Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN), ICLR 2015.
 Roozbeh Mottaghi, Xianjie Chen, Xiaobai Liu, Nam-Gyu Cho, Seong-Whan Lee, Sanja Fidler, Raquel Urtasun, Alan L. Yuille, The Role of Context for Object Detection and Semantic Segmentation in the Wild, CVPR 2014.