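-Date/Time : 4/14 (Mon) 13:30~ (about 1 hour)
-Venue : Industry-University Cooperation Center (산학협력센터), 7th floor, Room 85718
-Speaker : Prof. Joonseok Lee (이준석), Graduate School of Data Science, Seoul National University
-Title : Multimodal image and video understanding on various applications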
-Abstract : In this talk, we will first review the definition of multimodal learning in modern AI, then cover several interesting recent applications. First, we will cover referring image segmentation, the task of predicting a segmentation mask for the referred object given an image and a text. A simple data augmentation technique turns out to be powerful on this task, with promising results. Second, we will talk about video summarization, the task of selecting important frames or clips from a video, either to summarize the entire content or to detect its interesting parts. We present a recent large-scale summarization dataset and promising results from pre-training on it. Third, we will talk about recent generation and editing models for images and videos, focusing on the characteristics of the latent space learned by diffusion models. We present work that further improves the properties of this space using isometric regularization. If time permits, we will briefly discuss how these video understanding techniques can be applied to video recommendation.