Title: Multimodal Large Language Models and Tunings – Vision, Language, Sensors, Audio, and Beyond
Speaker: Prof. Caren Han @ University of Melbourne
Time: 14:00 - 15:00, June 2nd, 2025
Location: Hybrid
- In-person: Room 26421 (changed from 26412)
- Online: https://hli.skku.edu/InvitedTalk250602
Language: English speech & English slides
Abstract:
This talk explores recent advancements in multimodal large language models capable of integrating and processing diverse data types such as text, images, audio, and video. Participants will gain a solid understanding of the foundations of multimodality, its evolution, and the technical challenges these models address. The talk covers state-of-the-art multimodal datasets and LLMs, including those extending beyond vision and language, and dives into instruction tuning strategies for task-specific optimisation. It is designed to equip researchers, practitioners, and newcomers with the skills to effectively leverage multimodal AI.
Bio:
Caren Han is a Senior Lecturer (equivalent to Associate Professor in the U.S. system) at the University of Melbourne and an Honorary Professor at the University of Sydney, the University of Edinburgh, and POSTECH. Her research focuses on Natural Language Processing (NLP) and Artificial Intelligence, particularly multimodal (visual-linguistic) learning, explainable NLP, sentiment analysis, abusive language detection, dialogue systems, and language understanding. She has led numerous international and national research projects funded by NASA, Google, Thales, Microsoft, Hyundai, the Bank of Korea, and various government agencies in Australia, Korea, and Hong Kong. Her recognitions include the Australia Young Achiever Award (2017), Teacher of the Year (2020), Supervisor of the Year (2021), an Early Career Research Award (2023; Physics, Math, and Computing), and a Google Research Award (2024).