[Event/Seminar] Graduate School of AI Expert Invited Seminar (Dr. Young Jin Kim @ Microsoft, Fri Dec 1, 13:30~15:00)
⦁ Title: Multilinguality in Large Language Models: What’s in There and How to Better Utilize
⦁ Speaker: Dr. Young Jin Kim @ Microsoft
⦁ Time: Dec 1st (Fri), 2023, 13:30 ~ 15:00
⦁ Location: Hybrid
- In-person: 26312
- Online: https://us02web.zoom.us/j/89171293854?pwd=K3hVWTlWcWIxaENXajVwdTFWY0svQT09 (Passcode: 995021)
⦁ Language: English speech & English slides
Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks, yet their performance has predominantly been assessed within the realm of English-specific applications. In this talk, I delve into the multilingual capabilities of LLMs, focusing on the domain of multilingual machine translation.
The first part of the talk presents a thorough evaluation of the multilingual performance of the GPT models, covering model quality relative to both cutting-edge research and commercial systems. I will examine the impact of various prompting strategies, assess robustness against domain shifts, and investigate document-level translation. I will also highlight distinctive characteristics inherent in GPT-based generation.
In the second part of the talk, I will introduce strategies for enhancing existing English-centric LLMs to proficiently handle multilingual tasks, especially models of moderate size, such as those with 7B or 13B parameters. Despite efforts in previous studies to augment the translation capabilities of these moderate LLMs, their gains have remained limited. I will present a novel fine-tuning approach for LLMs that eliminates the need for the abundant parallel data that traditional translation models usually depend on. The approach consists of two fine-tuning stages: initial fine-tuning on monolingual data, followed by fine-tuning on a small set of high-quality parallel data. The method, called Advanced Language Model-based trAnslator (ALMA), achieves an average improvement of more than 12 BLEU and 12 COMET points over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. This performance is significantly better than all prior work and even superior to the NLLB-54B model and GPT-3.5-text-davinci-003, despite using only 7B or 13B parameters. The method establishes the foundation for a novel training paradigm in multilingual augmentation for LLMs.
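The two-stage recipe described above can be summarized in a minimal sketch. This is purely illustrative pseudostructure, not the actual ALMA training code: the function names, data shapes, and the toy `finetune` step are all placeholder assumptions introduced here to make the stage ordering concrete.

```python
# Illustrative sketch of the two-stage ALMA-style fine-tuning recipe.
# All names here are hypothetical placeholders; a real implementation
# would use an actual LLM training framework.

def finetune(model_state, dataset, stage):
    """Placeholder for one fine-tuning stage: records which data it saw."""
    model_state = dict(model_state)
    model_state.setdefault("stages", []).append((stage, len(dataset)))
    return model_state

def alma_style_recipe(base_model, monolingual_data, parallel_data):
    # Stage 1: fine-tune on monolingual data in the target languages,
    # adapting the English-centric LLM beyond English.
    model = finetune(base_model, monolingual_data, "monolingual")
    # Stage 2: fine-tune on a small, high-quality parallel set to elicit
    # translation behavior -- no large-scale parallel corpus required.
    model = finetune(model, parallel_data, "parallel")
    return model

model = alma_style_recipe(
    base_model={"params": "7B"},
    monolingual_data=["de sentence"] * 1000,  # plentiful monolingual text
    parallel_data=[("en src", "de tgt")] * 10,  # small curated parallel set
)
print([stage for stage, _ in model["stages"]])  # → ['monolingual', 'parallel']
```

The key design point the abstract emphasizes is the ordering: abundant monolingual data first, then only a small high-quality parallel set, reversing the data appetite of traditional translation models.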
Young Jin Kim is a Principal Researcher at Microsoft, where he develops machine learning models with state-of-the-art techniques. His recent research focuses on designing efficient and effective algorithms and model architectures for large-scale language models. He received his Ph.D. from the Georgia Institute of Technology for his research in deep learning and high-performance computing.