I'm a Ph.D. candidate in the CIP Lab at Yonsei University, advised by Professor Seon Joo Kim.
My research centers on multimodal understanding and generation, with a particular focus on Multimodal Large Language Models (MLLMs). I explore how images and text can be connected more effectively in AI systems, spanning both architectural innovations and evaluation methodologies.
Microsoft Research, AI Frontiers (Redmond, U.S.)
Research Scientist Intern, Summer 2025
LG AI Research, AML (Seoul, South Korea)
Research Scientist Intern, Summer 2024
Naver, Foundational Research (Seongnam, South Korea)
Research Scientist Intern, Summer 2023
Yonsei University (Seoul, South Korea)
Ph.D. Student, Mar. 2024 - Present
Advisor: Seon Joo Kim
Seoul National University (Seoul, South Korea)
M.Sc. Student, Mar. 2020 - Feb. 2023
Advisor: Gunhee Kim
My most recent publications are listed on Google Scholar.
‡ indicates equal contribution.
What MLLMs Learn When They Learn Multimodal Reasoning: Perception, Reasoning, or Integration?
Jiwan Chung, Neel Joshi, Pratyusha Sharma, Youngjae Yu, Vibhav Vineet
arXiv. 2025.
v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
Jiwan Chung‡, Junhyeok Kim‡, Siyeol Kim, Jaeyoung Lee, Min Soo Kim, Youngjae Yu
arXiv. 2025.
Teaching Metric Distance to Discrete Autoregressive Language Models
Jiwan Chung, Saejin Kim, Yongrae Jo, Jaewoo Park, Dongjun Min, Youngjae Yu
arXiv. 2025.
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms
Seungwon Lim, Sungwoong Kim, Jihwan Yu, Sungjae Lee, Jiwan Chung, Youngjae Yu
EMNLP. 2025.
VAGUE: Visual Contexts Clarify Ambiguous Expressions
Heejeong Nam, Jinwoo Ahn, Keummin Ka, Jiwan Chung, Youngjae Yu
ICCV. 2025.
Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?
Jiwan Chung, Janghan Yoon, Junhyeong Park, Sangeyl Lee, Joowon Yang, Sooyeon Park, Youngjae Yu
ACL. 2025.
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
Jiwan Chung‡, Youngmin Kim‡, Jisoo Kim, Sunghyun Lee, Sangkyu Lee, Junhyeok Kim, Cheoljong Yang, Youngjae Yu
ACL. 2025.
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction
Suhwan Choi, Yongjun Cho, Minchan Kim, Jaeyoon Jung, Myunchul Joe, Yubeen Park, Minseo Kim, Sungwoong Kim, Sungjae Lee, Hwiseong Park, Jiwan Chung, Youngjae Yu
ICRA. 2025.
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild
Junhyeok Kim, Min Soo Kim, Jiwan Chung, Jungbin Cho, Jisoo Kim, Sungwoong Kim, Gyeongbo Sim, Youngjae Yu
NAACL Findings. 2025.
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu
NAACL Findings. 2025.
MASS: Overcoming Language Bias in Image-Text Matching
Jiwan Chung, Seungwon Lim, Sangkyu Lee, Youngjae Yu
AAAI. 2025.
Towards Visual Text Design Transfer Across Languages
Jiwan Chung‡, Yejin Choi‡, Sumin Shim, Giyeong Oh, Youngjae Yu
NeurIPS Datasets and Benchmarks. 2024.
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding
Jiwan Chung‡, Sungjae Lee‡, Minseo Kim, Seungju Han, Ashkan Yousefpour, Jack Hessel, Youngjae Yu
EMNLP. 2024.
Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!
Jiwan Chung, Seungwon Lim, Jaehyun Jeon, Seungbeen Lee, Youngjae Yu
EMNLP. 2024.
Language models as compilers: Simulating pseudocode execution improves algorithmic reasoning in language models
Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo
EMNLP. 2024.
HyperCLOVA X Technical Report
HyperCLOVA X Team
arXiv. 2024.
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Jiwan Chung, Youngjae Yu
BMVC. 2023.
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung, Youngjae Yu
EMNLP. 2023.
Reading books is great, but not if you are driving! Visually grounded reasoning about defeasible commonsense norms
Seungju Han, Junhyeok Kim, Jack Hessel, Liwei Jiang, Jiwan Chung, Yejin Son, Yejin Choi, Youngjae Yu
EMNLP. 2023.
Fusing pre-trained language models with multimodal prompts through reinforcement learning
Jiwan Chung‡, Youngjae Yu‡, Heeseung Yun, Jack Hessel, Jae Sung Park, Ximing Lu, Rowan Zellers, Prithviraj Ammanabrolu, Ronan Le Bras, Gunhee Kim, Yejin Choi
CVPR. 2023.
ACAV100M: Automatic curation of large-scale datasets for audio-visual video representation learning
Jiwan Chung‡, Sangho Lee‡, Youngjae Yu, Gunhee Kim, Thomas Breuel, Gal Chechik, Yale Song
ICCV. 2021.
Transitional adaptation of pretrained models for visual storytelling
Jiwan Chung‡, Youngjae Yu‡, Heeseung Yun, Jongseok Kim, Gunhee Kim
CVPR. 2021.
Character grounding and re-identification in story of videos and text descriptions
Youngjae Yu, Jongseok Kim, Heeseung Yun, Jiwan Chung, Gunhee Kim
ECCV. 2020.
Full Resume in PDF.