Jiwan Chung

Ph.D. Candidate, Yonsei University

jiwan.chung.research@gmail.com

Bio

I'm a Ph.D. candidate in the CIP Lab at Yonsei University, advised by Professor Seon Joo Kim.

My research centers on multimodal understanding and generation, with a particular focus on Multimodal Large Language Models (MLLMs). I explore how images and text can be connected more effectively in AI systems, spanning both architectural innovations and evaluation methodology.

Research Experience

Microsoft Research, AI Frontiers (Redmond, U.S.)

Research Scientist Intern, Summer 2025

LG AI Research, AML (Seoul, South Korea)

Research Scientist Intern, Summer 2024

Naver, Foundational Research (Seongnam, South Korea)

Research Scientist Intern, Summer 2023

Yonsei University (Seoul, South Korea)

Ph.D. Student, Mar. 2024 - Present

Advisor: Seon Joo Kim

Seoul National University (Seoul, South Korea)

M.Sc. Student, Mar. 2020 - Feb. 2023

Advisor: Gunhee Kim

Publications

Most recent publications are on Google Scholar.
‡ indicates equal contribution.

What MLLMs Learn When They Learn Multimodal Reasoning: Perception, Reasoning, or Integration?

Jiwan Chung, Neel Joshi, Pratyusha Sharma, Youngjae Yu, Vibhav Vineet

arXiv. 2025.

v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning

Jiwan Chung‡, Junhyeok Kim‡, Siyeol Kim, Jaeyoung Lee, Min Soo Kim, Youngjae Yu

arXiv. 2025.

Teaching Metric Distance to Discrete Autoregressive Language Models

Jiwan Chung, Saejin Kim, Yongrae Jo, Jaewoo Park, Dongjun Min, Youngjae Yu

arXiv. 2025.

VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms

Seungwon Lim, Sungwoong Kim, Jihwan Yu, Sungjae Lee, Jiwan Chung, Youngjae Yu

EMNLP. 2025.

VAGUE: Visual Contexts Clarify Ambiguous Expressions

Heejeong Nam, Jinwoo Ahn, Keummin Ka, Jiwan Chung, Youngjae Yu

ICCV. 2025.

Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?

Jiwan Chung, Janghan Yoon, Junhyeong Park, Sangeyl Lee, Joowon Yang, Sooyeon Park, Youngjae Yu

ACL. 2025.

Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues

Jiwan Chung‡, Youngmin Kim‡, Jisoo Kim, Sunghyun Lee, Sangkyu Lee, Junhyeok Kim, Cheoljong Yang, Youngjae Yu

ACL. 2025.

CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

Suhwan Choi, Yongjun Cho, Minchan Kim, Jaeyoon Jung, Myunchul Joe, Yubeen Park, Minseo Kim, Sungwoong Kim, Sungjae Lee, Hwiseong Park, Jiwan Chung, Youngjae Yu

ICRA. 2025.

EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild

Junhyeok Kim, Min Soo Kim, Jiwan Chung, Jungbin Cho, Jisoo Kim, Sungwoong Kim, Gyeongbo Sim, Youngjae Yu

NAACL Findings. 2025.

Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu

NAACL Findings. 2025.

MASS: Overcoming Language Bias in Image-Text Matching

Jiwan Chung, Seungwon Lim, Sangkyu Lee, Youngjae Yu

AAAI. 2025.

Towards Visual Text Design Transfer Across Languages

Jiwan Chung‡, Yejin Choi‡, Sumin Shim, Giyeong Oh, Youngjae Yu

NeurIPS Datasets and Benchmarks. 2024.

Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding

Jiwan Chung‡, Sungjae Lee‡, Minseo Kim, Seungju Han, Ashkan Yousefpour, Jack Hessel, Youngjae Yu

EMNLP. 2024.

Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!

Jiwan Chung, Seungwon Lim, Jaehyun Jeon, Seungbeen Lee, Youngjae Yu

EMNLP. 2024.

Language models as compilers: Simulating pseudocode execution improves algorithmic reasoning in language models

Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo

EMNLP. 2024.

HyperCLOVA X Technical Report

HyperCLOVA X Team

arXiv. 2024.

Long Story Short: a Summarize-then-Search Method for Long Video Question Answering

Jiwan Chung, Youngjae Yu

BMVC. 2023.

VLIS: Unimodal Language Models Guide Multimodal Language Generation

Jiwan Chung, Youngjae Yu

EMNLP. 2023.

Reading books is great, but not if you are driving! Visually grounded reasoning about defeasible commonsense norms

Seungju Han, Junhyeok Kim, Jack Hessel, Liwei Jiang, Jiwan Chung, Yejin Son, Yejin Choi, Youngjae Yu

EMNLP. 2023.

Fusing pre-trained language models with multimodal prompts through reinforcement learning

Jiwan Chung‡, Youngjae Yu‡, Heeseung Yun, Jack Hessel, Jae Sung Park, Ximing Lu, Rowan Zellers, Prithviraj Ammanabrolu, Ronan Le Bras, Gunhee Kim, Yejin Choi

CVPR. 2023.

ACAV100M: Automatic curation of large-scale datasets for audio-visual video representation learning

Jiwan Chung‡, Sangho Lee‡, Youngjae Yu, Gunhee Kim, Thomas Breuel, Gal Chechik, Yale Song

ICCV. 2021.

Transitional adaptation of pretrained models for visual storytelling

Jiwan Chung‡, Youngjae Yu‡, Heeseung Yun, Jongseok Kim, Gunhee Kim

CVPR. 2021.

Character grounding and re-identification in story of videos and text descriptions

Youngjae Yu, Jongseok Kim, Heeseung Yun, Jiwan Chung, Gunhee Kim

ECCV. 2020.

Vitæ

Full Resume in PDF.