Jiwan Chung

Ph.D. Candidate, Yonsei University

Bio

I'm a Ph.D. candidate in the CIP Lab at Yonsei University, advised by Professor Seon Joo Kim.

My research focuses on Multimodal Large Language Models (MLLMs), exploring how images and text can be more effectively connected in AI systems.

Research Experience

Microsoft Research, AI Frontiers (Redmond, U.S.)

Research Scientist Intern, Summer 2025

LG AI Research, AML (Seoul, South Korea)

Research Scientist Intern, Summer 2024

Naver, Foundational Research (Seongnam, South Korea)

Research Scientist Intern, Summer 2023

Yonsei University (Seoul, South Korea)

Ph.D. Student, Mar. 2024 - Present

Advisor: Seon Joo Kim

Seoul National University (Seoul, South Korea)

M.Sc. Student, Mar. 2020 - Feb. 2023

Advisor: Gunhee Kim

Publications

Research outputs across venues. ‡ denotes equal contribution. Full list on Google Scholar.

2026
Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation

Jaewoo Park, Jungyang Park, Dongju Jang, Jiwan Chung, Byungwoo Yoo, Jaewoo Shin, Seonjoon Park, Taehyeong Kim, Youngjae Yu

AAAI. 2026.

visual grounding math reasoning education benchmark
2025
What MLLMs Learn When They Learn Multimodal Reasoning: Perception, Reasoning, or Integration?

Jiwan Chung, Neel Joshi, Pratyusha Sharma, Youngjae Yu, Vibhav Vineet

arXiv. 2025.

multimodal reasoning perception geometry benchmark
2025
v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning

Jiwan Chung‡, Junhyeok Kim‡, Siyeol Kim, Jaeyoung Lee, Min Soo Kim, Youngjae Yu

arXiv. 2025.

visual grounding reasoning VLM pointing
2025
Teaching Metric Distance to Discrete Autoregressive Language Models

Jiwan Chung, Saejin Kim, Yongrae Jo, Jaewoo Park, Dongjun Min, Youngjae Yu

arXiv. 2025.

training distance learning multimodal robotics
2025
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms

Seungwon Lim, Sungwoong Kim, Jihwan Yu, Sungjae Lee, Jiwan Chung, Youngjae Yu

EMNLP. 2025.

agents exploration decision-making benchmark
2025
VAGUE: Visual Contexts Clarify Ambiguous Expressions

Heejeong Nam, Jinwoo Ahn, Keummin Ka, Jiwan Chung, Youngjae Yu

ICCV. 2025.

ambiguity VLM disambiguation benchmark
2025
Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?

Jiwan Chung, Janghan Yoon, Junhyeong Park, Sangeyl Lee, Joowon Yang, Sooyeon Park, Youngjae Yu

ACL. 2025.

any-to-any consistency multimodal evaluation
2025
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues

Jiwan Chung‡, Youngmin Kim‡, Jisoo Kim, Sunghyun Lee, Sangkyu Lee, Junhyeok Kim, Cheoljong Yang, Youngjae Yu

ACL. 2025.

nonverbal dialogue video dataset
2025
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

Suhwan Choi, Yongjun Cho, Minchan Kim, Jaeyoon Jung, Myunchul Joe, Yubeen Park, Minseo Kim, Sungwoong Kim, Sungjae Lee, Hwiseong Park, Jiwan Chung, Youngjae Yu

ICRA. 2025.

robotics navigation commonsense embodied AI
2025
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild

Junhyeok Kim, Min Soo Kim, Jiwan Chung, Jungbin Cho, Jisoo Kim, Sungwoong Kim, Gyeongbo Sim, Youngjae Yu

NAACL Findings. 2025.

egocentric dialogue agents video
2025
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu

NAACL Findings. 2025.

LLM personality evaluation benchmark
2025
MASS: Overcoming Language Bias in Image-Text Matching

Jiwan Chung, Seungwon Lim, Sangkyu Lee, Youngjae Yu

AAAI. 2025.

image-text matching bias VLM
2024
Towards Visual Text Design Transfer Across Languages

Jiwan Chung‡, Yejin Choi‡, Sumin Shim, Giyeong Oh, Youngjae Yu

NeurIPS Datasets and Benchmarks. 2024.

visual text multilingual generation benchmark
2024
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding

Jiwan Chung‡, Sungjae Lee‡, Minseo Kim, Seungju Han, Ashkan Yousefpour, Jack Hessel, Youngjae Yu

EMNLP. 2024. (Oral)

visual reasoning arguments benchmark
2024
Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!

Jiwan Chung, Seungwon Lim, Jaehyun Jeon, Seungbeen Lee, Youngjae Yu

EMNLP. 2024.

puns ambiguity VLM benchmark
2024
Language models as compilers: Simulating pseudocode execution improves algorithmic reasoning in language models

Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo

EMNLP. 2024.

LLM reasoning code prompting
2024
HyperCLOVA X Technical Report

HyperCLOVA X Team

arXiv. 2024.

LLM Korean multilingual
2023
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering

Jiwan Chung, Youngjae Yu

BMVC. 2023.

video QA summarization long video
2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation

Jiwan Chung, Youngjae Yu

EMNLP. 2023.

VLM decoding generation
2023
Reading books is great, but not if you are driving! Visually grounded reasoning about defeasible commonsense norms

Seungju Han, Junhyeok Kim, Jack Hessel, Liwei Jiang, Jiwan Chung, Yejin Son, Yejin Choi, Youngjae Yu

EMNLP. 2023. (Oral)

commonsense norms VLM benchmark
2023
Fusing pre-trained language models with multimodal prompts through reinforcement learning

Jiwan Chung‡, Youngjae Yu‡, Heeseung Yun, Jack Hessel, Jae Sung Park, Ximing Lu, Rowan Zellers, Prithviraj Ammanabrolu, Ronan Le Bras, Gunhee Kim, Yejin Choi

CVPR. 2023.

multimodal RL prompting VLM
2021
ACAV100M: Automatic curation of large-scale datasets for audio-visual video representation learning

Jiwan Chung‡, Sangho Lee‡, Youngjae Yu, Gunhee Kim, Thomas Breuel, Gal Chechik, Yale Song

ICCV. 2021.

audio-visual dataset video representation
2021
Transitional adaptation of pretrained models for visual storytelling

Jiwan Chung‡, Youngjae Yu‡, Heeseung Yun, Jongseok Kim, Gunhee Kim

CVPR. 2021.

visual storytelling generation VLM
2020
Character grounding and re-identification in story of videos and text descriptions

Youngjae Yu, Jongseok Kim, Heeseung Yun, Jiwan Chung, Gunhee Kim

ECCV. 2020.

video grounding characters story

Vitæ

Full Resume in PDF.