Minseon Kim
AI Safety • Robustness • Self-supervised Learning

PhD @ KAIST

Postdoctoral Researcher at Microsoft Research Montréal.

About

I develop methods to identify real‑world safety risks in AI systems and to make models more controllable and trustworthy. I’m open to collaborations on AI safety, safety training, and evaluation research.

Keywords

Safety • Robustness • Self‑Supervised Learning • Reasoning

Selected Publications

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

CoLM 2025
M. Kim, J. M. Kwak, L. Alssum, B. Ghanem, P. Torr, D. Krueger, F. Barez, A. Bibi

Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts

CoLM 2025
S. Y. Arnob, Z. Su, M. Kim, O. Ostapenko, D. Precup, L. Caccia, A. Sordoni

Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings

arXiv 2025
M. Kim*, J.-P. Corbeil*, A. Sordoni, F. Beaulieu, P. Vozila

Instilling Parallel Reasoning into Language Models

ICML AI for Math WS 2025
M. Macfarlane, M. Kim, N. Jojic, W. Xu, L. Caccia, X. Yuan, W. Zhao, Z. Shi, A. Sordoni

Learning to Solve Complex Problems via Dataset Decomposition

ICML AI for Math WS 2025
W. Zhao, L. Caccia, Z. Shi, M. Kim, X. Yuan, W. Xu, M.-A. Côté, A. Sordoni

Enhancing Variational Autoencoders with Smooth Robust Latent Encoding

arXiv 2025
H. Lee*, M. Kim*, S. Jang, J. Jeong, S. J. Hwang

debug-gym: A Text-Based Environment for Interactive Debugging

arXiv 2025
X. Yuan, M. M. Moss, C. El Feghali, C. Singh, D. Moldavskaya, D. MacPhee, L. Caccia, M. Pereira, M. Kim, A. Sordoni, M.-A. Côté

Automatic Jailbreaking of the Text-to-Image Generative AI Systems

ICML Safety WS 2024
M. Kim, H. Lee, B. Gong, H. Zhang, S. J. Hwang

Optimizing Query Generation for Enhanced Document Retrieval in RAG

arXiv 2024
H. Koo, M. Kim, S. J. Hwang

Protein Representation Learning by Capturing Protein Sequence‑Structure‑Function Relationship

ICLR MLGenX WS 2024 (Spotlight)
E. Ko*, S. Lee*, M. Kim*, D. Kim, S. J. Hwang

Effective Targeted Attacks for Adversarial Self‑Supervised Learning

NeurIPS 2023
M. Kim, H. Ha, S. Son, S. J. Hwang

Generalizable Lightweight Proxy for Robust NAS against Diverse Perturbations

NeurIPS 2023
H. Ha*, M. Kim*, S. J. Hwang

Language Detoxification with Attribute‑Discriminative Latent Space

ACL 2023
M. Kim*, J. M. Kwak*, S. J. Hwang

Context‑dependent Instruction Tuning for Dialogue Response Generation

arXiv 2023
J. M. Kwak, M. Kim, S. J. Hwang

Meta‑Prediction Model for Distillation‑aware NAS on Unseen Datasets

ICLR 2023 (Spotlight)
H. Lee*, S. An*, M. Kim, S. J. Hwang

Rethinking the Entropy of Instance in Adversarial Training

IEEE SaTML 2023
M. Kim, J. Tack, J. Shin, S. J. Hwang

Lightweight Neural Architecture Search with Parameter Remapping and Knowledge Distillation

AutoML WS 2022
H. Lee*, S. An*, M. Kim, S. J. Hwang

Learning Transferable Adversarial Robust Representations via Multi‑view Consistency

NeurIPS SafetyML WS 2022
M. Kim*, H. Ha*, D. B. Lee, S. J. Hwang

Consistency Regularization for Adversarial Robustness

AAAI 2022
J. Tack, S. Yu, J. Jeong, M. Kim, S. J. Hwang, J. Shin

MRI‑based classification of neuropsychiatric systemic lupus erythematosus patients with self‑supervised contrastive learning

Frontiers in Neuroscience 2022
M. Kim*, F. Inglese*, G. Steup‑Beekman, T. Huizinga, M. van Buchem, J. de Bresser, D. Kim, I. Ronen

Adversarial Self‑Supervised Contrastive Learning

NeurIPS 2020
M. Kim, J. Tack, S. J. Hwang

Progressive Face Super‑Resolution via Attention to Facial Landmark

BMVC 2019
D. Kim*, M. Kim*, G. Kwon*, D. Kim

T1 Image Synthesis with Deep Convolutional Generative Adversarial Networks

OHBM 2018
M. Kim, C. Han, J. Park, D.-S. Kim

Experience

Postdoctoral Researcher — Microsoft Research Montréal
Current
Research Internship — ERA–KASL AI Safety Research, University of Oxford
Jun–Aug 2024 • with Philip Torr, David Krueger, Adel Bibi, Fazl Barez
Research Collaboration — Theory Center, Microsoft Research Asia
Jul 2023–May 2024 • with Huishuai Zhang

Invited Talks

Women in MSR – Project Green, Microsoft
Mar. 2025 — “Unsupervised Context Understanding for Safer LLMs”
Tea Talk, Mila (Montréal)
Feb. 2025 — “Designing Safety Systems for LLM-based Services”
RWE AI Journal Club, Microsoft
Nov. 2024 — “How to Obtain Safety Effectively and Efficiently”
Guest Lecture, Korea University
May 2024 — “Automatic Jailbreaking of the Text-to-Image Generative AI Systems”

Academic Services

Conference
NeurIPS, ICLR, ICML, ACL, AAAI, ACML, ICCV
Journal
TPAMI, IEEE TNNLS, TMLR, IEEE T-IFS, IEEE CIM
Organizer

Education

Ph.D., Korea Advanced Institute of Science and Technology (KAIST)

Contact

minseon5113(at)gmail(dot)com