Minseon Kim
AI Safety • Robustness • Self-supervised Learning

PhD @ KAIST

Postdoctoral Researcher at Microsoft Research Montréal.

About

I develop methods to identify real‑world safety risks in AI systems and to make models more controllable and trustworthy. I’m open to collaborations on AI safety, safety training, and evaluation research.

Keywords

Safety • Robustness • Self‑Supervised Learning • Reasoning

Selected Publications

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

CoLM 2025
M. Kim, J. M. Kwak, L. Alssum, B. Ghanem, P. Torr, D. Krueger, F. Barez, A. Bibi

Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts

CoLM 2025
S. Y. Arnob, Z. Su, M. Kim, O. Ostapenko, D. Precup, L. Caccia, A. Sordoni

Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings

arXiv 2025
M. Kim*, J.-P. Corbeil*, A. Sordoni, F. Beaulieu, P. Vozila

Instilling Parallel Reasoning into Language Models

ICML AI for Math WS 2025
M. Macfarlane, M. Kim, N. Jojic, W. Xu, L. Caccia, X. Yuan, W. Zhao, Z. Shi, A. Sordoni

Learning to Solve Complex Problems via Dataset Decomposition

ICML AI for Math WS 2025
W. Zhao, L. Caccia, Z. Shi, M. Kim, X. Yuan, W. Xu, M.-A. Côté, A. Sordoni

Enhancing Variational Autoencoders with Smooth Robust Latent Encoding

arXiv 2025
H. Lee*, M. Kim*, S. Jang, J. Jeong, S. J. Hwang

debug-gym: A Text-Based Environment for Interactive Debugging

arXiv 2025
X. Yuan, M. M. Moss, C. El Feghali, C. Singh, D. Moldavskaya, D. MacPhee, L. Caccia, M. Pereira, M. Kim, A. Sordoni, M.-A. Côté

Automatic Jailbreaking of the Text-to-Image Generative AI Systems

ICML Safety WS 2024
M. Kim, H. Lee, B. Gong, H. Zhang, S. J. Hwang

Optimizing Query Generation for Enhanced Document Retrieval in RAG

arXiv 2024
H. Koo, M. Kim, S. J. Hwang

Protein Representation Learning by Capturing Protein Sequence‑Structure‑Function Relationship

ICLR MLGenX WS 2024 (Spotlight)
E. Ko*, S. Lee*, M. Kim*, D. Kim, S. J. Hwang

Effective Targeted Attacks for Adversarial Self‑Supervised Learning

NeurIPS 2023
M. Kim, H. Ha, S. Son, S. J. Hwang

Generalizable Lightweight Proxy for Robust NAS against Diverse Perturbations

NeurIPS 2023
H. Ha*, M. Kim*, S. J. Hwang

Language Detoxification with Attribute‑Discriminative Latent Space

ACL 2023
M. Kim*, J. M. Kwak*, S. J. Hwang

Context‑dependent Instruction Tuning for Dialogue Response Generation

arXiv 2023
J. M. Kwak, M. Kim, S. J. Hwang

Meta‑Prediction Model for Distillation‑aware NAS on Unseen Datasets

ICLR 2023 (Spotlight)
H. Lee*, S. An*, M. Kim, S. J. Hwang

Rethinking the Entropy of Instance in Adversarial Training

IEEE SaTML 2023
M. Kim, J. Tack, J. Shin, S. J. Hwang

Lightweight Neural Architecture Search with Parameter Remapping and Knowledge Distillation

AutoML WS 2022
H. Lee*, S. An*, M. Kim, S. J. Hwang

Learning Transferable Adversarial Robust Representations via Multi‑view Consistency

NeurIPS SafetyML WS 2022
M. Kim*, H. Ha*, D. B. Lee, S. J. Hwang

Consistency Regularization for Adversarial Robustness

AAAI 2022
J. Tack, S. Yu, J. Jeong, M. Kim, S. J. Hwang, J. Shin

MRI‑based classification of neuropsychiatric systemic lupus erythematosus patients with self‑supervised contrastive learning

Frontiers in Neuroscience 2022
M. Kim*, F. Inglese*, G. Steup‑Beekman, T. Huizinga, M. van Buchem, J. de Bresser, D. Kim, I. Ronen

Adversarial Self‑Supervised Contrastive Learning

NeurIPS 2020
M. Kim, J. Tack, S. J. Hwang

Progressive Face Super‑Resolution via Attention to Facial Landmark

BMVC 2019
D. Kim*, M. Kim*, G. Kwon*, D. Kim

T1 Image Synthesis with Deep Convolutional Generative Adversarial Networks

OHBM 2018
M. Kim, C. Han, J. Park, D.-S. Kim

Experience

Postdoctoral Researcher — Microsoft Research Montréal
Current
Research Internship — ERA–KASL AI Safety Research, University of Oxford
Jun–Aug 2024 • with Philip Torr, David Krueger, Adel Bibi, Fazl Barez
Research Collaboration — Theory Center, Microsoft Research Asia
Jul 2023–May 2024 • with Huishuai Zhang

Invited Talks

Women in MSR – Project Green, Microsoft
Mar. 2025 — “Unsupervised Context Understanding for Safer LLMs”
Tea Talk, Mila (Montréal)
Feb. 2025 — “Designing Safety Systems for LLM-based Services”
RWE AI Journal Club, Microsoft
Nov. 2024 — “How to Obtain Safety Effectively and Efficiently”
Guest Lecture, Korea University
May 2024 — “Automatic Jailbreaking of the Text-to-Image Generative AI Systems”

Academic Services

Conference
NeurIPS, ICLR, ICML, ACL, AAAI, ACML, ICCV
Journal
TPAMI, IEEE TNNLS, TMLR, IEEE T-IFS, IEEE CIM
Organizer

Education

Ph.D., Korea Advanced Institute of Science and Technology (KAIST)

Contact

minseon5113(at)gmail(dot)com