About
I develop methods to identify real‑world safety risks in AI systems and make models more controllable and trustworthy. I’m open to collaborations on AI safety, safety training, and evaluation research.
Keywords
Safety
Robustness
Self‑Supervised
Reasoning
Selected Publications
Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts
CoLM 2025 • S. Y. Arnob, Z. Su, M. Kim, O. Ostapenko, D. Precup, L. Caccia, A. Sordoni
Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings
arXiv 2025 • M. Kim*, J.-P. Corbeil*, A. Sordoni, F. Beaulieu, P. Vozila • Safety
Instilling Parallel Reasoning into Language Models
ICML AI for Math WS 2025 • M. Macfarlane, M. Kim, N. Jojic, W. Xu, L. Caccia, X. Yuan, W. Zhao, Z. Shi, A. Sordoni
Learning to Solve Complex Problems via Dataset Decomposition
ICML AI for Math WS 2025 • W. Zhao, L. Caccia, Z. Shi, M. Kim, X. Yuan, W. Xu, M.-A. Côté, A. Sordoni
Enhancing Variational Autoencoders with Smooth Robust Latent Encoding
arXiv 2025 • H. Lee*, M. Kim*, S. Jang, J. Jeong, S. J. Hwang • Robustness
debug-gym: A Text-Based Environment for Interactive Debugging
arXiv 2025 • X. Yuan, M. M. Moss, C. El Feghali, C. Singh, D. Moldavskaya, D. MacPhee, L. Caccia, M. Pereira, M. Kim, A. Sordoni, M.-A. Côté
Optimizing Query Generation for Enhanced Document Retrieval in RAG
arXiv 2024 • H. Koo, M. Kim, S. J. Hwang
Protein Representation Learning by Capturing Protein Sequence‑Structure‑Function Relationship
ICLR MLGenX WS 2024 (Spotlight) • E. Ko*, S. Lee*, M. Kim*, D. Kim, S. J. Hwang • SSL
Language Detoxification with Attribute‑Discriminative Latent Space
ACL 2023 • M. Kim*, J. M. Kwak*, S. J. Hwang
Context‑dependent Instruction Tuning for Dialogue Response Generation
arXiv 2023 • J. M. Kwak, M. Kim, S. J. Hwang
Lightweight Neural Architecture Search with Parameter Remapping and Knowledge Distillation
AutoML WS 2022 • H. Lee*, S. An*, M. Kim, S. J. Hwang
Learning Transferable Adversarial Robust Representations via Multi‑view Consistency
NeurIPS SafetyML WS 2022 • M. Kim*, H. Ha*, D. B. Lee, S. J. Hwang
MRI‑based classification of neuropsychiatric systemic lupus erythematosus patients with self‑supervised contrastive learning
Frontiers in Neuroscience 2022 • M. Kim*, F. Inglese*, G. Steup‑Beekman, T. Huizinga, M. Van Buchem, J. Bresser, D. Kim, I. Ronen • SSL
T1 Image Synthesis with Deep Convolutional Generative Adversarial Networks
OHBM 2018 • M. Kim, C. Han, J. Park, D.-S. Kim
Experience
Postdoctoral Researcher — Microsoft Research–Montréal
Current
Research Internship — ERA–KASL AI Safety Research, University of Oxford
Jun–Aug 2024 • with Philip Torr, David Krueger, Adel Bibi, Fazl Barez
Research Collaboration — Theory Center, Microsoft Research Asia
Jul 2023–May 2024 • with Huishuai Zhang
Invited Talks
Women in MSR – Project Green, Microsoft
Mar. 2025 — “Unsupervised Context Understanding for Safer LLMs”
Tea Talk, Mila (Montréal)
Feb. 2025 — “Designing safety systems for LLM-based services”
RWE AI Journal Club, Microsoft
Nov. 2024 — “How to obtain safety effectively and efficiently”
Guest Lecture, Korea University
May 2024 — “Automatic Jailbreaking of the Text-to-Image Generative AI Systems”
Academic Services
Conference
NeurIPS, ICLR, ICML, ACL, AAAI, ACML, ICCV
Journal
TPAMI, IEEE TNNLS, TMLR, IEEE T-IFS, IEEE CIM
Organizer
Safety Colloquium (2024), Women in AI/CS/EE at KAIST (2024), Women in AI at KAIST (2022)
Education
- Ph.D., Graduate School of AI, KAIST — Thesis: Towards Safe and Robust Representation with Self‑Supervised Learning (Advisor: Sung Ju Hwang)
- M.S., Electrical Engineering, KAIST — Thesis: Differential representation of face pareidolia (Advisor: Dae‑shik Kim)
- B.S., Bio & Brain Engineering; Computer Science, KAIST
Contact
minseon5113(at)gmail(dot)com