Safety / Robustness
APGP
Automatic Jailbreaking of the Text-to-Image Generative AI Systems
Minseon Kim, Hyomin Lee, Boqing Gong, Huishuai Zhang, Sung Ju Hwang
ICML Next Generation of AI Safety Workshop 2024, PDF, Project Page, Code


Croze
Generalizable Lightweight Proxy for Robust NAS against Diverse Perturbations
Minseon Kim*, Hyeonjeong Ha*, Sung Ju Hwang
NeurIPS 2023, PDF


Taro
Effective Targeted Attacks for Adversarial Self-Supervised Learning
Minseon Kim, Hyeonjeong Ha, Sooel Son, Sung Ju Hwang
NeurIPS 2023, PDF


Marvl
Few-shot Transferable Robust Representation Learning via Bilevel Attacks
Minseon Kim*, Hyeonjeong Ha*, Dong Bok Lee, Sung Ju Hwang
NeurIPS SafetyML Workshop 2022, PDF


ADLM
Language Detoxification with Attribute-Discriminative Latent Space
Minseon Kim*, Jin Myung Kwak*, Sung Ju Hwang
ACL 2023, PDF


EWAT
Rethinking the Entropy of Instance in Adversarial Training
Minseon Kim, Jihoon Tack, Jinwoo Shin, Sung Ju Hwang
SaTML 2023, PDF, Code


RoCL
Adversarial Self-Supervised Contrastive Learning
Minseon Kim, Jihoon Tack, Sung Ju Hwang
NeurIPS 2020, PDF, Code