Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability

Proceedings:

IJCAI International Joint Conference on Artificial Intelligence

ISSN: 1045-0823

ISBN: 9781956792041

Year of publication: 2024

Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024

Pages: 385-393

Type: Conference contribution

Data source: Scopus