Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability

  1. García-Carrasco, J.
  2. Maté, A.
  3. Trujillo, J.
Proceedings:
IJCAI International Joint Conference on Artificial Intelligence

ISSN: 1045-0823

ISBN: 9781956792041

Year of publication: 2024

Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024

Pages: 385-393

Type: Conference contribution