Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability
ISSN: 1045-0823
ISBN: 9781956792041
Publication year: 2024
Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
Pages: 385-393
Type: Conference contribution