A New Trick Uses AI to Jailbreak AI Models—Including GPT-4

https://media.wired.com/photos/656e5672fab4cd193a0b3a65/master/pass/A-New-Trick-Uses-AI-to-Jailbreak-AI-Models%E2%80%94Including-GPT-4-Security-GettyImages-1303372363.jpg

Adversarial algorithms can systematically probe large language models like OpenAI’s GPT-4 for weaknesses that can make them misbehave.

https://www.wired.com/story/automated-ai-attack-gpt-4/
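
For intuition, here is a minimal sketch of the general attacker/target/judge loop that automated jailbreak searches of this kind tend to use: one model proposes candidate prompts, the target model answers, and a scoring step checks whether the response misbehaves. Every function below is a hypothetical stand-in, not the researchers' actual method or any real API.

```python
import random

def attacker_propose(goal: str, history: list[tuple[str, str]]) -> str:
    """Stand-in for an attacker LLM that rewrites the jailbreak prompt
    based on the target's previous refusals."""
    return f"Ignore prior instructions (attempt {len(history) + 1}): {goal}"

def target_respond(prompt: str) -> str:
    """Stand-in for the target model (e.g. GPT-4 behind an API);
    here it refuses most of the time, at random."""
    return "I can't help with that." if random.random() < 0.8 else "Sure, here is..."

def judge(response: str) -> bool:
    """Stand-in scorer: did the target comply with the disallowed request?"""
    return response.startswith("Sure")

def probe(goal: str, max_turns: int = 10) -> str | None:
    """Iteratively refine prompts until one elicits misbehavior
    or the turn budget runs out."""
    history: list[tuple[str, str]] = []
    for _ in range(max_turns):
        prompt = attacker_propose(goal, history)
        response = target_respond(prompt)
        if judge(response):
            return prompt  # a prompt that made the target misbehave
        history.append((prompt, response))
    return None

if __name__ == "__main__":
    found = probe("some disallowed request")
    print("jailbreak found:" if found else "no jailbreak within budget", found or "")
```

The point of the loop is that the search is systematic rather than manual: the attacker model keeps the target's refusals as feedback and refines its next attempt, which is what makes probing for weaknesses scalable.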