FRESH Hacker News
Refusal in Language Models Is Mediated by a Single Direction
113 points by fagnerbrack