I doubt this happened as explained in the article. How would Claude have access to personal emails. It's a sandboxed AI.
This would mean the devs gave it access to everything to see what would happen. Basically they created a situation where they pretty much knew the outcome in advance anyway.
And surprise, digging through other articles, that's exactly what they did:
"During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse.
In these scenarios, Anthropic says Claude Opus 4 “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”"
qwop -1 points 6 hours ago
I doubt this happened as explained in the article. How would Claude have access to personal emails. It's a sandboxed AI.
This would mean the devs gave it access to everything to see what would happen. Basically they created a situation where they pretty much knew the outcome in advance anyway.
And surprise, digging through other articles, that's exactly what they did:
"During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse.
In these scenarios, Anthropic says Claude Opus 4 “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”"