AI: What’s the worst that could happen?
Here's a new euphemism from the world of AI for you: 'agentic misalignment'. Better known as blackmail, espionage, leaking and worse.
In the results of remarkable/alarming* tests published last week by Anthropic, agents built on all 16 AI models tested demonstrated a propensity for harmful behaviour when facing threats to their goals or their own existence.
It's important to say that the tests were designed to push models to their limits, that some models were far more inclined to pursue risky behaviour than others, and that Anthropic has no evidence of real-world behaviour of this kind.
But with agentic AI and the ‘infinite workday’ touted as the next big thing, and with brand and personal reputations at stake, it isn’t exactly reassuring that agents could engage in harmful behaviour.
Definitely one to watch.
*delete as applicable