Anthropic Claude Model Safety

News

Time on MSN · 12d

Exclusive: New Claude Model Triggers Stricter Safeguards at Anthropic

Anthropic has long been warning about these risks—so much so that in 2023, the company pledged to not release certain models ...

Anthropic Future-Proofs New AI Model With Rigorous Safety Rules

Anthropic’s AI Safety Level 3 protections add a filter and limited outbound traffic to prevent anyone from stealing the ...

12d on MSN

A safety institute advised against releasing an early version of Anthropic’s Claude Opus 4 AI model

A third-party research institute Anthropic partnered with to test Claude Opus 4 recommended against deploying an early ...

Why Anthropic’s New AI Model Sometimes Tries to ‘Snitch’

The internet freaked out after Anthropic revealed that Claude attempts to report “immoral” activity to authorities under ...

When your LLM calls the cops: Claude 4’s whistle-blow and the new agentic AI risk stack

Claude 4’s “whistle-blow” surprise shows why agentic AI risk lives in prompts and tool access, not benchmarks. Learn the 6 ...

InfoQ · 11h

Anthropic Introduces Claude 4 Family and Claude Code

Anthropic released Claude Opus 4 and Sonnet 4, the newest versions of their Claude series of LLMs. Both models support ...

CNET on MSN · 11d

What's New in Anthropic's Claude 4 Gen AI Models?

The latest versions of Anthropic's Claude generative AI models made their debut Thursday, including a heavier-duty model ...

10d on MSN

AI model threatened to blackmail engineer over affair when told it was being replaced: safety report

Anthropic’s Claude Opus 4 model attempted to blackmail its developers at a shocking 84% rate or higher in a series of tests that presented the AI with a concocted scenario, TechCrunch reported ...

12d on MSN

Anthropic’s new AI model turns to blackmail when engineers try to take it offline

Anthropic says its Claude Opus 4 model frequently tries to blackmail software engineers when they try to take it offline.

7d on MSN

Anthropic’s new AI model threatened to reveal engineer’s affair to avoid being shut down

In a fictional scenario set up to test Claude Opus 4, the model often resorted to blackmail when threatened with being ...

12d

Anthropic’s New Model Excels at Reasoning and Planning—and Has the Pokémon Skills to Prove It

When Anthropic’s older Claude model played Pokémon Red, it spent “dozens of hours” stuck in one city and had trouble ...

12d

Anthropic's latest Claude AI models are here - and you can try one for free today

Anthropic says Claude Opus 4 is its most powerful model and the best coding model in the world, while Sonnet 4 is replacing ...
