One thing that has surprised me a bit recently is how much play the Claude Mythos stuff has gotten in the media, especially the example the New York Times gave of somebody’s kids accidentally bringing down the power grid. In Anthropic’s examples of the zero-days these models have identified, it cost $20,000 worth of inference to find the issues. It’s also not clear to me whether this is a step-function improvement on GPT-5.4, which researchers have already been using for months to find these types of bugs in software. As Simon Willison noted,
GPT-5.4 already has a strong reputation for finding security vulnerabilities.
As usual, narratives seem quite out of sync.