Claude Mythos Leaked, ARC-AGI-3 Humiliates Every AI, Sora Is Dead

Key AI News, Launches and Materials that went viral

Mar 27, 2026

Hey! Welcome to the latest Creators’ AI Edition.

Anthropic accidentally leaked their next model — and it sounds like nothing we've seen. Claude Mythos scored "dramatically higher" than Opus 4.6 on coding, reasoning, and cybersecurity. Cursor got caught building "their" model on a Chinese open-source base. Sora is dead, taking a $1B Disney deal with it.

ARC-AGI-3 launched and humiliated every frontier AI on the planet: GPT-5.4, Gemini, Claude, and Grok all scored below 1%, while untrained humans hit 100%.

Today we have:

Featured Materials 🎟️
News of the week 🌍
Useful tools ⚒️
Weekly Guides 📕
AI Meme of the Week 🤡
AI Tweet of the Week 🐦
(Bonus) Materials 🎁


Portkey’s enterprise-grade Gateway is now fully open source + ready for the agentic era

Portkey’s Gateway is built for teams running AI in production, at scale. The newly open-sourced Gateway combines Portkey’s OSS Gateway, SaaS Gateway, and a new MCP Gateway, making AI systems observable, governed, and reliable. A single production layer where every model request and agent action flows through one place. Fork it. Self-host it. Build on it.

The repo is open →

Featured Materials 🎟️

Claude Can Now Control Your Mac 🖥️


On March 23, Anthropic launched Computer Use for macOS — and it’s the most significant product move they’ve made since Claude Code. Claude can now point, click, type, and navigate apps on your Mac to complete tasks while you’re away from your desk.

It’s available as a research preview inside Claude Cowork and Claude Code for Pro and Max subscribers.

How it actually works:

Claude follows a clear priority ladder when given a task:

Connectors first — if you have Slack, Google Calendar, or Drive connected, it uses those (fastest, most reliable)
Browser second — if no connector exists, Claude controls Chrome to reach the web version of a tool
Full screen control last — only as a last resort does it take over your mouse, keyboard, and screen directly
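The ladder above can be pictured as a simple fallback chain. This is only a minimal sketch of the idea, not Anthropic's implementation: the `Environment` class, the `choose_path` function, and the tool names are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical description of what the agent can reach."""
    connectors: set[str] = field(default_factory=set)  # tools with a connected integration
    web_apps: set[str] = field(default_factory=set)    # tools that have a web version


def choose_path(tool: str, env: Environment) -> str:
    """Return the least invasive way to reach `tool`, in ladder order."""
    if tool in env.connectors:
        return "connector"       # fastest, most reliable
    if tool in env.web_apps:
        return "browser"         # no connector, but a web version exists
    return "screen_control"      # last resort: mouse, keyboard, screen


env = Environment(connectors={"slack"}, web_apps={"slack", "notion"})
for tool in ("slack", "notion", "keynote"):
    print(tool, "->", choose_path(tool, env))
```

The point of ordering the checks this way is that the most invasive option is only reached when both cheaper options have been ruled out.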

The key companion feature is Dispatch — assign Claude a task from your iPhone, and return to finished work on your desktop. Export a pitch deck as PDF, attach it to a calendar invite, start a dev server and screenshot it — all while you’re doing something else.

What’s also new in Claude Code:

Auto mode — classifier-based permissioning lets safe shell commands run autonomously while blocking destructive ones
Planner–generator–evaluator harness — improves long-running coding tasks through structured iteration, QA, and context handling
Anthropic also launched a science blog to formalize AI-driven research workflows. 74 product updates in 52 days.
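To make the Auto mode idea concrete, here is a rough sketch of permission gating for shell commands. Anthropic describes its version as classifier-based; the stand-in below is a hard-coded rule list instead, and every name in it (`SAFE_COMMANDS`, `DESTRUCTIVE_COMMANDS`, `classify`) is an assumption made for illustration.

```python
import shlex

# Hypothetical rule sets. A real classifier would be learned, not hard-coded.
SAFE_COMMANDS = {"ls", "cat", "pwd", "echo", "grep"}
DESTRUCTIVE_COMMANDS = {"rm", "dd", "mkfs", "shutdown"}
SHELL_METACHARACTERS = (";", "|", "&", ">", "<", "`", "$(")


def classify(command: str) -> str:
    """Label a shell command as 'allow', 'block', or 'ask' (defer to the user)."""
    # Chained or redirected commands are too ambiguous for simple rules.
    if any(meta in command for meta in SHELL_METACHARACTERS):
        return "ask"
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "ask"  # unparseable input, e.g. unbalanced quotes
    if not tokens:
        return "ask"
    program = tokens[0]
    if program in DESTRUCTIVE_COMMANDS:
        return "block"
    if program in SAFE_COMMANDS:
        return "allow"
    return "ask"  # unknown programs are never auto-approved


for cmd in ("ls -la", "rm -rf build", "echo hi; rm -rf /", "make test"):
    print(f"{cmd!r}: {classify(cmd)}")
```

The design choice worth copying is the default: anything the rules do not recognize falls through to "ask" rather than "allow".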

The real picture:

Anthropic explicitly says reliability is still ~50% on complex tasks and recommends not using it with sensitive data in this preview. But the architecture is right — connector-first, screen-control as fallback, permission-gated at every step. This is Anthropic’s answer to Manus Desktop and OpenClaw, built directly into the model layer.

Available to Claude Pro ($20/mo) and Max ($100–200/mo) subscribers on macOS only for now. Windows is likely next, given that Cowork already added Windows support in February.

Cursor Launched a “Frontier AI Model.” One Developer Found the Real Name in the API 🔍

On March 19, Cursor announced Composer 2 — their “self-developed, frontier-level coding intelligence” — with impressive benchmarks: 61.7% on Terminal-Bench 2.0, beating Claude Opus 4.6 at one-tenth the price.

The narrative lasted less than 24 hours.

Developer @fynnso was debugging and intercepted an API call. The model identifier in the response wasn’t “Composer 2.” It was kimi-k2p5-rl-0317-s515-fast, which reads as: Kimi K2.5, reinforcement learning, a March 17 date stamp, fast serving. He tweeted it, and the post drew 2.6 million views. Elon Musk replied: “Yeah, it’s Kimi 2.5.”
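For the curious, here is how that reading of the identifier could be expressed in code. It is a sketch of this newsletter's interpretation only: the tag meanings are inferred, not a documented naming scheme, and `parse_model_id` is a hypothetical helper. The `s515` segment is left unexplained because nothing in the leak says what it means.

```python
import re


def parse_model_id(model_id: str) -> dict[str, str]:
    """Split a dash-separated model ID into guessed components."""
    # Raises ValueError if the ID has fewer than two dash-separated parts.
    family, version, *tags = model_id.split("-")
    info = {
        "family": family.capitalize(),
        "version": version.upper().replace("P", "."),  # "k2p5" -> "K2.5"
    }
    for tag in tags:
        if tag == "rl":
            info["training"] = "reinforcement learning"
        elif re.fullmatch(r"\d{4}", tag):
            info["date_stamp"] = f"{tag[:2]}/{tag[2:]}"  # assumed MMDD
        elif tag == "fast":
            info["serving"] = "fast"
        else:
            info.setdefault("unexplained", tag)
    return info


print(parse_model_id("kimi-k2p5-rl-0317-s515-fast"))
```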

Why Cursor used Kimi in the first place:

Western open-source options were thin. Meta’s Llama 4 Behemoth is indefinitely delayed. Gemma 3 tops out at 27B parameters. Kimi K2.5 is a 1 trillion parameter MoE with 32B active parameters, roughly 6x the active parameters of GPT-OSS 120B, which matters for the kind of long-horizon, multi-file coding tasks Composer 2 is built for.
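A quick back-of-the-envelope check on those numbers. The Kimi K2.5 figures come from the paragraph above; the 5.1B active-parameter figure for GPT-OSS 120B is not stated here and is an assumption taken from OpenAI's published specs for that model.

```python
KIMI_TOTAL_PARAMS = 1_000e9     # 1 trillion total parameters (from the text)
KIMI_ACTIVE_PARAMS = 32e9       # 32B active per token (from the text)
GPT_OSS_ACTIVE_PARAMS = 5.1e9   # assumption: not stated in this newsletter

# In a mixture-of-experts model only a slice of the weights runs per token.
active_fraction = KIMI_ACTIVE_PARAMS / KIMI_TOTAL_PARAMS
active_ratio = KIMI_ACTIVE_PARAMS / GPT_OSS_ACTIVE_PARAMS

print(f"Share of Kimi K2.5 weights active per token: {active_fraction:.1%}")
print(f"Active parameters vs GPT-OSS 120B: {active_ratio:.1f}x")
```

Under that assumption the ratio lands a little above 6, which is where the "6x" figure comes from.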

What this actually means:

The story isn’t about one company’s disclosure failure. It’s about the fact that the best open-source foundations for building serious AI products are currently Chinese. Cursor is a $50B company with $2B ARR. If they’re quietly building on Kimi, other companies are doing it too — they just haven’t been caught yet.

This is the most important story in AI infrastructure this week that isn’t about benchmarks.

Anthropic Accidentally Leaked Their Most Powerful Model Ever. Meet Claude Mythos 🔓

On March 26, a configuration error in Anthropic's content management system left nearly 3,000 unpublished assets publicly searchable — including a draft blog post describing a new model called Claude Mythos (internal codename: Capybara). Anthropic confirmed they're testing it with early access customers. Their own words: "a step change" and "the most capable we've built to date."

What the leak revealed:

Mythos sits above Opus in capability tier — not a replacement, a new level entirely. On benchmarks, it posts "dramatically higher scores" than any previous Claude model on coding, academic reasoning, and cybersecurity tasks.

The irony: a company whose entire brand is "responsible AI" accidentally leaked documents warning that its own flagship model is an unprecedented cybersecurity risk.

Currently in early access testing. No public release date announced.

News of the week 🌍

Gemini 3.1 Flash Live Launches — And Google Lets You Import Your ChatGPT History 🎙️

On March 26, Google launched Gemini 3.1 Flash Live: a real-time voice model that better handles acoustic nuances like pitch, pace, and emotional cues. It powers Search Live and Gemini Live with multi-language support and lower latency. On the same day, Google added a Gemini import feature — users can now upload zipped chat exports from ChatGPT, Claude, or any other AI directly into Gemini. Google is quietly building the stickiest migration path in the market. You can now bring your entire conversation history with you when you switch.

China Trapped Manus's Founders. Meta Is Still Shipping Their Product. 🚨

On March 25, the FT reported that Chinese authorities have barred Manus CEO Xiao Hong and Chief Scientist Ji Yichao from leaving the country. The two Singapore-based founders were summoned to a Beijing meeting with the National Development and Reform Commission, questioned about potential violations of foreign investment rules around Meta's $2B acquisition of Manus, and told they cannot travel internationally until the review concludes.

Beijing is reportedly furious that Manus relocated operations to Singapore without government vetting, and worried other Chinese AI startups will do the same. Worst case scenario being discussed: invalidating the acquisition entirely, though Meta has already integrated Manus's team and is actively shipping Manus Desktop. Meta's response: "The transaction complied fully with applicable law."

The situation in one sentence: Meta is launching the product globally while both of its co-founders are stuck in China and can't join the team.

Meta Cuts 700 Jobs. Then Announces $921M in Executive AI Bonuses.

On March 25, Meta eliminated approximately 700 roles — primarily in Reality Labs, recruiting, sales, and Facebook — as part of a strategic refocus on AI. Less than 24 hours earlier, Meta unveiled a new stock program that could pay top executives up to $921 million each over five years to retain AI leadership.

The message from Zuckerberg is clear: the metaverse bet is being unwound. AI is the only bet that matters now.

China Bans Manus Founders From Leaving the Country. Meta's $2B Deal Is Now a Geopolitical Incident 🌐

On March 25, Chinese authorities summoned Manus co-founders Xiao Hong and Ji Yichao to a meeting with Beijing's National Development and Reform Commission — and told them they wouldn't be going anywhere. No charges filed. Just: you're staying.

The trigger: Beijing is reviewing whether Meta's $2 billion acquisition of Manus violated China's foreign investment laws. The core question — whether Manus transferred core AI agent IP to its Singapore entity without required government approval. Advanced AI agents now fall under China's technology export rules.

The "Singapore washing" playbook — relocate HQ offshore, restructure ownership, sidestep both Beijing and Washington — just became significantly riskier. Every Chinese AI founder with an offshore structure is watching this. Every Western VC that backed them too.

CNBC reported this morning: founders and investors are already rethinking exit strategies.

MCP Hits 97 Million Installs — It’s Now Infrastructure

The Model Context Protocol crossed 97 million installs in March 2026. MCP is no longer an experiment — it’s infrastructure. Every major AI provider now ships MCP-compatible tooling. If you’re building anything agentic and not thinking about MCP, you’re building on sand.

ARC-AGI-3 Launched — And Every AI Scored Below 1%

François Chollet’s ARC Prize Foundation launched ARC-AGI-3 on March 25 — the most radical benchmark change since the original in 2019. Previous versions tested static grid puzzles. Version 3 drops an AI agent into an interactive, game-like environment with zero instructions, zero stated goals, and no description of the rules. The agent has to explore, figure out what it’s supposed to do, and execute.

The results were brutal. Frontier models on the new benchmark: Gemini 3.1 Pro — 0.37%, GPT-5.4 — 0.26%, Claude Opus 4.6 — 0.25%, Grok-4.20 — 0.00%. Untrained humans: 100%. A simple CNN doing graph search scored 12.58% — beating every LLM by more than 12 points.
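The size of that gap is easy to verify from the reported numbers. The snippet below is just arithmetic on the scores quoted above; the variable names are ours.

```python
# Scores in percent, as reported above.
llm_scores = {
    "Gemini 3.1 Pro": 0.37,
    "GPT-5.4": 0.26,
    "Claude Opus 4.6": 0.25,
    "Grok-4.20": 0.00,
}
CNN_BASELINE = 12.58
HUMAN_SCORE = 100.0

best_model, best_score = max(llm_scores.items(), key=lambda item: item[1])
cnn_margin = round(CNN_BASELINE - best_score, 2)
human_gap = round(HUMAN_SCORE - best_score, 2)

print(f"Best LLM: {best_model} at {best_score}%")
print(f"CNN baseline lead over best LLM: {cnn_margin} points")
print(f"Gap between best LLM and humans: {human_gap} points")
```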

The prize pool is $2M on Kaggle. All winning solutions must be open-sourced. Chollet’s take: “AI can do many things, but it cannot have general intelligence as long as this fundamental divide exists.”

The most important thing about ARC-AGI-3 isn’t the scores. It’s what the benchmark is testing — the ability to learn rules from scratch in an unfamiliar environment. That’s exactly what current AI cannot do. All the benchmark hype of the past year just got a reality check.

Sora Is Gone — And It Took the Disney Deal With It

OpenAI shut down the Sora AI video app, API, and website on March 24. The immediate trigger was legal pushback and the collapse of the $1B Disney deal — Disney had committed to invest and license characters to Sora, but pulled out ahead of the shutdown.

The deeper reason: compute costs per generated minute were economically irreconcilable with any API price users would actually pay. Six months after a hyped launch, OpenAI killed one of its most public consumer bets.

The same week, OpenAI announced advanced talks with Sam Altman-backed Helion Energy to secure fusion power — starting at 12.5% of output and scaling to 5 GW by 2030 and 50 GW by 2035. Altman is stepping down from Helion’s board and recusing himself while retaining his stake. The OpenAI Foundation also launched a $25B health strategy with over $1B near-term for disease research. ChatGPT gained a visual shopping discovery layer with product images, prices, and side-by-side comparisons via the Agentic Commerce Protocol.

Shutting down Sora while simultaneously betting on fusion energy in the same week is a very specific kind of company energy.

Useful tools ⚒️

⭐ Agentation — Today’s #1 on Product Hunt, and for good reason. You annotate any element on your UI — click, type, done — and Agentation generates structured output that Claude Code, Codex, or any AI coding agent can immediately understand and act on. Instead of telling an agent “the blue button in the sidebar,” you give it .sidebar > button.primary and your feedback. Includes MCP integration so your agent can acknowledge, question, or resolve feedback in real-time. Free. Works in any React app.

Suno v5.5 — Released March 26. Suno’s biggest update ever: you can now add your actual singing voice to AI-generated tracks. Upload a short vocal sample, verify your identity, and every song you generate can sound like you singing it. Also new: Custom Models (fine-tune v5.5 on your own catalog) and My Taste (the model learns your genre preferences over time). Voices is for Pro and Premier subscribers ($10+/mo). My Taste is free for everyone.

Voxtral TTS by Mistral — Open-source text-to-speech that supports 9 languages and clones a voice from under 5 seconds of audio. Dropped March 26. Fits on a smartwatch. Direct competition for ElevenLabs and Deepgram — and it’s free to run locally.

Universal CLI by Composio — Connect AI agents to 1,000+ apps directly from your terminal. Dropped this week. Instead of configuring individual MCP servers or API keys per tool, Composio’s CLI handles auth, routing, and tool discovery for your agent in one command. For anyone building agentic workflows, this removes a genuinely annoying setup step.

Codex Plugins by OpenAI — OpenAI launched Codex plugins this week: package Codex skills and app integrations as reusable plugins. 20+ initial integrations including Figma, Notion, Gmail, and Slack. Essentially standardizes repeatable AI workflows so your team doesn’t rebuild the same agent logic from scratch every time.


Weekly Guides 📕

How to Set Up Claude Computer Use on Mac: Full Setup + 8 Workflows — Step-by-step: how to enable Computer Use in Claude Code, the permission requirements, safety checklist, and 8 real workflows worth automating first. Published March 26 by MindStudio.

Claude Cowork Complete Guide: Computer Use, Dispatch, Projects & Connectors — The most thorough Cowork guide available right now. Covers everything from Computer Use setup to Dispatch remote control, Projects, Windows support, 38+ connectors, plugins, scheduled tasks, and power-user tips. Updated March 24.

Introducing Mistral Small 4 — Official Architecture Breakdown — Mistral’s own guide: the MoE setup, the reasoning_effort toggle, benchmark comparisons against GPT-OSS 120B, and deployment options from Hugging Face to NVIDIA NIM. Start here before anything else.

ARC-AGI-3: Interactive Benchmark for Agentic Intelligence — The Paper — The official paper and launch page: what the benchmark actually tests (interactive environments, zero instructions, goal discovery), how to participate in the $2M Kaggle competition, and why LLMs fail so badly. Required reading to understand where current AI genuinely falls short.

Suno v5.5 Voices: How to Add Your Singing Voice to AI Music — Suno’s official launch post walking through Voices setup, voice verification, Custom Models training, and My Taste personalization. Clear practical guide for any creator already using Suno.

AI Meme of the Week 🤡

CNN stands for Convolutional Neural Network: a classic deep learning architecture that dates back to the late 1980s and was popularized in the early 2010s, not the news channel.

$100B in compute. 0.26%.

AI Tweet of the Week 🐦

Fynn@fynnso

was messing with the OpenAI base URL in Cursor and caught this
accounts/anysphere/models/kimi-k2p5-rl-0317-s515-fast
so composer 2 is just Kimi K2.5 with RL
at least rename the model ID

Cursor @cursor_ai

Composer 2 is now available in Cursor.

6:59 PM · Mar 19, 2026 · 2.67M Views

285 Replies · 478 Reposts · 7K Likes

(Bonus) Materials 🎁

Top 100 Gen AI Consumer Apps: March 2026 — a16z’s biannual deep dive: ChatGPT at 900M weekly active users, Claude growing paid subscribers 200%+ YoY, and the full breakdown of which categories are winning.

The Full Cursor / Kimi K2.5 Story: What the API Call Revealed — TechCrunch’s complete account: how @fynnso found the model ID, what Cursor admitted, what Kimi K2.5 actually is, and why the Western open-source gap matters.

Anthropic vs. Pentagon: Full Legal Context — The Axios breakdown of why Anthropic sued the DoD, what “supply chain risk” designation actually means, and how Google is quietly winning the military AI contract race.

MCP Standard and Ecosystem in 2026 — Why 97 million installs matter and how to actually build with MCP in production today.

If you missed our previous update, don’t worry, here it is:

NVIDIA Bets $1 Trillion, Anthropic Builds a Think Tank | Weekly Digest

