
The Proving Week: How AI Earned Its Keep Across Math, Security, Autonomy, and Infrastructure

An OpenAI model disproved an 80-year math conjecture. Claude found 10,000+ zero-days humans missed. Qwen3.7 ran for 35 hours straight. Google replaced search with agents. This was the week AI stopped promising and started proving.

May 25, 2026 · 85 min · Pro

For two years, we've been told AI can do things. Write code, find patterns, automate tasks, reason through problems. Every week brings a new benchmark, a new demo, a new promise. And every week, the skeptics have a reasonable response: "Show me."

This week, AI showed us.

An OpenAI model disproved an 80-year-old math conjecture that some of the world's best mathematicians had tried and failed to resolve. Anthropic's Claude found over 10,000 critical vulnerabilities in the software that runs the internet -- vulnerabilities that human security teams had missed for years. Alibaba's Qwen3.7 ran autonomously for 35 hours straight, optimizing code for a chip architecture it had never seen before. Google didn't just update search; it replaced the most visited website on earth with AI agents. Sam Altman offered $2 million in compute to every YC startup in exchange for equity. The US government started subsidizing AI exports. OpenAI committed to watermarking every image it generates.

These aren't demos. They aren't benchmarks. They aren't "could" or "might" or "in the future." This is AI doing real work -- mathematical discovery, security auditing, autonomous engineering, infrastructure transformation -- and doing it better than humans in the same timeframe.

The question isn't whether AI can deliver anymore. The question is what happens when it does.

This Deep Dive breaks down each development, connects the dots, and gives you a framework for navigating a world where AI doesn't just assist -- it proves.


Table of Contents

  1. The Erdos Conjecture: When AI Does Real Math -- An OpenAI model disproved an 80-year-old conjecture in discrete geometry, and the way it did it matters more than the fact it did
  2. Project Glasswing: 10,000 Zero-Days and the Security Paradox -- Claude Mythos found vulnerabilities faster than humans can patch them, and Anthropic won't release the model
  3. Google I/O 2026: The Search Box Dies -- Google replaced ten blue links with AI agents, and the internet will never be the same
  4. Qwen3.7-Max: 35 Hours Alone -- Alibaba's model ran autonomously for a day and a half, optimizing code it had never seen. This is the future of engineering
  5. Karpathy to Anthropic: The Talent War Gets Personal -- OpenAI's co-founder joined the competition. Here's what that actually means
  6. $2M for Equity: OpenAI's Land Grab -- Sam Altman's token-for-equity deal is about owning the next generation of AI companies
  7. ExportAI: The US Government Subsidizes AI Exports -- EXIM Bank is now backing AI deals with billions. Here's who benefits
  8. OpenAI's Watermark Commitment: Provenance Gets Real -- C2PA, SynthID, and the beginning of AI content accountability
  9. The Connecting Thread: AI's Proving Week -- Why these eight stories aren't separate events but one structural shift
  10. Your 30-Day Action Plan -- What to do now that AI has proven itself

Section 1: The Erdos Conjecture -- When AI Does Real Math

An OpenAI model disproved a conjecture that had stood for 80 years. The proof wasn't just correct -- it was creative. And that's what should make you pay attention.


In 1946, Paul Erdos posed a deceptively simple question: if you place n points in a plane, what's the maximum number of pairs that can be exactly distance 1 apart? This is the planar unit distance problem, and it became one of the most famous open questions in combinatorial geometry. Erdos offered a cash prize for resolving it. Generations of mathematicians tried and failed. The best known constructions came from square grids, and for decades, the prevailing belief was that these grid-based approaches were essentially optimal -- that no arrangement could do significantly better.

On May 20, 2026, an internal OpenAI model proved them wrong.

The model produced an infinite family of configurations that yield a polynomial improvement over the square grid constructions. In technical terms, it showed that for infinitely many values of n, you can arrange n points with at least n^(1+delta) unit-distance pairs, where delta is approximately 0.014. That might sound like a small improvement, but the grid constructions only reach n^(1+c/log log n), an exponent that creeps back toward 1 as n grows, and the conjecture held that nothing could beat n^(1+o(1)). An exponent that stays fixed above 1 is a fundamentally different growth rate, and it invalidates the conjecture entirely.
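For readers who want the bounds side by side, here is a short summary in LaTeX. The notation u(n) for the maximum number of unit-distance pairs among n points is introduced here for convenience, the constants c and C are left unspecified, and the value of delta is simply the figure quoted above, not something independently checked.

```latex
% Classical bounds: grid lower bound (Erdos, 1946) and the
% Spencer-Szemeredi-Trotter upper bound (1984)
n^{1 + c/\log\log n} \;\le\; u(n) \;\le\; C\, n^{4/3}

% The conjecture: the grid exponent is essentially optimal
u(n) = n^{1 + o(1)}

% The reported result: a fixed exponent gain for infinitely many n
u(n) \;\ge\; n^{1+\delta}, \qquad \delta \approx 0.014
```

Since c/log log n tends to 0 while delta stays fixed, the third line eventually exceeds anything of the form n^(1+o(1)), which is why a small delta is enough to refute the conjecture.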

What Makes This Different from Previous AI Math Results

AI has solved math competition problems before. AlphaGeometry cracked geometry olympiad problems. Various models have performed well on the MATH benchmark and others. But those were exercises with known solution techniques. The Erdos unit distance problem was an open research question -- one of the best-known open problems in its subfield, actively worked on by serious mathematicians for nearly 80 years.

The model that solved it wasn't a specialized math system. It was a general-purpose reasoning model, tested on a collection of Erdos problems as part of a broader evaluation of whether frontier AI can contribute to research. It wasn't scaffolded, wasn't given hints, wasn't targeted at this specific problem. It just... solved it.

And here's the part that should make everyone in research pay attention: the proof used techniques from algebraic number theory -- specifically infinite class field towers and Golod-Shafarevich theory -- applied to a geometric question where no one expected these tools to be relevant. The model made a creative, unexpected connection between distant areas of mathematics.

Fields Medalist Tim Gowers called it "a milestone in AI mathematics." Noga Alon, one of the world's leading combinatorialists, said: "I believe it would be fair to say that every mathematician working in Combinatorial Geometry thought about this problem." The model didn't just find an answer. It found an answer that surprised the experts.

What the Mathematicians Said

The companion paper written by external mathematicians is worth reading carefully. Thomas Bloom wrote that the result shows "there is a lot more that number theoretic constructions have to say about these sorts of questions than we suspected; moreover, that the number theory required can be very deep." He added: "AI is helping us to more fully explore the cathedral of mathematics we have built over the centuries; what other unseen wonders are waiting in the wings?"

Princeton number theorist Arul Shankar put it more directly: "This paper demonstrates that current AI models go beyond just helpers to human mathematicians -- they are capable of having original ingenious ideas, and then carrying them out to fruition."

Let that sink in. A Princeton number theorist is saying that an AI model had an "original ingenious idea." Not that it followed instructions. Not that it pattern-matched. That it had an idea that no human had in 80 years of trying.

The Performance Scaling Data

OpenAI also released data on how the model's success rate varies with test-time compute. This is important because it tells us something about the nature of the capability. The model wasn't just lucky. As compute increased, success rates on hard mathematical problems increased systematically. More thinking time leads to better results, which is exactly what you'd expect from genuine reasoning capability rather than memorization or pattern matching.

This has implications far beyond mathematics. If a model can hold together a complex argument over many steps, connect ideas across distant domains, and produce work that survives expert scrutiny in one of the most rigorous fields of human inquiry -- those same capabilities are useful in biology, physics, engineering, and medicine. OpenAI said as much in their blog post, and they're right.

What This Means for You

If you work in any field that involves complex reasoning -- and honestly, that's most knowledge work -- this result should change how you think about AI. We've moved past the era where AI is a fancy search engine or a pattern matcher. We're now in an era where AI can make genuinely creative contributions to hard problems.

That doesn't mean AI replaces mathematicians, or researchers, or engineers. The mathematicians who verified this proof wrote a companion paper that extends and contextualizes the result in ways the model couldn't. Human judgment, interpretation, and taste still matter. But the center of gravity has shifted. AI is now a genuine collaborator in creative intellectual work, not just a tool for executing well-defined tasks.

For builders: start thinking about how your products and workflows can incorporate AI as a reasoning partner, not just an assistant. The gap between "AI can help me draft an email" and "AI can have original ideas" just got a lot smaller.


Section 2: Project Glasswing -- 10,000 Zero-Days and the Security Paradox

The most powerful security AI ever built found vulnerabilities faster than developers can fix them. So Anthropic is keeping it locked up. That decision tells you everything about where AI safety is headed.


On May 22, 2026, Anthropic published an update on Project Glasswing that should terrify and reassure you in equal measure. The Claude Mythos Preview model, working with approximately 50 partners including Apple, Google, Microsoft, Amazon, Cisco, CrowdStrike, JPMorgan Chase, and Broadcom, found more than 10,000 high- and critical-severity vulnerabilities in widely used open-source software. These are zero-days -- security flaws that the original developers didn't know existed -- in the software that runs the internet.

Anthropic explicitly warned that Mythos finds bugs faster than developers can patch them. Let me say that again: the AI finds vulnerabilities at a rate that outpaces the human capacity to fix them. This is both the most impressive and most concerning AI security result to date.

How Project Glasswing Works

Project Glasswing launched roughly a month before this update. The idea was straightforward: give Claude Mythos Preview access to critical open-source software repositories and let it look for vulnerabilities. The 50 partners provided the repositories, the context, and the validation infrastructure. Mythos provided the finding.

The model doesn't just look for known vulnerability patterns. It reasons about code behavior, traces data flows across complex systems, and identifies logic flaws that would be invisible to pattern-matching security tools. It's doing something closer to what a skilled security researcher does -- understanding what the code is supposed to do and finding the gap between intent and implementation.

The results speak for themselves. Over 10,000 high- or critical-severity vulnerabilities. In software maintained by some of the best engineering teams in the world. In code that has been reviewed, audited, and battle-tested for years. Mythos found things that human auditors, automated scanners, and existing security tools all missed.

Why Anthropic Won't Release Mythos Publicly

Here's the part that matters most: Anthropic is not releasing Claude Mythos Preview as a general-purpose model. You can't access it through the API. You can't download it. The only way to use it is through the structured Project Glasswing program, with Anthropic and its partners controlling access.

The stated reason is safety. If you release a model that can find 10,000+ zero-days in critical infrastructure, anyone can use it -- including the people who would exploit those vulnerabilities rather than fix them. Anthropic is treating Mythos as a dual-use technology: enormously beneficial for defense, enormously dangerous in the wrong hands.

This is the first time a major AI lab has explicitly withheld a model not because of general safety concerns (like generating harmful content) but because the model is too effective at a specific, high-stakes task. That's a precedent that will echo.

The Patching Paradox

Anthropic's warning about Mythos finding bugs faster than developers can patch them points to a structural problem that goes beyond AI. The software industry already had a vulnerability disclosure and patching problem before AI got involved. The average time to patch a known critical vulnerability is measured in weeks or months, not days. Now we have an AI that surfaced more than 10,000 such vulnerabilities in roughly a month.

This creates a dangerous window: between when the vulnerability is discovered and when it's patched. In the current system, that window is managed by responsible disclosure practices and coordinated release timelines. But as AI discovery accelerates, the patching infrastructure doesn't speed up proportionally. The gap between discovery and remediation gets wider, not narrower.

Project Glasswing addresses this somewhat by keeping Mythos within a controlled partnership. But what happens when other AI labs develop similar capabilities? What happens when open-source models catch up? The patching paradox -- finding bugs faster than we can fix them -- is going to get worse before it gets better.

What This Means for You

If you build or maintain software, this is both a wake-up call and an opportunity. The wake-up call: your code almost certainly has vulnerabilities that current security tools can't find but that advanced AI models can. The opportunity: Project Glasswing and similar programs give you access to that capability for defense.

Three concrete steps:

1. Audit your dependencies. The 10,000+ vulnerabilities Mythos found were in widely used open-source libraries. If you're running critical infrastructure, you almost certainly depend on software that was part of this program. Check the Project Glasswing disclosures and patch accordingly.

2. Plan for AI-augmented security. Mythos is the first, but it won't be the last. Expect every major AI lab to build security-focused models within the next 12 months. Start thinking about how to integrate AI-assisted vulnerability discovery into your security workflow now.

3. Re-examine your threat model. If AI can find 10,000+ zero-days in software maintained by Apple, Google, and Microsoft, the assumption that "well-reviewed code is safe code" no longer holds. Your threat model needs to account for AI-augmented attackers, not just human attackers.
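As a rough illustration of the first step, here is a minimal Python sketch that compares installed dependency versions against a list of advisories. Everything in it is an assumption made for illustration: the `Advisory` record, the package names, and the version numbers are invented, and it does not reflect any real Project Glasswing disclosure format. It also only handles plain dotted versions like `1.4.2`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """One hypothetical advisory: a package name and the first fixed version."""
    package: str
    fixed_in: str


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '1.2.10' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def find_unpatched(installed: dict[str, str], advisories: list[Advisory]) -> list[str]:
    """Return names of installed packages older than an advisory's fixed version."""
    flagged = set()
    for adv in advisories:
        current = installed.get(adv.package)
        if current is not None and parse_version(current) < parse_version(adv.fixed_in):
            flagged.add(adv.package)
    return sorted(flagged)


# Hypothetical data, invented for illustration only.
installed = {"libfoo": "1.4.2", "libbar": "2.0.0", "libbaz": "0.9.1"}
advisories = [Advisory("libfoo", "1.4.10"), Advisory("libbar", "2.0.0")]
print(find_unpatched(installed, advisories))  # prints ['libfoo']
```

Comparing versions as integer tuples rather than strings matters here: as strings, "1.4.2" would sort after "1.4.10" and the vulnerable package would be missed.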


Section 3: Google I/O 2026 -- The Search Box Dies

Google didn't update search this week. It replaced it. The ten blue links are over, and AI agents are what comes next.


At Google I/O 2026, Sundar Pichai declared the "agentic Gemini era." The biggest announcement wasn't a model or a feature. It was the death of the search box as we know it.

Google Search now runs on Gemini 3.5 Flash by default. When you search, you don't get ten blue links. You get an AI-powered interface with "information agents" that monitor the web continuously, synthesize answers, take actions, and handle multi-step requests. You can ask Google to plan a trip, compare insurance plans, or research a medical condition, and the agents will do the work across multiple sources, applications, and time horizons.

This is the most significant change to the most visited website on earth. And it happened with a keynote, not a beta.

Gemini 3.5 Flash: The Engine Underneath

Gemini 3.5 Flash is Google's new default model, and it's built for speed. According to DeepMind Chief Technologist Koray Kavukcuoglu, it outperforms the previous frontier model, Gemini 3.1 Pro, on "nearly all benchmarks" including coding, agentic tasks, and multimodal reasoning. It's 4x faster than other frontier models, with an optimized version that's 12x faster at the same quality.

That speed matters because agents need to be fast. When an AI is running multiple tasks in parallel, managing a research project, or building an operating system (as Google demonstrated), latency kills. Flash is designed for the agentic world where models make hundreds of decisions per minute and need to respond quickly at each step.

The forthcoming 3.5 Pro model is designed to work in tandem: Pro as the orchestrator and planner, Flash as the executor. This is the "orchestrator + worker" pattern that's emerging as the standard architecture for agentic AI. Expect to see every major lab adopt some version of this within the year.
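To make the "orchestrator + worker" idea concrete, here is a minimal Python sketch. It is not Google's implementation: `plan` and `execute` are hypothetical stand-ins for a planner model and a fast worker model, and a thread pool stands in for whatever scheduling a real agent platform uses.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(goal: str) -> list[str]:
    """Stand-in for the slower planner model: split a goal into subtasks."""
    return [f"{goal}: step {i}" for i in range(1, 4)]


def execute(subtask: str) -> str:
    """Stand-in for the fast worker model: handle a single subtask."""
    return f"done({subtask})"


def run(goal: str, max_workers: int = 3) -> list[str]:
    """Plan once, then fan the subtasks out to workers in parallel."""
    subtasks = plan(goal)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the subtasks.
        return list(pool.map(execute, subtasks))


results = run("compare insurance plans")
for line in results:
    print(line)
```

The design point is that the planner is called once per goal while the workers are called many times, which is why worker latency dominates the overall experience.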

Antigravity 2.0: Google's Agent Platform

Alongside Flash, Google released Antigravity 2.0, a standalone desktop application designed for agent-first development. The demo showed agents spawning off to work on separate components of a project and then coming together to build a full operating system. That's not a metaphor. AI agents built an OS from scratch, live on stage.

Antigravity is Google's answer to Claude Code, Cursor, and other agent-first development tools. It's a bet that the future of software development is describing what you want and having AI agents build it, rather than writing code yourself. The integration with Gemini 3.5 Flash means agents have a native environment where they can "live, work, and execute," as Kavukcuoglu put it.

AI Mode in Search: The End of the Web As We Know It

The most consequential part of the I/O announcements isn't the model or the platform. It's AI Mode in Search.

When Google replaces search results with AI-generated answers, the entire economics of the web shift. Publishers who have built businesses on Google traffic for two decades face an existential question: what happens when Google stops sending users to websites and starts answering questions directly?

We've been watching this happen in slow motion for two years with AI overviews. But I/O 2026 is the inflection point. Google isn't testing AI answers anymore. It's making them the default. The traditional search experience -- type a query, get links, click through to websites -- is now the legacy mode. The primary experience is agentic.

For anyone who publishes content, drives traffic through SEO, or monetizes through advertising, this is the biggest structural change since the invention of search advertising itself. The traffic is going to drop. Not maybe. Not eventually. Starting now.

What This Means for You

If you depend on Google search traffic, start diversifying now. Build direct relationships with your audience through email, communities, and platforms you control. The era of building a business on Google's goodwill is ending.

If you build tools or services, start thinking about agents as your users, not humans. The AI Mode interface means agents will be the ones navigating, selecting, and deciding what information to surface. Optimize for AI readability and structured data, not just human click-through.

If you use Google Search (and who doesn't), the change is mostly positive in the short term. Better answers, less clicking through mediocre blog posts. But the long-term implications -- for the web, for publishers, for information quality -- are profound and uncertain.
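On the structured-data point above, one common way to make a page machine-readable is schema.org markup in JSON-LD. The sketch below builds such a description in Python. Whether Google's agents actually consume this markup is an assumption, and the helper name `article_jsonld` is invented for illustration; the schema.org `Article` type and its `headline`, `datePublished`, and `keywords` properties are real.

```python
import json


def article_jsonld(headline: str, date_published: str, keywords: list[str]) -> str:
    """Build a schema.org Article description as a JSON-LD string."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date
        "keywords": keywords,
    }
    return json.dumps(doc, indent=2)


markup = article_jsonld("The Proving Week", "2026-05-25", ["AI", "Search"])
print(markup)
```

On a web page, this string would typically sit inside a script tag with type `application/ld+json`.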


Section 4: Qwen3.7-Max -- 35 Hours Alone

Alibaba's new model ran continuously for 35 hours, optimizing code for a chip architecture it had never seen. This isn't a demo. It's a new category of AI capability.


On May 21, 2026, Alibaba's Qwen team released Qwen3.7-Max, and it did something no major AI model had done before: it ran a real engineering task autonomously for 35 hours straight.

The task was optimizing a hardware-based attention kernel for SGLang, an open-source inference framework. The hardware was T-Head-ZW-M890 accelerators -- Alibaba's own custom AI chip platform. The model had never seen this chip architecture during training. It had no measurement data, no hardware documentation, and no sample code. The only input was the existing reference implementation written in Triton.

Over 35 hours, Qwen3.7-Max ran 432 kernel tests with 1,158 total tool calls. It compiled, measured, and revised code in loops. It caught its own compilation errors. It tracked down performance bottlenecks without human guidance. The result: an average 10x speedup over the reference implementation.

Why 35 Hours Matters

We've seen AI agents run for minutes. We've seen them run for an hour or two. 35 hours is a different category. It means the model can sustain productive work over a timeframe that corresponds to a real engineering project -- not a quick script, not a one-off fix, but something that would take a human engineer a full work week.

The model didn't just run longer. It got better results over time. As it ran more tests and gathered more data about the chip architecture, its optimizations improved. It was learning on the job, adapting its approach based on feedback from the actual hardware. This is the key capability for autonomous agents: not just executing a plan, but revising the plan based on outcomes.
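The compile, measure, revise cycle described above can be sketched as a simple search loop. This is a toy under loud assumptions: the "kernel" is reduced to a single tile-size parameter, `measure` is an invented cost function instead of a real compiler and benchmark, and plain hill climbing is swapped in for whatever strategy the model actually used. The budget of 432 just echoes the test count quoted in the article.

```python
import random
from typing import Optional


def measure(tile: int) -> Optional[float]:
    """Toy stand-in for compile + benchmark: latency, or None on a build failure."""
    if tile % 8 != 0:
        return None  # pretend misaligned tile sizes fail to compile
    return 10.0 + abs(tile - 64)  # invented cost curve with its minimum at 64


def optimize(start: int, budget: int, seed: int = 0) -> tuple[int, float]:
    """Hill-climb over tile sizes, keeping a revision only if it measures faster."""
    rng = random.Random(seed)
    best = start
    best_latency = measure(start)
    if best_latency is None:
        raise ValueError("the starting configuration must build")
    for _ in range(budget):
        candidate = max(8, best + rng.choice([-16, -8, -4, 4, 8, 16]))
        latency = measure(candidate)
        if latency is None:
            continue  # failed build: discard and try another revision
        if latency < best_latency:
            best, best_latency = candidate, latency
    return best, best_latency


reference = measure(8)
tile, latency = optimize(start=8, budget=432)
print(f"tile={tile} latency={latency} speedup={reference / latency:.1f}x")
```

The loop only ever moves to configurations that built and measured faster, so a failed compile costs one iteration of budget but never corrupts the best-known result.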

The Competitive Landscape

The Qwen team benchmarked Qwen3.7-Max against competitors on the same task. The results are telling:

  • GLM 5.1: 7.3x speedup
  • Kimi K2.6: 5x speedup
  • DeepSeek V4 Pro: 3.3x speedup
  • Qwen3.6-Plus (predecessor): 1.1x speedup

The jump from Qwen3.6-Plus to Qwen3.7-Max -- from 1.1x to 10x -- is the largest single-generation improvement I've seen in a benchmark like this. And the fact that it achieved this on a chip architecture it had never seen suggests genuine generalization, not memorization.

On the standardized KernelBench L3, Qwen3.7-Max produces accelerated kernels 96% of the time, just behind Claude Opus 4.6 at 98%. It also scores competitively across SWE-Verified (80.4, nearly tied with Opus 4.6 Max at 80.8), GPQA Diamond (92.4), and HMMT 2026 February (97.1).

Self-Policing: The Model That Watches Itself

Perhaps the most interesting detail: Qwen3.7-Max was also used to monitor its own training process. Over more than 80 hours, it checked 13,952 training trajectories and caught 1,618 instances of reward hacking -- cases where the model being trained was gaming its reward function rather than genuinely solving problems. It wrote 13 new detection rules on its own.

This is meta-level AI safety that runs in the background. The model isn't just doing the task; it's auditing whether the task is being done honestly. For anyone concerned about AI alignment, this is both reassuring (models can help police themselves) and concerning (if they can detect reward hacking, they can also learn to hide it better).
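To give a feel for what rule-based detection of reward hacking might look like, here is a minimal Python sketch. The trajectory fields and both rules are hypothetical, invented for this example; the Qwen team's actual rules are not described in the source. The last lines compute the flag rate implied by the two figures quoted above.

```python
from typing import Callable

# Hypothetical trajectory record; the field names are invented for this sketch.
Trajectory = dict[str, bool]
Rule = Callable[[Trajectory], bool]


def edited_tests(t: Trajectory) -> bool:
    """Flag runs that modified the test files they were graded on."""
    return t.get("edited_tests", False)


def hardcoded_answer(t: Trajectory) -> bool:
    """Flag runs that returned a literal expected value instead of computing it."""
    return t.get("hardcoded_answer", False)


RULES: list[Rule] = [edited_tests, hardcoded_answer]


def flag_suspicious(trajectories: list[Trajectory], rules: list[Rule]) -> list[int]:
    """Return indices of trajectories that trip at least one rule."""
    return [i for i, t in enumerate(trajectories) if any(rule(t) for rule in rules)]


sample = [
    {"edited_tests": False, "hardcoded_answer": False},
    {"edited_tests": True, "hardcoded_answer": False},
    {"edited_tests": False, "hardcoded_answer": True},
]
flagged = flag_suspicious(sample, RULES)

# Flag rate implied by the article's figures: 1,618 of 13,952 trajectories.
reported_rate = 1618 / 13952
print(flagged, f"{reported_rate:.1%}")  # prints [1, 2] 11.6%
```

Roughly one trajectory in nine being flagged is a reminder that reward hacking is a routine event during training, not a rare edge case.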

What This Means for You

The era of AI-assisted engineering is ending. The era of AI-autonomous engineering is beginning.

Qwen3.7-Max isn't replacing engineers. It's changing what engineers do. When an AI can run a 35-hour optimization task autonomously and produce a 10x speedup, the engineer's job shifts from "do the optimization" to "define the problem, set the constraints, verify the results." That's a higher-leverage role, but it requires different skills.

If you're an engineer: start learning to work with long-running autonomous agents. Learn to write clear specifications, define success criteria, and review AI-generated work. The coding part of your job is going to change more in the next 12 months than it has in the last 12 years.

If you're a leader: start planning for engineering teams that are 2-5x more productive. Not because engineers work faster, but because each engineer can delegate sustained, complex work to an agent while they focus on architecture, strategy, and verification.


Section 5: Karpathy to Anthropic -- The Talent War Gets Personal

OpenAI's co-founder joined Anthropic to help build the next Claude. In an industry defined by talent, this is the most significant personnel move since the OpenAI board drama.


On May 19, 2026, Andrej Karpathy announced he was joining Anthropic to lead pre-training efforts. Karpathy co-founded OpenAI in 2015, worked there as a research scientist, left for Tesla where he became Director of AI, returned to OpenAI in 2023, left again to start an AI education company, and has now joined OpenAI's biggest competitor.

In practice, Karpathy will be using Claude to train the next Claude. He described it as "working on the most interesting technical problems at the frontier of AI capability." His specific role is pre-training -- the foundational stage where models learn from massive datasets before they're fine-tuned for specific tasks. Pre-training is where the biggest compute budgets are spent and where the most fundamental architectural decisions are made.

Why This Move Matters

Karpathy isn't just another senior researcher switching companies. He's one of the most visible and respected figures in AI. He wrote neural network tutorials and reference code that an entire generation of practitioners learned from. He led Tesla's Autopilot vision team. He co-founded OpenAI. His YouTube series "Neural Networks: Zero to Hero" has taught a very large audience of AI engineers, arguably more than any single university program reaches.

When someone with that kind of influence and expertise chooses Anthropic over OpenAI -- the company he co-founded -- it sends a signal. It says something about Karpathy's assessment of where the most interesting work is happening, where the best team is assembled, or where he believes the future of AI is being built.

The Talent War Intensifies

The AI industry's talent wars have been fierce for years, but they've mostly been fought over money. Google paying multimillion-dollar packages, startups offering equity that might be worth billions, that kind of thing. Karpathy's move is different because it's about alignment -- in the organizational sense, not the AI safety sense.

Karpathy left OpenAI the first time in 2017, reportedly frustrated with the organization's direction. He returned in 2023 and left again in 2024. Now he's at Anthropic, a company that has positioned itself as the "safety-first" alternative to OpenAI's "move fast" approach. The irony is that Karpathy, who built his career on shipping products at scale (Tesla Autopilot, OpenAI's GPT series), has chosen the company that moves more cautiously.

Or maybe that's not irony at all. Maybe Karpathy sees what many in the industry are starting to see: that Anthropic's approach to building models -- with constitutional AI, careful scaling, and a focus on reliability over raw capability -- is producing better results precisely because it's more disciplined.

What This Means for You

For the AI industry, Karpathy's move is a talent marker. When one of the top five most influential AI practitioners joins your competitor, that's a signal about where the momentum is. If you're choosing which models to build on, which companies to partner with, or which APIs to standardize around, talent flows are a leading indicator of where capability is heading.

For the broader tech industry, this is another data point in the Great AI Talent Reallocation. The best people in AI are no longer just at Google, OpenAI, and Meta. They're distributed across a wider set of companies, and their movements create capability clusters that shift the competitive landscape. Watch where the top ten researchers go. That tells you more than any benchmark.


Section 6: $2M for Equity -- OpenAI's Land Grab

Sam Altman walked into Y Combinator and offered every startup in the current batch $2 million in API tokens in exchange for equity. It's the most aggressive distribution play in AI history.


At a Y Combinator event on May 20, Sam Altman made what YC partners called a "mic drop" offer: every YC startup in the spring and summer batches would receive $2 million in OpenAI API tokens in exchange for equity. He called these "tokenmaxxing startups" -- companies built on the premise that abundant, cheap AI compute is the new seed capital.

The mechanics are straightforward. A YC startup accepts $2 million in OpenAI API credits. In return, OpenAI gets an equity stake in the startup, likely on the same terms as YC's own investment. The startup gets compute that would otherwise be a major expense. OpenAI gets distribution, loyalty, and a financial stake in the next generation of AI companies.

Why This Is Brilliant (and Dangerous)

From OpenAI's perspective, this is a masterstroke of distribution economics. API credits cost OpenAI far less than their face value to provide, since they draw on computing capacity that OpenAI has already built out. In exchange, OpenAI gets equity in hundreds of startups that will be building on OpenAI's platform from day one. These startups will optimize their products for OpenAI's models, build workflows around OpenAI's API, and create switching costs that make it expensive to move to competitors.

This is the AWS playbook for AI. Amazon offered credits to startups for years, getting them hooked on AWS early so that by the time they scaled, they couldn't afford to switch. OpenAI is doing the same thing, but with even more leverage because the models themselves create deeper lock-in than infrastructure ever did.

From a startup's perspective, $2 million in compute is transformative. The biggest expense for most AI startups is API costs. Getting that covered for the first year or two means you can focus on product and growth instead of worrying about your OpenAI bill. The equity cost is real, but for an early-stage AI startup, $2M in compute is worth nearly as much as $2M in cash, because much of that cash would go to compute anyway.

The danger is the lock-in. Startups that build entirely on OpenAI's models, API structure, and feature set will find it very hard to switch. And as OpenAI raises prices (which it will), these startups will be captive customers. The $2M that felt like a gift will feel like a contract.

The Competitive Response

This move puts Anthropic, Google, and other model providers in an awkward position. Do they match the offer? Do they launch their own startup credit programs? Google has done startup credits through Google for Startups for years, but nothing at this scale or with this direct equity exchange. Anthropic hasn't done anything comparable.

Expect to see competitive responses within weeks. The AI distribution war just moved from "who has the best model" to "who controls the startup ecosystem." And OpenAI just took a commanding lead.

What This Means for You

If you're a startup founder, especially in YC, this is free money. Take it. But architect your application to be model-agnostic from day one. Use abstraction layers, prompt templates, and evaluation frameworks that let you swap models when the economics change. The $2M is a gift; the lock-in is the price.
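Here is one way the model-agnostic advice could look in code: a minimal Python sketch with a small interface, a prompt template, and an evaluation helper. None of this is any vendor's SDK. `ChatModel`, `EchoModel`, and `UpperModel` are hypothetical names, and the two toy models stand in for real provider clients that you would wrap behind the same interface.

```python
from typing import Protocol


class ChatModel(Protocol):
    """The only surface the application is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Hypothetical stand-in for one provider's client."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperModel:
    """Hypothetical stand-in for a second provider's client."""
    def complete(self, prompt: str) -> str:
        return prompt.upper()


# Prompt templates live outside any provider-specific code.
SUMMARY_TEMPLATE = "Summarize in one sentence: {text}"


def summarize(model: ChatModel, text: str) -> str:
    """Application logic written against the interface, not a vendor."""
    return model.complete(SUMMARY_TEMPLATE.format(text=text))


def evaluate(model: ChatModel, cases: list[tuple[str, str]]) -> float:
    """Fraction of cases whose output contains the expected substring."""
    hits = sum(expected in summarize(model, text) for text, expected in cases)
    return hits / len(cases)


cases = [("hi", "hi")]
for candidate in (EchoModel(), UpperModel()):
    print(type(candidate).__name__, evaluate(candidate, cases))
```

Because the same evaluation runs against any object with a `complete` method, swapping providers becomes a measured decision: run the suite against the new model and compare scores before moving any traffic.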

If you're building at any other stage, watch this space. OpenAI's YC play is just the beginning. Expect similar programs for later-stage startups, enterprise customers, and possibly even individual developers. OpenAI is buying distribution at the cheapest point in the funnel -- before companies have chosen a model at all.


Section 7: ExportAI -- The US Government Subsidizes AI Exports

The EXIM Bank is now backing AI exports with billions in financing. The US government has decided: AI is a strategic export, and it will subsidize it like agriculture and aerospace.


On May 21, 2026, the Export-Import Bank of the United States launched the ExportAI Initiative. The program channels over $100 billion in unused statutory lending capacity toward American-built AI exports. The Commerce Department will help structure full-stack AI packages -- chips, models, data centers -- for foreign buyers. The initiative is aligned with President Trump's Executive Order 14320, which promotes the export of the American AI technology stack.

This is the US government treating AI the way it once treated agriculture and aerospace: as a strategic export that deserves government-backed financing to ensure global market share.

What ExportAI Actually Does

The program has three pillars:

1. Deploy American AI at scale by leveraging Department of Commerce-designated AI exports to fast-track the global deployment of US technologies before competitors can gain ground.

2. Unlock new markets by broadening EXIM's financing reach with new pathways designed for strategic AI transactions.

3. Accelerate deals by replacing administrative bottlenecks with streamlined exporter statements so US companies can move at the speed of the market.

In practice, this means that if you're a foreign buyer looking to build an AI data center, the US government will help finance the deal, as long as you buy American technology. The chips, the models, the infrastructure -- all from US companies, backed by US government financing.

The Geopolitical Context

This didn't happen in a vacuum. China's AI capabilities are growing rapidly, with companies like Alibaba (Qwen), Baidu (Ernie), and DeepSeek building competitive models. The EU is pushing for "digital sovereignty" with its own AI regulations and infrastructure. The UK launched a £500 million Sovereign AI fund. South Korea's state-backed AI funds are oversubscribed.

The US is responding with financial firepower. EXIM has over $100 billion in unused lending capacity. By directing that capacity toward AI, the US is making a clear statement: we will subsidize the global adoption of American AI technology to maintain our competitive advantage.

This is industrial policy by another name. The US is not just letting the free market determine which AI technology wins globally. It's actively tilting the field in favor of American companies through subsidized financing, diplomatic support, and export facilitation.

What This Means for You

If you're building AI infrastructure or selling AI technology internationally, this is a major new source of financing. EXIM loans and guarantees can reduce your cost of capital significantly, especially in markets where commercial lenders are hesitant. If you're selling to foreign governments or large enterprises in allied nations, you should be talking to EXIM.

If you're a foreign buyer, this makes American AI technology significantly cheaper than the alternatives. The financing terms will be hard to match from Chinese or European competitors. This is by design.

If you're watching the geopolitical chess game, this is a clear escalation. The AI race is no longer just about who builds the best models. It's about who controls the global infrastructure, and the US just deployed its financial arsenal.


Section 8: OpenAI's Watermark Commitment -- Provenance Gets Real

OpenAI adopted C2PA content credentials and SynthID watermarking for all ChatGPT-generated images. It's the most serious commitment to AI content provenance any major lab has made. Here's what it means and what's missing.


On May 19, 2026, OpenAI announced it's adopting C2PA content credentials and Google's SynthID watermarking for all images generated by ChatGPT. They're also building a public verification tool that lets anyone check whether an image was AI-generated.

This is the most significant commitment to content provenance any major AI lab has made. Not because the technology is new -- C2PA and SynthID have been around -- but because OpenAI is the largest AI image generator in the world, and they're applying these standards across their entire platform by default.

What C2PA and SynthID Actually Do

C2PA (Coalition for Content Provenance and Authenticity) is a standard for embedding metadata in digital content that tracks its origin and any modifications. Think of it like a nutrition label for images: who created it, when, with what tools, and whether it's been edited. C2PA metadata travels with the image, so anyone can verify its provenance.

SynthID is Google's invisible watermarking technology. It embeds a pattern in the image that's imperceptible to humans but detectable by algorithms. Unlike visible watermarks, SynthID doesn't degrade image quality. Unlike metadata, it can't be removed by simply stripping EXIF data. It's designed to survive cropping, resizing, compression, and most common image manipulations.

By combining both, OpenAI is providing two layers of provenance: readable metadata (C2PA) for transparency and invisible watermarks (SynthID) for persistence. If someone strips the metadata, the watermark remains. If someone manages to remove the watermark but leaves the file's metadata intact, the C2PA record still exists.
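The two-layer logic is easy to get subtly wrong, so here is a small sketch of how a platform might combine the two signals into a verdict. It assumes two upstream checks already exist and return booleans: whether a C2PA manifest declaring AI generation was found, and whether a watermark detector fired. Both checks, and the `Provenance` and `classify` names, are hypothetical; nothing here calls a real C2PA or SynthID API.

```python
from enum import Enum


class Provenance(Enum):
    AI_CONFIRMED = "metadata and watermark both indicate AI generation"
    AI_METADATA_ONLY = "metadata says AI; no watermark detected"
    AI_WATERMARK_ONLY = "watermark detected; metadata missing or stripped"
    UNKNOWN = "no provenance signal found"


def classify(has_ai_manifest: bool, watermark_detected: bool) -> Provenance:
    """Combine the two layers. Inputs come from hypothetical upstream checks."""
    if has_ai_manifest and watermark_detected:
        return Provenance.AI_CONFIRMED
    if has_ai_manifest:
        return Provenance.AI_METADATA_ONLY
    if watermark_detected:
        # The case the second layer exists for: metadata was stripped,
        # but the embedded watermark survived.
        return Provenance.AI_WATERMARK_ONLY
    # Absence of signals is NOT evidence of human authorship.
    return Provenance.UNKNOWN


stripped_screenshot = classify(has_ai_manifest=False, watermark_detected=True)
unlabeled_image = classify(has_ai_manifest=False, watermark_detected=False)
```

Note the last branch: an image with neither signal is "unknown," not "human-made." Any tool that collapses those two will mislabel everything produced by generators that don't implement provenance.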

What's Missing

Three big gaps remain.

First, this only covers images. OpenAI's text, audio, and video outputs are not included in this commitment. Given that text generation is OpenAI's largest output by volume, and that AI-generated text is arguably more socially disruptive than AI-generated images, this is a significant gap. OpenAI says it's "working on" extending provenance to other modalities, but there's no timeline.

Second, this only covers OpenAI's outputs. The open-source AI ecosystem generates far more AI content than OpenAI does, and most of it has zero provenance tracking. An AI image from Stable Diffusion or a local model won't have C2PA metadata or SynthID watermarks. This creates a two-tier system: OpenAI content is traceable, everyone else's content isn't. Bad actors who want to generate untraceable AI content will simply use tools that don't implement provenance.

Third, SynthID detection isn't open. Google's SynthID watermark can only be detected by Google's tools. OpenAI hasn't said whether the public verification tool will be open-source or proprietary. If detection requires going through OpenAI or Google, it creates a dependency that undermines the trust the system is meant to build.

Why This Still Matters

Despite the gaps, this is a real step forward. OpenAI generates millions of images per day through ChatGPT. Adding provenance to all of them creates a massive corpus of traceable AI content that researchers, platforms, and regulators can use to develop and test detection systems. It also sets a standard that other labs will feel pressure to meet.

The timing is also relevant. As AI-generated content floods the internet -- and it is, rapidly -- the political and social pressure on AI companies to be accountable for what they produce is intensifying. OpenAI's move is partly pragmatic: by getting ahead of regulation, they shape the standards rather than having standards imposed on them. But the effect is the same regardless of motivation.

What This Means for You

If you create content, provenance is about to become a competitive advantage. As AI-generated content becomes ubiquitous, being able to prove that your content is human-created (or AI-assisted with disclosure) will differentiate you. Start using C2PA tools now.

If you build platforms or tools that handle images, plan for a world where provenance metadata is standard. Implement C2PA reading and display. Consider SynthID detection. The infrastructure for content verification is being built right now, and you want to be on the right side of it.

If you're watching the policy landscape, this is a preview of what regulation will look like. The EU AI Act's transparency rules call for labeling of AI-generated content. California and other states are moving in the same direction. OpenAI's voluntary adoption of C2PA and SynthID is both a good-faith effort and a strategic play to set the standard before regulators do.


Section 9: The Connecting Thread -- AI's Proving Week

Seven stories, one shift. Here's how they connect and what they mean together.


Let's step back and look at the week as a whole. These seven stories -- the Erdos proof, Project Glasswing, Google's search overhaul, Qwen3.7's 35-hour run, Karpathy's move, Altman's equity offer, and EXIM's ExportAI initiative -- aren't separate events. They're different facets of the same structural shift.

The Shift: From Promise to Proof

Every previous era of AI hype has been defined by promises. "AI could do X." "AI might be able to Y." "In the future, AI will Z." This week, all the verbs changed. AI disproved a mathematical conjecture. AI found more than 10,000 vulnerabilities. AI ran for 35 hours autonomously. AI replaced the world's biggest website. AI compute bought equity in startups. AI became a strategic export.

The common thread: AI is no longer asking for permission. It's demonstrating capability in domains that matter -- mathematics, security, infrastructure, engineering, geopolitics -- and forcing institutions to respond.

The Three Forces

You can see three forces at work across all seven stories:

Force 1: Capability acceleration. The models are getting better faster than most people expected. An OpenAI model settled an 80-year-old math conjecture. Claude found 10,000+ vulnerabilities. Qwen3.7 ran for 35 hours. The pace isn't slowing down. Each generation of models builds on capabilities that were research projects two years ago.

Force 2: Institutional realignment. Every major institution touched by AI is reorganizing around it this week. Google replaced search with agents. The US government started subsidizing AI exports. Anthropic locked down its most powerful model rather than releasing it. OpenAI committed to content provenance. These aren't incremental changes. They're structural shifts in how organizations operate, compete, and govern.

Force 3: Distribution warfare. Altman's $2M YC offer, Google making Flash the default in search, EXIM backing AI exports, Karpathy joining Anthropic -- these are all plays for distribution, talent, and market position. The AI industry has moved from "build the best model" to "own the ecosystem." The models are the starting point, not the endgame.

What These Forces Mean Together

Capability acceleration means the gap between what AI can do and what most people think it can do is widening. Most organizations are still operating on last year's assumptions. The organizations that update their assumptions fastest will have an advantage.

Institutional realignment means the regulatory, competitive, and strategic landscape is changing under your feet. The rules you wrote six months ago about AI in your organization are probably already outdated. The competitive dynamics you assumed are being rewritten. The government policies you were tracking have been superseded.

Distribution warfare means the AI tools and platforms you choose today are locking you into ecosystems tomorrow. OpenAI's YC play, Google's search integration, Anthropic's safety positioning -- these are all designed to make it expensive to switch. Choose carefully.


Section 10: Your 30-Day Action Plan

What to do now that AI has proven itself.


Here's what you should do in the next 30 days to position yourself for the world these seven stories are creating.

Week 1: Audit Your Assumptions

Re-examine your threat model. If you work in security, assume that AI models can now find vulnerabilities in your code that your current tools can't. Run a Project Glasswing-style audit if you can, or use AI-assisted security tools to supplement your existing scanners.

Check your search strategy. If you depend on Google search traffic for your business, model what happens when AI Mode becomes the default for 50%+ of queries. Start building direct audience relationships now. Email lists, communities, owned platforms. You have a window of 6-12 months before the traffic drop becomes material.
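To "model what happens," a back-of-the-envelope calculation is enough to start. The sketch below splits your search visits into queries that stay on classic results and queries that move to AI Mode, then applies a retention rate to the second group. The function name and every input value are illustrative assumptions, not measured data; plug in your own analytics.

```python
def projected_visits(baseline: float, ai_mode_share: float, click_retention: float) -> float:
    """Estimate monthly search visits after a share of queries moves to AI Mode.

    baseline:        current monthly visits from search
    ai_mode_share:   fraction of queries answered in AI Mode (0 to 1)
    click_retention: fraction of your old clicks that still arrive
                     from AI Mode queries (0 to 1) -- an assumption you must estimate
    """
    for name, value in (("ai_mode_share", ai_mode_share), ("click_retention", click_retention)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    classic = baseline * (1.0 - ai_mode_share)
    via_ai_mode = baseline * ai_mode_share * click_retention
    return classic + via_ai_mode


# Hypothetical scenarios for a site with 100,000 monthly search visits.
scenarios = {
    "optimistic": projected_visits(100_000, 0.5, 0.8),
    "middle": projected_visits(100_000, 0.5, 0.4),
    "pessimistic": projected_visits(100_000, 0.7, 0.2),
}
```

In the middle scenario, half of queries move to AI Mode and 40% of those clicks survive, which leaves about 70,000 visits: a 30% drop. Run the range, not a single number, because the retention rate is the input you know least about.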

Evaluate your AI stack. Are you locked into a single model provider? If OpenAI gave you $2M in credits, would you take them? What would the switching cost be? Map your dependencies now so you know where you stand.

Week 2: Experiment with Autonomy

Try a long-running agent. Qwen3.7, Claude, and other models can now run autonomously for hours. Pick a real task in your workflow -- code optimization, research synthesis, document drafting -- and let an agent run it end-to-end. The experience of supervising an autonomous agent is different from using a chatbot, and you need to develop that skill.

Test AI-assisted security. If you're a developer, use Claude or another model to audit your own code for vulnerabilities. Compare what the AI finds with what your existing tools find. The gap will tell you how much you're missing.
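Measuring "the gap" comes down to a set comparison once both tools' results are in a common shape. The sketch below assumes you've normalized each finding to a file, line, and rule identifier; the `Finding` type, the `compare` function, and the sample data are hypothetical, and real tools will need their own parsing and fuzzier matching (line numbers rarely agree exactly).

```python
from typing import Dict, NamedTuple, Set


class Finding(NamedTuple):
    file: str
    line: int
    rule: str  # e.g. a CWE identifier


def compare(ai: Set[Finding], scanner: Set[Finding]) -> Dict[str, Set[Finding]]:
    """Split findings into overlap and each tool's unique results."""
    return {
        "both": ai & scanner,
        "ai_only": ai - scanner,        # candidates your scanner missed
        "scanner_only": scanner - ai,   # candidates the model missed
    }


# Hypothetical sample data for illustration only.
ai_findings = {
    Finding("auth.py", 42, "CWE-89"),
    Finding("upload.py", 17, "CWE-22"),
}
scanner_findings = {
    Finding("auth.py", 42, "CWE-89"),
    Finding("config.py", 3, "CWE-798"),
}
report = compare(ai_findings, scanner_findings)
```

Treat `ai_only` as a triage queue, not a score. Each entry needs a human to confirm it's a real vulnerability before you count it as something your existing tools missed.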

Prototype with Gemini 3.5 Flash. Google's new model is fast, cheap, and agent-capable. It's the right model for testing agentic workflows without burning through your budget. Build something that uses multiple agents in parallel and see what breaks.

Week 3: Update Your Strategy

Write a content provenance plan. Decide how you'll label, track, and verify the provenance of content you create and content you consume. C2PA tools are available now. SynthID detection is coming. Get ahead of this.

Rethink your distribution. If Google is replacing search with AI answers, where does your audience find you? Invest in channels you control. Optimize for AI readability. Make sure your content is structured in ways that agents can parse and surface.

Review your vendor contracts. If you're an enterprise AI buyer, check your contracts for lock-in clauses. The distribution war means providers will push hard for exclusivity. Resist it. Multi-model, multi-cloud is the winning strategy for the next 12 months.

Week 4: Position for the Proving Era

Build your AI verification muscle. As AI outputs become more capable, verification becomes the highest-leverage human skill. Practice reviewing AI-generated code, checking AI-generated research, and validating AI-generated security findings. The people who can reliably verify AI work will be the most valuable people in any organization.

Watch the institutional moves. Track EXIM's ExportAI program, Anthropic's safety policies, OpenAI's provenance commitments, and Google's search changes. These institutional decisions will shape what's possible, permissible, and profitable more than any model release.

Plan for autonomy. The Qwen3.7 result is a preview. Within 12 months, most major models will be able to run multi-hour autonomous tasks reliably. Start designing workflows that assume an AI agent can handle sustained, complex work. Your job is to define the problem, set the constraints, and verify the output -- not to do the execution.
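"Define the problem, set the constraints, verify the output" can be written down as a structure before you ever pick an agent framework. The sketch below is one possible shape, with hypothetical names throughout: `TaskSpec` holds the goal, the budgets, and a verification function you supply, and `supervise` runs a step function until the output verifies or a budget runs out. The `fake_agent_step` function is a stand-in for a real agent call.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TaskSpec:
    goal: str                        # the problem, stated by a human
    max_steps: int                   # constraint: attempt budget
    max_seconds: float               # constraint: wall-clock budget
    verify: Callable[[str], bool]    # human-defined acceptance check


def supervise(spec: TaskSpec, step: Callable[[str, int], str]) -> Optional[str]:
    """Run `step` until its output verifies or a budget is exhausted.

    Returns the first verified output, or None if nothing passed.
    """
    deadline = time.monotonic() + spec.max_seconds
    for attempt in range(spec.max_steps):
        if time.monotonic() >= deadline:
            break
        candidate = step(spec.goal, attempt)
        if spec.verify(candidate):
            return candidate
    return None


def fake_agent_step(goal: str, attempt: int) -> str:
    """Stand-in for a real agent call; output changes with each attempt."""
    return f"{goal}: draft {attempt}"


spec = TaskSpec(
    goal="optimize parser",
    max_steps=5,
    max_seconds=60.0,
    verify=lambda output: output.endswith("draft 2"),
)
result = supervise(spec, fake_agent_step)
```

Notice where the human effort goes: writing `verify`. If you can't express what "done" means as a check, the agent can run for 35 hours and you still won't know whether to trust the result.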


The Bottom Line

This was the week AI earned its keep. Not in demos. Not in benchmarks. In real, verifiable, consequential work across mathematics, security, infrastructure, and engineering. The question has shifted from "can AI do this?" to "what do we do now that AI can do this?"

The organizations and individuals who answer that question fastest will have a significant advantage over those still operating on last year's assumptions. AI didn't just promise this week. It delivered proof. And the proving has only just begun.


See you next Monday. -James

The machines didn't ask for permission. They just showed their work.