AI Agents Move From Demos to Durable Work

Today is 2026-06-25, 00:00 Los Angeles time. Here are the global AI events from the last 12-24 hours worth tracking, organized by impact and actionability.

Quick Takeaways

The hottest AI builder signal in the current window is that agents are becoming durable execution systems: OpenAI is publishing Codex adoption data, Vercel is turning coding harnesses into sandboxed infrastructure, Xiaomi is pushing cheap omni-modal agent APIs, and open-source terminal agents are gaining momentum. The second major theme is inference economics: OpenAI is moving down the stack to custom chips, while Xiaomi’s MiMo stack in China is competing on long context, multimodality, and price. The near-term operator takeaway: design AI products around long-running state, isolated execution, reviewable outputs, model/provider swappability, and cost-aware routing.

1. OpenAI turns Codex usage into a proof point for long-horizon agents

The agent market is moving from “can it code?” to “can it safely absorb hours of cross-functional work?” Teams building dev tools, ops tools, finance automation, legal workflows, or internal copilots should treat persistence, permissions, observability, and handoff design as core product primitives.

Key Details

OpenAI published a new Economic Research paper using Codex telemetry as evidence that agent usage is shifting from short chats to delegated work. The headline numbers are striking: by May 2026, 80.6% of sampled individual Codex users had made at least one request estimated to exceed 30 minutes of human work, 70.2% had crossed one hour, and 25.6% had crossed eight hours.
Inside OpenAI, Codex has reportedly become the primary AI tool across every department, not just engineering. OpenAI says non-developer individual users grew 137x since August 2025, organizational non-developer users grew 189x, and Codex accounts for more than 85% of output tokens for the average OpenAI worker.
Why it is hot now: this is one of the clearest first-party datasets yet showing agent products becoming work systems rather than IDE sidecars. For founders, the practical takeaway is to design around long-running task state, review checkpoints, auditability, and cross-functional workflows—not only chat UX.

Sources

OpenAI - How agents are transforming work (2026-06-25)

2. OpenAI’s Jalapeño chip pushes frontier inference toward vertical integration

This is not just a hardware story. It signals that model labs are competing on the full inference supply chain. AI product teams negotiating multi-year model/platform commitments should expect compute efficiency, latency guarantees, and provider lock-in to become bigger parts of vendor selection.

Key Details

OpenAI and Broadcom unveiled Jalapeño, OpenAI’s first custom “Intelligence Processor” for LLM inference. OpenAI says engineering samples are already running ML workloads in the lab at target frequency and power, including GPT-5.3-Codex-Spark.
The chip was co-developed with Broadcom and Celestica: Broadcom supplies the silicon and networking, Celestica handles board/rack integration, and deployment is planned by the end of 2026 across a multi-generation platform. OpenAI says early testing shows substantially better performance per watt than current state of the art, but a detailed technical report is still pending.
Why it is hot now: inference cost is becoming product strategy. If Jalapeño delivers, OpenAI can tune model architecture, kernels, memory movement, networking, scheduling, and API/product latency as one stack. Builders should watch whether this turns into lower API prices, higher availability, or preferential economics for Codex/agent workloads.

Sources

OpenAI - OpenAI and Broadcom unveil LLM-optimized inference chip (2026-06-24)

3. Vercel shows how to run coding agents safely against untrusted repos

For AI dev-tool founders, sandboxing is becoming table stakes. The winning workflow is not just model quality; it is reproducible execution, isolated file systems, gateway-routed credentials, streamable diagnostics, and reports that humans can review before merging.

Key Details

Vercel published a new builder guide for a sandboxed GitHub issue triage agent using AI SDK 7 HarnessAgent, Vercel Sandbox, and AI Gateway. The pattern runs real coding harnesses—examples include Claude Code and Codex—against untrusted repository code inside an isolated microVM.
The guide shows a concrete Next.js flow: validate a public GitHub issue URL, fetch issue context, launch the selected harness adapter, clone/inspect the repo inside Vercel Sandbox, run a failing command, and stream a structured maintainer report as newline-delimited JSON.
Why it is hot now: this is a practical pattern for moving from “AI writes code on my laptop” to “AI executes untrusted software safely in production workflows.” It also points to a near-term abstraction layer where Claude Code, Codex, OpenCode, Pi, and future coding agents become swappable execution backends.
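
The first and last steps of that flow (validating the issue URL and streaming the report as newline-delimited JSON) can be sketched in a few lines. This is an illustrative Python sketch, not code from the Vercel guide, which uses Next.js; the function names, the URL pattern, and the event fields are all assumptions made for the example. It checks only the shape of the URL, not whether the repository is actually public.

```python
import json
import re
from typing import Iterator, Optional
from urllib.parse import urlparse

# Assumed path shape for an issue URL: /<owner>/<repo>/issues/<number>
_ISSUE_PATH = re.compile(r"^/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)/issues/(\d+)/?$")


def parse_issue_url(url: str) -> Optional[dict]:
    """Return owner, repo, and issue number for a github.com issue URL, else None."""
    parts = urlparse(url)
    # Require HTTPS and an exact host match so look-alike hosts are rejected.
    if parts.scheme != "https" or parts.hostname != "github.com":
        return None
    match = _ISSUE_PATH.match(parts.path)
    if match is None:
        return None
    owner, repo, number = match.groups()
    return {"owner": owner, "repo": repo, "number": int(number)}


def stream_report(events: list) -> Iterator[str]:
    """Yield one compact JSON object per line (newline-delimited JSON)."""
    for event in events:
        yield json.dumps(event, separators=(",", ":")) + "\n"
```

A caller would reject the request when `parse_issue_url` returns `None`, and otherwise write each line from `stream_report` to the response as soon as the sandboxed run produces it, so a maintainer can watch progress before the final report arrives.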

Sources

Vercel - Investigate GitHub issues with HarnessAgent and Sandbox (2026-06-25)

4. Xiaomi MiMo-V2.5 pushes cheap omni-modal agent APIs

If these prices and throughput claims hold in real workloads, startups outside China get another reference point for how cheap agentic multimodal inference may become. Even if availability is region-limited, it pressures global labs on price, context length, and API compatibility.

Key Details

Xiaomi’s MiMo page now shows the V2.5 series available, with MiMo-V2-Pro/Omni/Flash auto-routed to V2.5 pricing and the V2 series scheduled for full deprecation on June 30. The page also highlights MiMo Claw as an official agentic platform launch with native OpenClaw integration and WPS ecosystem support.
The technical/economic signal is aggressive: MiMo-V2.5 advertises native omni-modal understanding across image, video, audio, and text, 1M context, and agent execution for browsing, reasoning, and acting. Listed API pricing for MiMo-V2.5 is $0.14 per million cache-miss input tokens and $0.28 per million output tokens; MiMo-V2.5-Pro lists 1T total parameters, 42B active, 1M context, and $0.435 / $0.87 per million input/output tokens.
Why it is hot now: this is a strong China/Asia builder-economics signal. The model platform is pushing OpenAI- and Anthropic-compatible APIs, agent IDE integrations, long context, multimodality, and very low token prices in one package.
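
To make the listed prices concrete, the sketch below estimates the cost of a single agent run. The per-million-token prices are the ones quoted above; the `Price` type, the `cost_usd` helper, and the token counts are hypothetical, and the sketch assumes every input token is billed at the listed input rate with no cache discount.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per million input tokens (listed rate, no cache discount)
    output_per_m: float  # USD per million output tokens


# Prices as quoted in this digest; verify against the provider before relying on them.
PRICES = {
    "MiMo-V2.5": Price(input_per_m=0.14, output_per_m=0.28),
    "MiMo-V2.5-Pro": Price(input_per_m=0.435, output_per_m=0.87),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request or run for the given model."""
    price = PRICES[model]
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical long-context agent run: 800k input tokens, 50k output tokens.
    for name in PRICES:
        print(f"{name}: ${cost_usd(name, 800_000, 50_000):.4f}")
```

For that hypothetical run the estimate comes to about $0.126 on MiMo-V2.5 and about $0.392 on MiMo-V2.5-Pro. The same helper could feed a cost-aware router that picks the cheaper model whenever a task does not need the larger one.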

Sources

Xiaomi MiMo - Xiaomi MiMo-V2.5 Series — Now Available / MiMo Claw Official Launch (2026-06-25)

5. oh-my-pi shows open-source coding agents are moving down-stack

Builders should watch terminal agents because they become the integration layer between models, local repos, shells, browsers, LSPs, and CI. The practical opportunity is not another chat pane; it is reliable tool orchestration and patch application in environments developers already trust.

Key Details

oh-my-pi is currently a high-momentum open-source terminal coding agent with 14.6k GitHub stars and a latest release listed as v16.1.19 on June 25, 2026. The project describes itself as an AI coding agent for the terminal with hash-anchored edits, optimized tool harnesses, LSP integration, Python, browser support, subagents, and more.
The README positions it as a coding agent with the IDE wired in, installable via shell script, Homebrew, Bun, Windows PowerShell, and embeddable as a Node/TypeScript SDK. Its package structure includes multi-provider LLM support, agent runtime, terminal UI, native grep/shell/image/text bindings, local memory, context compression, and swarm orchestration.
Why it is hot now: the open-source agent stack is consolidating around terminal-first workflows that combine fast editing, richer local tools, multiple model providers, memory, and subagents. This is the same direction commercial IDE agents are moving, but with hackable infrastructure.

Sources

GitHub - can1357/oh-my-pi (2026-06-25)

6. OpenAI Daybreak reframes AI security around landing patches, not just finding bugs

Security teams should evaluate AI tools by closed-loop remediation quality: reachability analysis, evidence, reproducible tests, human approval, SARIF/CodeQL export, and patch review. Pure vulnerability generation without maintainer capacity will create more noise than safety.

Key Details

OpenAI’s Daybreak expansion is a few days old, but it is still gaining builder attention because it ties frontier cyber models directly to patch generation workflows. OpenAI says the updated Codex Security plugin supports defensive security scans, validation evidence, attack-path tracing, threat modeling, and codebase-specific patch generation.
The full GPT-5.5-Cyber limited release is aimed at trusted defenders. OpenAI reports 85.6% on CyberGym versus 81.8% for GPT-5.5, 39.5% versus 25.95% on ExploitGym, and 69.8% versus 63.1% on SEC-bench Pro. Patch the Planet, founded with Trail of Bits and run in collaboration with HackerOne, Calif, researchers, and maintainers, has commitments from more than 30 open-source projects, including cURL, Go, Python, Sigstore, and pyca/cryptography.
Why it is hot now: the useful lesson is not “AI finds more bugs.” It is that the scarce workflow is validating, deduplicating, patching, testing, and landing fixes without overwhelming maintainers. This is immediately relevant to any team adding AI to AppSec or dependency-maintenance pipelines.
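
The validate-then-deduplicate step can be sketched as a small filter in front of the maintainer queue. This is a minimal illustration under stated assumptions, not Daybreak's or the Codex Security plugin's actual pipeline: the `Finding` fields, the fingerprint recipe, and the `triage` function are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Finding:
    rule_id: str     # hypothetical identifier for the class of bug
    path: str        # file the finding points at
    snippet: str     # offending code as reported by the scanner
    validated: bool  # True only if reproducing evidence (e.g. a failing test) exists


def fingerprint(finding: Finding) -> str:
    """Hash rule, path, and whitespace-normalized snippet into a stable key."""
    normalized = " ".join(finding.snippet.split())
    raw = "\x00".join([finding.rule_id, finding.path, normalized])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def triage(findings: Iterable[Finding]) -> List[Finding]:
    """Keep validated findings only, and only the first of each fingerprint."""
    seen = set()
    kept = []
    for finding in findings:
        if not finding.validated:
            continue  # unvalidated reports never reach maintainers
        key = fingerprint(finding)
        if key in seen:
            continue  # duplicate of a finding already queued
        seen.add(key)
        kept.append(finding)
    return kept
```

The design choice here is to drop unvalidated reports before deduplication, so that a noisy duplicate cannot shadow a validated finding with the same fingerprint.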

Sources

OpenAI - Daybreak: Tools for securing every organization in the world (2026-06-22)

Signals to Watch Next

OpenAI’s promised Jalapeño technical report: final perf/watt, supported workloads, and whether savings reach API customers.
Whether Vercel’s HarnessAgent abstraction gets first-class adapters beyond Claude Code and Codex, especially OpenCode/Pi/local agents.
Real-world MiMo V2.5 availability, latency, rate limits, and compatibility outside China; benchmark claims need independent confirmation.
Open-source agent reliability metrics: patch success rate, rollback behavior, test reproduction, and safe execution defaults.
Whether Daybreak/Patch the Planet produces public merged patches at scale without overwhelming maintainers.

This post was generated automatically from web search results. Key sources should be spot-checked before reuse.