The Agentic Engineer

I read the repos so you don't have to.
Issue #1 — February 25, 2026

Diffusion models are coming for LLMs. Inception Labs dropped Mercury 2, a reasoning model that generates tokens in parallel. 1,000+ tokens/sec on Blackwell GPUs. This changes the math on agentic loops.

The "agentic IDE" war is heating up. Emdash (YC W26) runs parallel coding agents across 21 supported CLIs. Cloudflare shipped a full agent hosting SDK. Pi Mono is trending hard.

Hugging Face is standardizing agent skills. Their new skills repo creates a universal format that works across Claude Code, Codex, Gemini CLI, and Cursor. If this catches on, it's the npm of agent capabilities.

Mercury 2: The End of One-Token-at-a-Time

Look, I've been skeptical of "fastest LLM ever" claims. We get one every other week. But Mercury 2 from Inception Labs is doing something genuinely different, and it matters.

Every LLM you use today (GPT, Claude, Gemini, Llama) generates text the same way: one token, then the next, then the next. It's sequential. It's a bottleneck. And when you're running agentic loops that chain dozens of inference calls together, that bottleneck compounds into something painful.

Mercury 2 doesn't do that. It uses diffusion-based decoding, the same family of techniques that made image generation fast. Instead of writing left-to-right like a typewriter, it refines an entire response in parallel, converging over a few steps. Think less "typing" and more "editing a full draft at once."

The numbers: 1,009 tokens/sec on NVIDIA Blackwell GPUs, at $0.25/1M input and $0.75/1M output tokens. Skyvern's CTO said it's "at least twice as fast as GPT-5.2." That's not a marginal improvement. That's a different category.

Here's why this matters for agents specifically. When your agent workflow chains 30 inference calls per task, cutting latency per call from 2 seconds to 0.4 seconds isn't just "faster." It means you can afford more reasoning steps. More tool calls. More retries. The quality ceiling goes up because the cost of thinking goes down.
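The compounding is worth making concrete. A back-of-envelope sketch using the numbers above (30 chained calls, 2 s vs. 0.4 s per call); the function name `task_latency` is purely illustrative:

```python
# Back-of-envelope: per-call latency compounds across a sequential agent loop.
CALLS_PER_TASK = 30  # chained inference calls per task, per the example above

def task_latency(seconds_per_call: float, calls: int = CALLS_PER_TASK) -> float:
    """Sequential loop: total task latency is the sum of per-call latencies."""
    return seconds_per_call * calls

autoregressive = task_latency(2.0)  # 60 s per task
diffusion = task_latency(0.4)       # 12 s per task

# Same 60-second budget, 5x the calls: more reasoning steps, tools, retries.
extra_calls = round(autoregressive / 0.4)  # ~150 calls in the old budget
print(autoregressive, diffusion, extra_calls)
```

Same wall-clock budget, five times the thinking. That's the mechanism behind "the quality ceiling goes up."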

Zed's co-founder said suggestions "land fast enough to feel like part of your own thinking." That's the bar. Not "fast for an LLM," but fast enough that the human doesn't notice the machine.

The catch? Quality is described as "competitive with leading speed-optimized models," not frontier models. This isn't replacing Claude or GPT for complex reasoning. It's carving out a new lane: the model you use for the 80% of agent calls that need to be good-enough and instant.

I think this is the real unlock for production agents. Not smarter models, faster ones. The bottleneck was never intelligence. It was latency compounding across loops. Mercury 2 is the first model that feels like it was built for agents, not adapted for them.

Read more: Inception Labs blog

Emdash: The Agentic IDE
Y Combinator W26 · Hacker News · 162 pts

A desktop app that lets you run multiple coding agents in parallel, each isolated in its own git worktree. Supports 21 CLI agents including Claude Code, Qwen Code, Amp, and Codex. Pass Linear/GitHub/Jira tickets directly to agents, review diffs, and merge, all from one UI. Works locally or over SSH. This is what "agentic development" actually looks like: not one agent doing everything, but many doing focused tasks simultaneously.

Cloudflare Agents SDK
Cloudflare · GitHub · 4,210 stars

Cloudflare quietly shipped one of the most complete agent hosting frameworks I've seen. Each agent is a Durable Object with persistent state, WebSocket support, scheduling, MCP server/client, SQL storage, and React hooks for frontends. Agents hibernate when idle and wake on demand. The killer feature: run millions of them and pay nothing when they're inactive.

Hugging Face Skills
Hugging Face · GitHub · 6,018 stars

HF released a standardized skill format for coding agents. Each skill is a self-contained folder with a SKILL.md that works across Claude Code, OpenAI Codex, Gemini CLI, and Cursor. Think portable agent capabilities: install a "gradio" skill and your agent knows how to build Gradio apps regardless of which LLM is driving. Could become the package manager for agent knowledge.
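The SKILL.md-per-folder layout is the part the repo standardizes; everything else here (the `discover_skills` function, the prompt-injection step) is a hypothetical sketch of how an agent harness might consume such a directory, not official Hugging Face tooling:

```python
from pathlib import Path

def discover_skills(skills_dir: str) -> dict[str, str]:
    """Map each skill folder name to its SKILL.md contents.

    Hypothetical loader: scans one level of subfolders for a SKILL.md,
    the self-contained format described above.
    """
    skills = {}
    for skill_md in Path(skills_dir).glob("*/SKILL.md"):
        skills[skill_md.parent.name] = skill_md.read_text(encoding="utf-8")
    return skills

# A harness could then splice the relevant skill into the system prompt,
# regardless of which LLM is driving:
#   prompt = base_prompt + "\n\n" + discover_skills("skills/")["gradio"]
```

The point of the format is exactly that this loader doesn't care whether Claude Code, Codex, Gemini CLI, or Cursor is on the other end.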

Pi Mono: Agent Toolkit Blowing Up
GitHub · 16,300 stars (+3,000 this week)

Mario Zechner's monorepo for building AI agents hit 16.3k stars with 3,000 new ones this week. Full stack: unified LLM API, agent runtime with tool calling, coding agent CLI, terminal UI, web UI components, and vLLM pod management. The "shitty coding agent" branding is peak developer marketing.

Diffusion Language Models: A Survey of Methods and Applications

What it is: A comprehensive survey of diffusion-based approaches to language generation, the same technique powering Mercury 2. Covers discrete diffusion, continuous diffusion over embeddings, and hybrid approaches.

Why it matters: Autoregressive generation (one token at a time) has been the only game in town for LLMs. This paper maps out the entire landscape of alternatives. If Mercury 2's approach scales, this survey becomes the roadmap for the next generation of language models.

Key insight: Diffusion models can generate all tokens simultaneously and then iteratively refine them. This parallelism is what gives Mercury 2 its speed advantage. The tradeoff is quality per step vs. number of steps. More refinement steps = better quality but slower. The sweet spot for agent use cases is fewer steps (faster) since most agent calls don't need frontier-level reasoning.
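A toy sketch of that parallelism (not Mercury's actual decoder): start from a fully masked sequence and commit a slice of positions in parallel at each step, so the step count becomes the quality-speed knob. Real models re-predict every position each step; here a random choice stands in for the model:

```python
import random

MASK = "_"

def toy_denoise(length: int, steps: int, vocab: str = "abc", seed: int = 0) -> list[str]:
    """Toy discrete-diffusion decoding schedule.

    Begin fully masked; each step fills an equal share of the remaining
    masked positions in parallel, so everything is committed after `steps`
    iterations. Returns a snapshot of the sequence after each step.
    """
    rng = random.Random(seed)
    tokens = [MASK] * length
    snapshots = []
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Even schedule: remaining masks divided by remaining steps.
        k = len(masked) // (steps - step) or len(masked)
        for i in rng.sample(masked, k):
            tokens[i] = rng.choice(vocab)  # stand-in for a model prediction
        snapshots.append("".join(tokens))
    return snapshots

# 12 tokens resolved in 4 parallel steps instead of 12 sequential ones.
for line in toy_denoise(length=12, steps=4):
    print(line)
```

Dial `steps` down and more tokens commit per iteration: faster, less refined. That's the knob the survey analyzes, and the one Mercury 2 tunes toward the fast end.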

For practitioners: If you're building latency-sensitive agent pipelines, understand this tradeoff. Diffusion LLMs let you dial the quality-speed knob in ways autoregressive models can't. Expect more models in this family by mid-2026.

Moonshine: On-Device STT That Actually Works

Open-weight speech-to-text models that outperform Whisper Large v3 on the Open ASR Leaderboard. Models range from 26MB (IoT/edge) to full-size. Runs on everything: Python, iOS, Android, Raspberry Pi, even wearables. All on-device, no API keys.

Install it:

pip install moonshine-voice

Three lines to transcribe audio:

from moonshine_voice import transcribe

result = transcribe("audio.wav", language="en")
print(result["text"])

What impressed me: the smallest model is 26MB and still usable. The larger models genuinely compete with Whisper Large v3 on accuracy. Latency is noticeably lower because it's optimized for streaming, it processes audio while you're still talking.

The multi-platform story is real too. Same library works on Python, iOS, Android, and C++ for embedded. If you're building voice into an agent pipeline, this replaces the "send audio to an API" step entirely.

Honest take: documentation could be better for advanced use cases, and non-English support is newer and less battle-tested. But for English STT on edge devices? This is the new default.

Weekly star tracker — February 25, 2026.

Framework          Stars      Notes
OpenClaw           227,487    The lobster reigns.
n8n                176,278    Workflow automation king.
Dify               130,300    Full-stack LLM app platform.
LangChain          127,389    The OG, still growing.
AutoGen            54,822     MS nudging users to Agent Framework.
Flowise            49,334     Visual agent builder.
LlamaIndex         47,183     Pivoting to agentic OCR.
CrewAI             44,586     Role-playing orchestration.
Semantic Kernel    27,307     Microsoft's enterprise play.
Haystack           24,310     Context-engineered pipelines.

The top of the chart is pulling away from the pack. Microsoft nudging AutoGen to Agent Framework is the move to watch.

"Agentic" stopped being a feature and became an infrastructure category. Cloudflare is building agent hosting primitives. Emdash treats agents like IDE tabs. HF is packaging agent knowledge like npm modules. Mercury 2 was optimized specifically for agent-loop latency.

We crossed a line. Six months ago, "agent" meant a chatbot with tool calls. Now it means persistent processes with state, scheduling, and lifecycle management. The companies building agent infrastructure, not agent demos, are the ones that'll matter in 12 months. The demo era is over. The plumbing era just started.

Mass Time App

The world's largest Catholic church finder. 280,000+ churches across 131 countries. Daily readings with AI reflections, 60+ prayers, and a home screen widget. 100% free, 100% private. Built by Waltsoft Inc.

Download Free →

Want to sponsor this newsletter? Get in touch

Like what you read?

Forward this to a friend who's building with agents.

Subscribe to The Agentic Engineer