The Agentic Engineer

I read the repos so you don't have to.
Issue #14 | May 27, 2026

🔒 Anthropic's Glasswing partners found 10,000+ critical vulns in one month using Mythos Preview. Cloudflare alone: 2,000 bugs, 400 critical.

🛠️ CodeGraph gives coding agents a pre-indexed knowledge graph. 35% cheaper, 71% fewer tool calls. Tool of the Week.

📄 New paper quantifies why coding agents collapse in Django/FastAPI but succeed in Flask. Data-layer defects are the #1 root cause.

Glasswing Update: Mythos Preview Found 10,000+ Critical Vulns in One Month

Source: Anthropic Blog | May 22, 2026

In Issue #12, we covered Mythos finding a single curl vulnerability. That was the proof of concept. Now we have the production numbers. They're staggering.

Anthropic's 50 Glasswing partners deployed Mythos Preview against critical infrastructure software for 30 days. The result: 10,000+ high and critical-severity vulnerabilities discovered. Not theoretical. Not in toy codebases. In the software that runs the internet.

The standout numbers. Cloudflare found 2,000 bugs, 400 of them critical, with better-than-human false positive rates. Mozilla found 271 vulnerabilities in Firefox 150. That's 10x more than Opus 4.6 found in Firefox 148 just weeks earlier. The UK's AI Safety Institute confirms Mythos is the first model to solve both their cyber ranges end-to-end.

What changed between Opus 4.6 and Mythos. Anthropic hasn't published the full technical details, but the 10x improvement on Firefox suggests this isn't just a bigger model. The architecture may include specialized vulnerability reasoning that Opus lacks, though that remains speculation until details are published. When a model goes from finding some bugs to finding ten times as many in successive releases of the same browser, something fundamental has shifted in how it reasons about code paths.

The bottleneck has moved. Finding vulnerabilities is no longer the hard part. Patching them is. More than 10,000 high- and critical-severity vulns in 30 days means the remediation pipeline is now the constraint. Every security team using Mythos will need to triage as fast as the model can discover. That's a new kind of problem.

Why this matters for builders. If you're shipping software that handles sensitive data, an AI will find your vulnerabilities. The only variable is whether you find them first. Mythos is currently restricted to Glasswing partners. But the capability exists. Assume every critical codebase will be scanned by something this capable within 12 months.

The asymmetry is real. Defenders now have a tool that finds 10,000 vulns in a month. Attackers will have equivalent tools soon. The window between "AI finds the bug" and "AI exploits the bug" is closing. Ship your patches faster.

Hacker News (564 pts) / GitHub Trending

DeepSeek Reasonix: The Cache-First Coding Agent ($12 vs $61)

Terminal coding agent engineered entirely around DeepSeek's prefix-cache stability. One user hit 435M input tokens in a single day with a 99.82% cache hit rate, paying ~$12 instead of ~$61. The architecture keeps tokens cacheable across long sessions by design. It is DeepSeek-only because the entire approach depends on their byte-stable prefix-cache mechanic. If you're running DeepSeek for coding, this is the cheapest way to do it.
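To see where a gap like $12 vs $61 comes from, here is a minimal sketch of the arithmetic. The per-million-token prices below are assumptions picked because they roughly reproduce the quoted figures; they are not taken from DeepSeek's price list, and the function name is hypothetical. Output tokens are ignored.

```python
def input_cost_usd(total_tokens, cache_hit_rate, hit_price_per_m, miss_price_per_m):
    """Input-token cost in USD for a given prefix-cache hit rate."""
    hit_tokens = total_tokens * cache_hit_rate
    miss_tokens = total_tokens - hit_tokens
    return (hit_tokens * hit_price_per_m + miss_tokens * miss_price_per_m) / 1_000_000


TOTAL_TOKENS = 435_000_000  # from the item above
HIT_RATE = 0.9982           # from the item above
HIT_PRICE = 0.028           # ASSUMPTION: USD per 1M cached input tokens
MISS_PRICE = 0.14           # ASSUMPTION: USD per 1M uncached input tokens

with_cache = input_cost_usd(TOTAL_TOKENS, HIT_RATE, HIT_PRICE, MISS_PRICE)
without_cache = input_cost_usd(TOTAL_TOKENS, 0.0, HIT_PRICE, MISS_PRICE)

print(f"with cache: ${with_cache:.2f}, without cache: ${without_cache:.2f}")
```

Under these assumed prices the bill is dominated by the cached rate, which is why keeping the prefix byte-stable matters more than trimming tokens.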

AWS Blog (May 18, 2026)

AWS Transform Agents Go Multi-IDE: Kiro, Claude Code, Cursor, Codex

AWS Transform hit its 1-year anniversary with a big expansion. The migration/modernization agents now work in Kiro, Claude Code, Cursor, and Codex via an MCP server and agent plugins. A new Agent Builder Toolkit Kiro power lets you build custom transformation agents. The numbers: 4.5 billion lines of code transformed, 1.6 million hours saved. Also new: what-if scenarios and TCO assessment, so the agents can build migration business cases autonomously.

Kiro (May 2026)

Kiro Breaks Free: Web Preview, Rewindable Turns, Multi-Repo Agents

Kiro Web (Preview) launched at app.kiro.dev with collaborative and autonomous modes, multi-repo coordination in a single session, and agents that open PRs and respond to review comments. Rewindable conversations let you branch off any earlier turn. Workspace initialization is 88% faster. The multi-repo coordination is the standout: one agent session working across your frontend, backend, and infra repos simultaneously.

Show HN (685 pts) / GitHub

Forge: Guardrails That Take an 8B Model from Single Digits to 84%

Python reliability layer for self-hosted LLM tool-calling. You give it tools, the model calls them freely, and Forge applies rescue parsing, retry nudges, and response validation to keep things on track. An 8B local model goes from single-digit success to 84% on their 26-scenario eval suite. Sonnet 4.6 goes from 85% to 98%. Works as a drop-in proxy server: point OpenCode, aider, or Claude Code at it and the client thinks it is talking to a smarter model. Not an orchestrator, not a coding harness. Just the reliability layer that makes tool-calling actually work on small models.
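The three techniques named above (rescue parsing, response validation, retry nudges) can be sketched in a few lines. This is not Forge's API or implementation; every name here (TOOLS, rescue_parse, validate, call_with_retries) is hypothetical, and the "model" is a stand-in callable that takes a prompt string and returns text.

```python
import json
import re

# Hypothetical tool registry: tool name -> required argument names.
TOOLS = {"get_weather": {"required": {"city"}}}


def rescue_parse(text):
    """Try strict JSON first, then fall back to the outermost {...} span in the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


def validate(call):
    """Return an error message for the model, or None when the call is well-formed."""
    if not isinstance(call, dict):
        return "Reply with a single JSON object."
    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOLS:
        return f"Unknown tool. Choose one of: {sorted(TOOLS)}."
    args = call.get("args")
    if not isinstance(args, dict):
        return "'args' must be a JSON object."
    missing = TOOLS[name]["required"] - set(args)
    if missing:
        return f"Missing arguments: {sorted(missing)}."
    return None


def call_with_retries(model, prompt, max_attempts=3):
    """Ask for a tool call; on failure, re-ask with the validation error as a nudge."""
    message = prompt
    for _ in range(max_attempts):
        call = rescue_parse(model(message))
        error = validate(call)
        if error is None:
            return call
        message = f"{prompt}\n\nYour last reply was invalid: {error}"
    raise ValueError("model never produced a valid tool call")


# Stand-in model: first reply is missing an argument, second is valid but wrapped in chatter.
replies = iter([
    'Sure! {"tool": "get_weather", "args": {}}',
    'Here you go: {"tool": "get_weather", "args": {"city": "Oslo"}} hope that helps',
])
result = call_with_retries(lambda _msg: next(replies), "What is the weather in Oslo?")
print(result)
```

The design point is that none of this needs a smarter model: the wrapper absorbs formatting slips and tells the model exactly what to fix, which is plausibly where most of a small model's lost success rate goes.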

GitHub Trending (40,223 stars, +4,759/week)

CLI-Anything: Making ALL Software Agent-Native (40K Stars)

HKU project that wraps any software in a CLI harness so AI agents can use it. Includes CLI-Hub, a community registry for agent-ready CLIs. Recent additions: Rekordbox, Calibre, 3MF, MiniMax. Works with Pi, OpenClaw, Cursor, Claude Code. The thesis: tomorrow's software users will be agents, not humans. At 40K stars, the community agrees.

Constraint Decay: Why Coding Agents Fail at Real Backend Code

Source: ArXiv cs.SE | May 2026 | 241 HN points

Core insight: Coding agents lose 30 points on assertion pass rates when structural constraints (architecture patterns, ORMs, database schemas) accumulate. They succeed in minimal frameworks and collapse in convention-heavy ones.

The setup: 80 greenfield tasks and 20 feature-addition tasks across 8 frameworks: Flask, FastAPI, Django, Express, NestJS, Spring Boot, Rails, and Laravel. Each task tested with increasing structural constraints (none, light, moderate, heavy). The researchers measured assertion pass rate, not just "does it run."

What breaks: Data-layer defects are the #1 root cause of failure. Wrong ORM queries, violated database constraints, incorrect migration patterns. Agents understand the API layer fine. They understand business logic fine. But the moment they need to respect an existing data model with foreign keys, indexes, and migration history, they fall apart. Django's ORM conventions caused the steepest decline.

The framework sensitivity: Flask (minimal constraints) saw only a 7-point drop from zero to heavy constraints. Django saw a 38-point drop. FastAPI with SQLAlchemy: 31-point drop. The more opinions a stack has, the worse agents perform. On this evidence, convention-over-configuration works against AI-assisted development unless the agent is given those conventions explicitly.

Why builders should care: If you're choosing a stack for agent-assisted development, this data says pick minimal frameworks. Flask over Django. Express over NestJS. Or invest heavily in context engineering: feed your agent the full schema, migration history, and architectural decisions before it writes a single line.
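One way to act on the "feed your agent the full schema" advice is to dump the live schema into the prompt before the agent writes any data-layer code. The sketch below uses SQLite from the Python standard library purely for illustration; the tables, the schema_context helper, and the prompt wording are all hypothetical, and a real project would pull the same information from its own database or migration files.

```python
import sqlite3


def schema_context(conn):
    """Collect CREATE statements for every table and index into one prompt-ready string."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type IN ('table', 'index') AND sql IS NOT NULL "
        "ORDER BY type DESC, name"  # 'table' sorts after 'index', so DESC lists tables first
    ).fetchall()
    return "\n\n".join(sql for (sql,) in rows)


# Hypothetical two-table schema with a foreign key and an index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id)
    );
    CREATE INDEX idx_book_author ON book(author_id);
""")

context = "Existing schema (do not violate these constraints):\n\n" + schema_context(conn)
print(context)
```

The foreign keys, NOT NULL constraints, and indexes are exactly the details the paper says agents get wrong, so putting them in front of the agent verbatim is the cheapest form of context engineering.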

Time saved: 8 min read vs 52 min paper. 6.5x compression.

CodeGraph

github.com/colbymchenry/codegraph | 23,809 stars (+18,136 this week)

Pre-indexed knowledge graph that gives coding agents a map of your codebase before they start reading files. Symbol relationships, call graphs, dependency chains. Your agent stops wandering through 50 files and goes straight to what matters.

The benchmarks. Tested across 7 codebases in 7 languages. Average results: 35% cheaper, 57% fewer tokens, 46% faster, 71% fewer tool calls. The Tokio benchmark (Rust async runtime, 200K+ lines) hit 82% cost reduction. These aren't synthetic tests. They're real repos with real complexity.

Install:

npm install -g codegraph
cd your-project
codegraph index
codegraph serve

Works with: Claude Code, Codex, Cursor, OpenCode, Hermes Agent. Exposes an MCP server that any compatible agent can query. Zero config after indexing.

Why it matters. The #1 cost driver for coding agents is reading irrelevant files. CodeGraph solves this by giving agents a structural map upfront. Instead of grep-and-read loops, the agent queries the graph for "what calls this function" or "what depends on this module" and gets precise answers. The 71% reduction in tool calls means your agent finishes faster and cheaper on every task.
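To make the "what calls this function" query concrete, here is a toy caller index built with Python's ast module. It is not how CodeGraph works internally and shares no code with it; build_caller_index and the sample source are hypothetical. It only resolves direct calls by bare name and, as a simplification, attributes calls inside nested functions to the outer function as well.

```python
import ast
from collections import defaultdict


def build_caller_index(source):
    """Map each called function name to the set of functions that call it."""
    callers = defaultdict(set)
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            # Only direct calls such as load(x); method calls like obj.load(x) are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callers[node.func.id].add(func.name)
    return callers


SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).splitlines()

def main():
    rows = parse("data.txt")
    print(len(rows), load("data.txt")[:10])
'''

index = build_caller_index(SOURCE)
print(sorted(index["load"]))  # → ['main', 'parse']
```

Once an index like this exists, answering "what calls load" is a dictionary lookup instead of a grep-and-read loop, which is the kind of saving the tool-call numbers above point to.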

Weekly star tracker, May 27, 2026. Deltas vs. Issue #13 (May 20, 2026).

Framework             Stars      Weekly Δ
OpenClaw              374,509    +1,675
n8n                   189,613    +1,124
Dify                  142,561    +804
LangChain             137,587    +585
AutoGen               58,375     +235
Flowise               53,056     +153
CrewAI                52,143     +508
LlamaIndex            49,645     +165
LangGraph             32,885     +569
Semantic Kernel       27,974     +45
OpenAI Agents SDK     26,629     +203
Haystack              25,368     +101
Vercel AI SDK         24,457     +154
Mastra                24,281     +299
MS Agent Framework    10,714     +199
Strands Agents        5,929      +52

Notable moves: n8n crosses 189K and posts its strongest week since March (+1,124). CrewAI (+508) and LangGraph (+569) continue their quiet climb while LangChain (+585), LangGraph's parent repo, slows. Mastra (+299) is the fastest-growing sub-30K framework for the third consecutive week. Semantic Kernel (+45) remains nearly flat, suggesting Microsoft's developer attention is shifting to Agent Framework (+199).

AI-Generated Issues Are Poisoning Open Source. And Nobody's Talking About It.

Armin Ronacher (Pi creator) published a piece this week on "slop issues." Bug reports that are 5% human observation, 95% AI-generated confident-but-wrong diagnosis. The problem isn't that they're AI-written. The problem is they're plausible. Maintainers waste hours chasing phantom bugs. Worse: coding agents that read these issues inherit the wrong diagnosis and propagate it into their fixes. We're building a feedback loop where agents generate bad issues, other agents read those issues, and both produce wrong code. Pi now has a /is command that tells agents to independently verify rather than trust issue text. Every project with agent contributors needs something similar. Trust nothing. Verify everything. The era of "the issue says X so X must be true" is over.

CertPrep

32,000+ practice questions for 106 certification exams from 22 vendors. AWS, Azure, GCP, CISSP, CCNA, Security+, CompTIA A+, Fortinet, Juniper, Kubernetes, Salesforce, SAP, Databricks and more. Timed practice tests, verified answers with detailed explanations for every option, bookmarks, progress tracking. Free tier for every exam. One-time purchase per exam, no subscriptions.
