The Agentic Engineer
• LangChain open-sources Deep Agents. MIT-licensed coding agent built on LangGraph. Works with any model. Planning, filesystem, shell, sub-agents out of the box. pip install deepagents and you have a Claude Code alternative for $0.
• A GitHub issue title compromised 4,000 developer machines. The first documented "AI installs AI" supply chain attack. Prompt injection in Cline's triage bot led to credential theft and malicious npm publishes.
• Karpathy drops autoresearch. One GPU, one file, one metric. ~100 ML experiments while you sleep. "You're not editing Python anymore. You're programming program.md."
LangChain Just Open-Sourced a Claude Code Replacement
Claude Code costs $200/month. Deep Agents costs nothing. LangChain released it this week under the MIT license. It's built on LangGraph and works with any model that supports tool calling. The README says it was "inspired by Claude Code" with the goal of being "even more general purpose."
pip install deepagents. Three lines of Python and you have a working coding agent:
```python
from deepagents import create_deep_agent

agent = create_deep_agent()
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Refactor the auth module"}]}
)
```
What ships out of the box: a planning tool (write_todos) for task breakdown, full filesystem access (read_file, write_file, edit_file, ls, glob, grep), and shell execution with sandboxing. You also get sub-agent spawning for delegating work with isolated context windows. Auto-summarization kicks in when conversations get long. Large outputs get saved to files automatically.
The provider-agnostic angle is the real story. Claude Code locks you into Anthropic's models and pricing. Deep Agents works with GPT-4o, Claude, Gemini, Llama, or whatever you're running locally. Swap models in one line:
```python
from langchain.chat_models import init_chat_model
from deepagents import create_deep_agent

agent = create_deep_agent(
    model=init_chat_model("openai:gpt-4o"),
    tools=[my_custom_tool],
    system_prompt="You are a research assistant.",
)
```
MCP support comes via langchain-mcp-adapters. There's also a CLI with web search, remote sandboxes, persistent memory, and human-in-the-loop approval.
Because create_deep_agent returns a compiled LangGraph graph, you get streaming, Studio integration, checkpointers, and persistence for free. If you're already using LangChain, this slots in without rewiring anything.
The security model is honest about its tradeoffs. Deep Agents follows a "trust the LLM" approach. The agent can do anything its tools allow. You enforce boundaries at the tool and sandbox level, not by expecting the model to self-police. That's the right call. Pretending the model will follow safety instructions under adversarial conditions is how you get Clinejection (see Hot Take below).
Look, LangChain has a history of shipping abstractions that add complexity without adding value. This is different. Deep Agents is opinionated and ready to run. The batteries-included approach, planning plus filesystem plus shell plus sub-agents, covers 90% of what coding agents actually need. The MIT license means you can fork it, embed it, sell it. No AGPL gotchas.
The coding agent market just got commoditized. Anthropic charges $200/month for Claude Code. OpenAI's Codex is in limited preview. LangChain made the whole thing free and model-agnostic. If you're paying for a proprietary coding agent, you should at least benchmark Deep Agents against it this week.
Source: GitHub · MIT License
GPT-5.4 Ships Native Computer Use
GPT-5.4 is the first frontier model with computer use baked in. The model sees your screen, moves your mouse, types on your keyboard. OSWorld score: 75%, beating the human baseline of 72.4%. Combined with a 1M token context window and dynamic tool discovery, this collapses the "build custom integrations vs. bolt on a computer-use model" decision into one inference pass. Sandboxing is mandatory. 25% failure rate on standardized tasks means production will be worse.
Karpathy's autoresearch: Let Agents Run Your ML Experiments Overnight
Andrej Karpathy released a deliberately minimal repo for autonomous ML research. One GPU, one file the agent edits (train.py), one metric (val_bpb), 5-minute experiment budget. The agent runs ~100 experiments while you sleep. The key insight: "You're not editing Python anymore. You're programming program.md." This is the clearest demonstration yet that agents-as-researchers is a real workflow, not a thought experiment.
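The loop itself is simple enough to sketch. A hedged approximation of the shape described above, where `run_experiment` is a hypothetical stub standing in for one 5-minute `train.py` run and the toy objective is invented for illustration:

```python
import random

def run_experiment(lr_exp: float) -> float:
    """Hypothetical stand-in for one budgeted train.py run: returns
    val_bpb (validation bits per byte; lower is better). Toy objective."""
    return 1.0 + 0.01 * abs(lr_exp + 3)  # pretend the optimum is lr = 1e-3

def overnight(n_experiments: int = 100):
    """Try ~100 configs while you sleep; keep the best val_bpb."""
    best_bpb, best_lr_exp = float("inf"), None
    for _ in range(n_experiments):
        lr_exp = random.uniform(-5.0, -1.0)  # the agent edits train.py instead
        bpb = run_experiment(lr_exp)
        if bpb < best_bpb:
            best_bpb, best_lr_exp = bpb, lr_exp
    return best_bpb, best_lr_exp
```

The real repo replaces the random proposal with an agent reading `program.md` and editing `train.py`, but the budget-run-score-keep loop is the whole control flow.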
Claude Found 22 Firefox Zero-Days in Two Weeks
Claude Opus 4.6 found 22 vulnerabilities in Firefox during a two-week collaboration with Mozilla. 14 were high-severity. The first Use After Free took 20 minutes to find. Mozilla shipped fixes in Firefox 148. Also demonstrated crude exploit generation in 2 cases. AI-powered security research just graduated from "interesting paper" to "shipping patches."
Anthropic: No AI Unemployment Yet, But Junior Hiring Is Slowing
Anthropic introduces "observed exposure," combining theoretical LLM capability with actual usage data. No systematic unemployment increase. But hiring of younger workers has slowed in AI-exposed occupations. 561 HN comments means this hit a nerve. The nuance matters: the threat isn't mass layoffs. It's the entry-level pipeline quietly narrowing.
Alibaba OpenSandbox: The Missing Infrastructure for Agent Isolation
General-purpose sandbox platform with multi-language SDKs, Docker/K8s runtimes, and gVisor/Kata/Firecracker isolation options. Pre-built examples for Claude Code, Codex CLI, Gemini CLI, and OpenClaw. This is the infrastructure layer that agent builders have been duct-taping together from Docker configs and prayer. 3,900 new stars in a week says the demand was there.
SWE-CI: The First Benchmark That Tests Whether Agents Can Actually Maintain Code
Core insight: Every coding agent benchmark until now tests one-shot bug fixes. SWE-bench asks: "Can you fix this issue?" SWE-CI asks the harder question: "Can you maintain this codebase over 233 days and 71 commits of real evolution?"
What they built: 100 tasks pulled from real repositories, each spanning months of continuous integration history. Agents don't just fix one bug. They resolve tasks through dozens of rounds of analysis and coding iterations, dealing with shifting requirements, accumulating technical debt, and breaking changes from other commits.
Why it matters for builders: If you're deploying coding agents in production, you know the gap. "Agent fixes a bug" is not the same as "agent maintains a feature branch for three months." SWE-CI is the first benchmark that measures the second thing. Current agents struggle with it. Tracking 71 commits of evolution history exposes weaknesses that one-shot benchmarks hide.
Practical takeaway: The shift from "functional correctness" to "maintainability" as the evaluation target is the right move. If your team is evaluating coding agents, stop benchmarking on isolated fixes. Test them on your actual CI pipeline over a week. SWE-CI gives you the methodology to do that rigorously.
Knowledge gap: The paper doesn't test multi-agent setups where different agents handle different parts of the CI loop. That's where most production deployments are heading.
Time saved: 6 min read vs 38 min paper. 6.3x compression.
Shannon: The AI Pentester That Only Reports What It Can Actually Exploit
Most security scanners give you a list of theoretical vulnerabilities. Shannon gives you a list of things it actually broke into. That's the difference.
Shannon is an open-source (AGPL-3.0) autonomous pentester that reads your source code, identifies attack vectors, and executes real exploits with proof-of-concept. SQLi, auth bypass, SSRF, XSS. If it can't exploit it, it doesn't report it. 96.15% on the XBOW benchmark.
Install:
```shell
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon
pip install -e .

# Point it at your app
shannon scan --target http://localhost:3000 \
  --source ./my-app \
  --output report.json
```
What it finds (example output, redacted):

```json
{
  "vulnerability": "SQL Injection",
  "endpoint": "/api/users?id=",
  "severity": "critical",
  "exploit_poc": "GET /api/users?id=1' OR '1'='1",
  "verified": true,
  "evidence": "Returned 847 rows (expected 1)"
}
```
The white-box approach is what makes this interesting. Shannon reads your actual source code to find attack vectors, then confirms them with real requests. It's not fuzzing blindly. It understands your code paths.
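The verification gate is the whole philosophy in one function. A hedged sketch with hypothetical data shapes (not Shannon's real internals), using the SQLi numbers from the example report:

```python
def sqli_verified(baseline: dict, injected: dict) -> bool:
    """Report only if the PoC observably changed behavior:
    a suspicious pattern without evidence never ships."""
    return injected["status"] == 200 and injected["rows"] > baseline["rows"]

# Responses from a real request pair (values from the example report)
baseline = {"status": 200, "rows": 1}    # GET /api/users?id=1
injected = {"status": 200, "rows": 847}  # GET /api/users?id=1' OR '1'='1

findings = []
if sqli_verified(baseline, injected):
    findings.append({
        "vulnerability": "SQL Injection",
        "verified": True,
        "evidence": f"Returned {injected['rows']} rows (expected {baseline['rows']})",
    })
```

A scanner that gates on evidence like this can still miss things, but everything it does report is actionable by construction.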
When to use it: Pre-deploy security checks, CI pipeline integration, or periodic audits. Pairs well with the Clinejection story this week. Agents are creating new attack surfaces. Shannon is an agent that finds them.
The catch: AGPL-3.0 means you can't embed it in proprietary SaaS without releasing your source. For internal use and CI pipelines, that's fine. 32.8k stars and +6,900 this week. The security community noticed.
Verdict: The "only reports verified exploits" philosophy is the right call. False positives are the reason most teams ignore their security scanners. Shannon fixes that by only crying wolf when there's an actual wolf.
Framework Star Tracker
Weekly star tracker, March 9, 2026. Deltas vs. Issue #2 (March 2).
| Framework | Stars | Weekly Δ |
|---|---|---|
| OpenClaw | 286,434 | +38,789 |
| n8n | 178,219 | +991 |
| Dify | 131,713 | +777 |
| LangChain | 128,731 | +788 |
| AutoGen | 55,350 | +278 |
| Flowise | 50,550 | +1,047 |
| LlamaIndex | 47,506 | +192 |
| CrewAI | 45,571 | +581 |
| Semantic Kernel | 27,396 | +50 |
| LangGraph | 25,948 | +479 |
| Haystack | 24,437 | +62 |
| Vercel AI SDK | 22,442 | +183 |
| Mastra | 21,820 | +178 |
| OpenAI Agents SDK | 19,451 | +181 |
| Strands SDK (AWS) | 5,285 | +40 |
Notable moves: OpenClaw added 38,789 stars in one week. That's not a typo. The gap between #1 and #2 widened from 70K to 108K. Flowise quietly had its best week ever (+1,047), likely riding the "visual agent builder" wave as more non-developers try to build agents. n8n nearly cracked 1K weekly adds. The middle of the pack (Semantic Kernel, Haystack, Strands) is stalling. Microsoft's Semantic Kernel gained just 50 stars, the lowest on the board. The enterprise frameworks are losing mindshare to the builder-friendly ones.
Hot Take: Clinejection

The Clinejection attack should scare you more than it does. A prompt injection in a GitHub issue title tricked an AI triage bot into executing code. That led to cache poisoning, credential theft, and a malicious npm publish. The package silently installed a second AI agent on 4,000 developer machines. The attack chain is elegant and terrifying: AI reads untrusted input, AI executes code, AI publishes package, package installs more AI.

We gave agents write access to CI/CD pipelines because it was convenient. Nobody asked what happens when the agent's input is adversarial. Now we know. 4,000 developers learned the hard way that "AI in the loop" means "new attack surface in the loop." A bot with push access to npm is a supply chain weapon. The fix isn't "better prompt engineering." The fix is: agents don't get write access to package registries. Period. Treat agent permissions like you treat IAM roles. Least privilege. No exceptions.
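"Treat agent permissions like IAM roles" can be made concrete with a deny-by-default scope table. A minimal sketch with hypothetical scope names (not any real registry's permission model):

```python
# Deny-by-default: an agent can do only what is explicitly granted.
AGENT_SCOPES: dict[str, set[str]] = {
    "triage-bot": {"issues:read", "issues:comment"},  # no registry write, ever
}

def authorize(agent: str, action: str) -> bool:
    """Unknown agents and ungranted actions both fail closed."""
    return action in AGENT_SCOPES.get(agent, set())
```

A triage bot scoped this way can still read and comment on issues, but `authorize("triage-bot", "npm:publish")` fails closed, which breaks the Clinejection chain at the publish step regardless of what the prompt injection says.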
What should I dig into next week?
| 🔥 Agent supply chain security deep dive |
| 🤖 Computer-use agents compared |
| 🧪 Building your own autoresearch setup |
CertPrep
17,000+ practice questions across 49 certification exams — Azure, GCP, CISSP, CompTIA, Cisco, Kubernetes & more. Timed tests, detailed explanations for every option, progress tracking. Free tier for every exam. One-time purchase, no subscriptions.
Download Free

Want to sponsor this newsletter? Get in touch
Like what you read?
Forward this to a friend who's building with agents.
Subscribe to The Agentic Engineer