AI-narrated by Amazon Polly • The Agentic Engineer

The Agentic Engineer

I read the repos so you don't have to.
Issue #11 | May 6, 2026

🔥 OpenAI models, Codex, and Managed Agents land on Amazon Bedrock. Model exclusivity is officially dead.

🛠️ AgentCore Optimization closes the observe-evaluate-improve loop for production agents. Your agents don't just run. They get better.

📄 T-MAP red-teams frontier agents at 57.8% success rate. Multi-step tool-use attacks are the new threat model.

Last week you voted: 75% said most of your AI-generated code survives to production. Builders are keeping what agents write.

AWS + OpenAI: GPT-5.5, Codex, and Managed Agents Come to Bedrock

Source: AWS Blog | What's Next with AWS Event, Apr 28

The model exclusivity era is over. At the "What's Next with AWS" event on April 28, AWS and OpenAI announced a partnership that puts OpenAI's frontier models directly inside Amazon Bedrock. Three announcements, each bigger than the last.

1. OpenAI models on Bedrock (limited preview). GPT-5.5 and GPT-5.4 are now accessible through Bedrock APIs. Same security controls, same governance, same billing. Enterprises that spent years building on Bedrock can now run OpenAI's best models without a second vendor relationship.

2. Codex on Bedrock. OpenAI's coding agent is available via Bedrock API, CLI, desktop app, and VS Code extension. Authentication uses AWS credentials. Usage counts toward existing cloud commitments. Your team doesn't need a separate OpenAI contract to run Codex at scale.

3. Bedrock Managed Agents powered by OpenAI. This is the new category. Production-ready agents built with the OpenAI harness, running on AWS infrastructure. Faster execution, sharper reasoning, reliable long-running tasks. AWS handles the infrastructure. OpenAI provides the brain.

Why this matters for builders: Six months ago, running OpenAI meant running OpenAI. Separate API keys, separate billing, separate governance. Now you can run GPT-5.5 with the same IAM roles, VPC configs, and CloudTrail logs you use for everything else. The switching cost between model providers just dropped to near zero.

The strategic read is clear. OpenAI needs distribution. AWS needs frontier models. Neither can win alone. OpenAI gets access to every enterprise already on AWS (which is most of them). AWS gets to say "we have every model" without building them.

For the agent builders reading this: Codex on Bedrock is the practical win. If your org already has AWS credentials and cloud commitments, you just got a coding agent for free. No procurement cycle. No new vendor approval. Just enable it.

The moat isn't the model anymore. It's the infrastructure the model runs on. AWS just proved it.

GitHub Trending (467 HN pts)

DeepClaude: Claude Code's Agent Loop at 17x Less Cost

A shell script that swaps Claude Code's backend to DeepSeek V4 Pro. $0.87/M output tokens vs $15/M. You keep the full agent loop: file editing, bash, git, subagent spawning. Supports OpenRouter and Fireworks backends too. DeepSeek's auto context caching makes subsequent turns absurdly cheap. The question every builder is asking: do you need Anthropic prices for the harness, or just for the model?

Cloudflare Blog

Cloudflare + Stripe: Agents Can Now Create Accounts, Buy Domains, and Deploy

Agents can autonomously provision a Cloudflare account, start a paid subscription via Stripe, register a domain, and get an API token to deploy code. Zero human steps from idea to production. Built on the new Stripe Projects protocol. This is agents-as-customers, not agents-as-tools. The first major cloud provider treating AI agents as first-class buyers.

GitHub Trending (#1, +34,848 stars this week)

mattpocock/skills: 57K Stars for Practical Coding Agent Skills

Matt Pocock's collection of composable agent skills for Claude Code, Codex, and other coding agents. /grill-me forces alignment before coding starts. /tdd adds red-green-refactor loops. Shared language docs reduce token waste. 34K stars in a single week. The AGENTS.md pattern is maturing into something genuinely useful for engineering teams.

Google Cloud Blog

Gemini Enterprise Agent Platform: Google Renames Vertex AI

Google rebrands Vertex AI as Gemini Enterprise Agent Platform. Adds Agent Identity, Agent Registry, Agent Gateway for governance. Agent Runtime supports long-running agents with Memory Bank. All future Vertex AI features ship exclusively through Agent Platform. The rename tells you everything: "AI platform" is dead. "Agent platform" is the category now.

Anthropic

Claude Security: Autonomous Vulnerability Scanning (Public Beta)

Claude now scans codebases like a security researcher. Traces data flows across files, catches multi-component vulnerabilities, validates findings via adversarial self-review, then suggests patches. Parallel scanning, webhook integration (Slack/Jira), scheduled recurring scans. The adversarial self-review is the interesting pattern: Claude challenges its own findings before surfacing them. Enterprise only for now.

T-MAP: Red-Teaming Frontier Agents at 57.8% Attack Success

Source: ArXiv via Agentic Security Newsletter

Core insight: Single-shot prompt injection is yesterday's threat. T-MAP uses trajectory-aware evolutionary search to discover multi-step tool-use attack paths against LLM agents. Instead of one malicious prompt, it chains sequences of tool calls that individually look benign but collectively compromise the agent.

How it works: The framework evolves attack trajectories using an evolutionary algorithm. Each generation mutates tool-use sequences, evaluates them against the target agent, and selects the most effective paths for the next round. The search is trajectory-aware: it considers the full sequence of agent actions, not just individual steps.

The numbers: 57.8% average attack success rate against frontier models from major vendors. Sequential tool-use manipulation is far more effective than single-action attacks. The diversity of discovered attack paths means patching one vector doesn't close the others.

Why builders should care: If you're deploying agents with tool access in production, your threat model needs to include multi-step attacks. A single tool call might pass every safety check. Five tool calls in sequence might exfiltrate your database. T-MAP proves this isn't theoretical. It's reproducible against the best models available today.

Practical takeaway: Monitor tool-use trajectories, not just individual calls. Rate-limit sequential tool access. Consider trajectory-level guardrails that evaluate the full sequence of agent actions before execution. The Guardians framework (static verification for agent workflows) is one approach to this problem.

Time saved: 7 min read vs 45 min paper. 6.4x compression.

AgentCore Optimization: The Continuous Agent Quality Loop

Source: AWS ML Blog | Preview, Apr 30

Here's the truth about production agents: they degrade. Silently. Models update. User behavior shifts. Prompts get reused in contexts they were never designed for. And the fix has always been the same manual grind: user complains, developer reads traces, rewrites prompt, tests a handful of cases, ships blind.

Amazon Bedrock AgentCore Optimization replaces that cycle with a data-driven flywheel. Three capabilities, one loop:

Recommendations. Analyzes production traces and evaluation outputs. Generates optimized system prompts and tool descriptions targeted at the evaluator you specify. Not generic suggestions. Specific, measurable improvements tied to your agent's actual failure modes.

Batch Evaluations. Validates recommendations against pre-defined test datasets or LLM-simulated user sessions. Reports aggregate scores before anything touches production. You know if the change helps before your users do.

A/B Testing. Splits live production traffic through AgentCore Gateway at configurable percentages. Reports confidence intervals and p-values. Promotes the winner. Statistical rigor, not gut feeling.

The key design decision: every recommendation requires developer approval. The system proposes. You decide. This is the right trust boundary for production agents. Automation handles the analysis. Humans own the judgment calls.

Configuration ships as bundles: immutable, versioned snapshots of your agent config (model ID, system prompt, tool descriptions). Your agent reads its active config dynamically via the AgentCore SDK. Swapping a prompt is a config change, not a code change. Rollback is instant.

# Clone the sample repo
git clone https://github.com/awslabs/agentcore-samples.git
cd agentcore-samples/02-use-cases/market-trends-agent

# The full improvement loop:
# 1. Generate a recommendation from production traces
# 2. Package as a configuration bundle version
# 3. Validate with batch evaluation
# 4. A/B test against live traffic with statistical confidence
# 5. Promote to production

The mental model shift: agents aren't static deployments. They're living systems that observe their own performance, generate hypotheses about how to improve, validate those hypotheses with data, and evolve. The loop never stops. Traces feed evaluations. Evaluations feed recommendations. Recommendations feed tests. Tests feed production. Production feeds traces.

If you're running agents in production without a quality loop, you're flying blind. You shipped v1 and hoped for the best. AgentCore Optimization is the instrumentation that turns hope into engineering.

Agents don't just run. They get better.

Links: Blog · GitHub Sample · Docs

Weekly star tracker, May 6, 2026. Deltas vs. Issue #10 (April 29, 2026).

FrameworkStarsWeekly Δ
OpenClaw368,075+3,014
n8n186,622+858
Dify140,036+728
LangChain135,726+644
AutoGen57,688+206
Flowise52,518+209
CrewAI50,568+520
LlamaIndex49,121+157
LangGraph31,142+607
Semantic Kernel27,833+44
OpenAI Agents SDK25,845+460
Haystack25,069+73
Vercel AI SDK23,994+176
Mastra23,543+191
MS Agent Framework10,088+228
Strands Agents5,769+57

Notable moves: OpenClaw continues its dominance with +3,014 (down from +4,189 last week but still the clear leader). n8n holds steady at +858. The interesting story is LangGraph at +607, quietly closing the gap on Semantic Kernel. CrewAI's +520 shows the multi-agent orchestration category still has momentum. Semantic Kernel's +44 is the weakest showing in the index. Microsoft's energy is clearly going to Agent Framework (+228) instead.

Your Company Has 45 Agents Per Employee and Nobody Knows What They're Doing

Gravitee's 2026 report: 3 million agents running in corporations. Only 47% monitored. 88% of organizations report confirmed or suspected agent security incidents. Non-human identities outnumber humans 45:1 (80:1 in cloud-native orgs). We went from "shadow IT" to "shadow AI" to "shadow agents" in about 18 months. The pattern is familiar: teams deploy faster than security can track. But the blast radius is different. A shadow SaaS app leaks data. A shadow agent with tool access can modify data, call APIs, and make decisions. Microsoft's Agent 365 ($15/user/month, GA May 1) is the first reference architecture with proper agent identity via Entra Agent IDs. That's the right direction. But most orgs won't adopt it for another year. In the meantime, your agents are running unsupervised. Sleep well.

CertPrep

CertPrep

32,000+ practice questions for 106 certification exams from 22 vendors. AWS, Azure, GCP, CISSP, CCNA, Security+, CompTIA A+, Fortinet, Juniper, Kubernetes, Salesforce, SAP, Databricks and more. Timed practice tests, verified answers with detailed explanations for every option, bookmarks, progress tracking. Free tier for every exam. One-time purchase per exam, no subscriptions.

Download Free

Want to sponsor this newsletter? Get in touch

Like what you read?

Forward this to a friend who's building with agents.

Subscribe to The Agentic Engineer
💬 Join the discussion