awaf is an open framework for evaluating AI agent architecture across 10 pillars. Score your agent. Find the gaps. Ship with confidence.
A complete architectural model for agent systems, from foundational requirements to agent-native concerns that have no cloud equivalent.
Agents must own their domain end-to-end. A vertically sliced agent has its own tools, its own context, and its own data, independent of other agents.
SLOs, playbooks, and postmortems. Determines whether the other pillars remain effective in production.
Enforced in code, not prompts. Credentials must never enter the agent. Blast radius must be explicitly bounded.
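As a rough illustration of "enforced in code, not prompts", here is a minimal sketch of a gateway that sits between the agent and its tools. It is not part of the awaf spec or CLI: `ToolGateway`, `ToolDenied`, and the `token` parameter are hypothetical names chosen for this example. The agent only ever supplies a tool name and plain arguments, the gateway injects the credential, and an explicit allowlist bounds what the agent can reach.

```python
class ToolDenied(PermissionError):
    """Raised when the agent requests something outside its allowed scope."""


class ToolGateway:
    """Hypothetical gateway: holds the secrets so the agent never sees them."""

    def __init__(self, tools, allowed, secrets):
        self._tools = dict(tools)          # tool name -> callable(token=..., **args)
        self._allowed = frozenset(allowed)  # explicit blast-radius boundary
        self._secrets = dict(secrets)      # tool name -> credential

    def call(self, name, /, **args):
        # Allowlist check happens in code, regardless of what the prompt says.
        if name not in self._allowed:
            raise ToolDenied(f"tool {name!r} is not on the allowlist")
        # The agent may not pass or override credentials itself.
        if "token" in args:
            raise ToolDenied("credentials are injected by the gateway")
        return self._tools[name](token=self._secrets[name], **args)
```

The agent process would be given only `gateway.call`, never the `secrets` mapping, so a prompt injection cannot exfiltrate a credential that the agent does not hold.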
Designed for failure, not just uptime. Chain boundaries as fault domains. Fail-loud behavior and circuit breakers at the MCP layer. Checkpoint/resume for multi-step runs.
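The circuit-breaker idea can be sketched in a few lines. This is a generic, assumed implementation rather than anything defined by awaf or MCP: `CircuitBreaker`, `CircuitOpen`, and their parameters are hypothetical. After a set number of consecutive failures the breaker fails loudly instead of retrying silently, and after a cool-down it lets a single trial call through.

```python
import time


class CircuitOpen(RuntimeError):
    """Raised instead of calling a tool that is currently considered down."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable for testing
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail loud: the caller sees an explicit error right away.
                raise CircuitOpen("tool disabled after repeated failures")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success fully closes the circuit
        return result
```

One breaker per tool or per chain boundary keeps a failing dependency from dragging down unrelated steps.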
Optimizes execution speed and resource usage across agent operations.
Tracks every token and tool call. Session budgets and loop detection from day one. Hard stop at 100% budget. Non-negotiable. Prevents solutions that cost more than the problems they solve.
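A session budget with a hard stop and basic loop detection might look like the sketch below. The class and exception names are hypothetical and are not taken from the awaf spec. Loop detection here is deliberately naive: it only counts identical tool calls with identical arguments.

```python
from collections import Counter
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when the session has spent its whole token budget."""


class LoopDetected(RuntimeError):
    """Raised when the same tool call repeats too many times."""


@dataclass
class SessionBudget:
    max_tokens: int
    max_repeats: int = 3
    tokens_used: int = 0
    tool_calls: int = 0
    _seen: Counter = field(default_factory=Counter)

    def record_tokens(self, n: int) -> None:
        self.tokens_used += n
        # Hard stop at 100% of the budget, with no soft overrun.
        if self.tokens_used >= self.max_tokens:
            raise BudgetExceeded(
                f"used {self.tokens_used} of {self.max_tokens} tokens"
            )

    def record_tool_call(self, name: str, args: dict) -> None:
        self.tool_calls += 1
        # Identical name plus identical arguments counts as a repeat.
        key = (name, repr(sorted(args.items())))
        self._seen[key] += 1
        if self._seen[key] > self.max_repeats:
            raise LoopDetected(f"{name} repeated {self._seen[key]} times")
```

The agent loop would call `record_tokens` after every model response and `record_tool_call` before every tool invocation, and treat either exception as terminal for the session.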
Long-term viability and environmental considerations adapted from cloud WAF principles.
Addresses silent, confident failures — the worst failure type. Agents can hallucinate arguments, select wrong tools, or derail without visible errors. Requires evals covering tool selection, argument accuracy, and chain-of-thought faithfulness.
Human control through code-level enforcement, not prompts. Any in-flight agent must be externally stoppable. Requires pause, notify, and resume/abort primitives.
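The pause, notify, and resume/abort primitives could be sketched as follows, assuming a threaded runner. `RunControl`, `RunAborted`, and `run_steps` are hypothetical names for illustration, not an awaf API. The key property is that the agent loop checks an external control object between steps, so an operator can stop it without the model's cooperation.

```python
import threading


class RunAborted(RuntimeError):
    """Raised inside the agent loop when an operator aborts the run."""


class RunControl:
    def __init__(self, notify=print):
        self._resume = threading.Event()
        self._resume.set()          # runs start unpaused
        self._abort = threading.Event()
        self._notify = notify       # e.g. a pager or chat webhook

    def pause(self, reason):
        self._resume.clear()
        self._notify(f"paused: {reason}")

    def resume(self):
        self._resume.set()

    def abort(self):
        self._abort.set()
        self._resume.set()          # wake a paused run so it sees the abort

    def checkpoint(self):
        """Called by the agent loop between steps; blocks while paused."""
        self._resume.wait()
        if self._abort.is_set():
            raise RunAborted("run aborted by operator")


def run_steps(steps, control):
    results = []
    for step in steps:
        control.checkpoint()        # enforced in code before every step
        results.append(step())
    return results
```

An operator thread or HTTP handler holds the same `RunControl` instance and calls `pause`, `resume`, or `abort`. This sketch only stops the run between steps; a step that is already executing runs to completion.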
Manages agent perception of reality. Prevents stale context from corrupting reasoning. Requires external content sanitization through MCP and active lifecycle management for long sessions. The agent must understand its own knowledge limitations.
Spec-first. Multiple implementations. Community-owned.
The canonical spec. FRAMEWORK.md defines all 10 pillars, scoring, and readiness ratings.
Reference implementation. Static analysis, CI/CD integration, GitHub Action. pip install awaf.
Dialogue-driven assessments in Claude Code. Accepts code, docs, exports, or verbal descriptions.
AI-powered planning tool grounded in market analysis and Anthropic Economic Index data. Built on awaf.
The spec is open. Implementations are open.
If you build agents in production, your patterns belong here.