Tool runtime with guardrails
Tool calls are validated, rate-limited, retried with backoff, and logged. Failures are typed events, not stringly-typed chaos.
- Argument schemas + coercion
- Retry policies per tool
- Deterministic tool routing
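The mechanics behind these bullets can be sketched in plain Python. This is illustrative only — `TransientError`, `call_with_retry`, and the schema dict are hypothetical names for this sketch, not AgentFlow's API:

```python
import time
from dataclasses import dataclass

class TransientError(Exception):
    """A failure worth retrying (timeouts, 429s, flaky networks)."""

@dataclass
class RetryPolicy:
    max_attempts: int = 3
    base_delay: float = 0.5  # seconds; doubled per attempt ("expo" backoff)

def call_with_retry(fn, raw_args: dict, schema: dict, policy: RetryPolicy):
    # Coerce model-provided arguments to their declared types up front,
    # so type errors surface before any side effects happen.
    args = {name: typ(raw_args[name]) for name, typ in schema.items()}
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return fn(**args)
        except TransientError:
            if attempt == policy.max_attempts:
                raise  # out of budget: surface the failure as-is
            time.sleep(policy.base_delay * 2 ** (attempt - 1))
```

A real runtime would additionally emit a typed event per attempt and route deterministically by tool name; the sketch shows only the coercion-plus-retry core.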
Production primitives for tool-calling, memory, retries, rate limits, and structured outputs — plus approvals and policy enforcement that survive real users.
1. Break task into steps, choose tools, set success criteria.
2. Execute tools with retries, rate limits, and structured arguments.
3. Store observations in memory; attach citations and trace spans.
4. Run guardrails, schema checks, and evaluation tests pre-response.
5. Return a crisp answer — or request human approval when risky.
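The steps above can be sketched as a single loop. All names here are hypothetical; real planning, synthesis, and guardrails are far richer than these stand-ins:

```python
from dataclasses import dataclass

@dataclass
class Step:
    tool: str
    args: dict

def run_loop(task, plan, tools, guardrails, request_approval):
    memory = []
    for step in plan(task):                       # 1. plan steps + choose tools
        obs = tools[step.tool](**step.args)       # 2. execute with typed args
        memory.append((step.tool, obs))           # 3. store observations
    answer = "; ".join(obs for _, obs in memory)  # naive synthesis stand-in
    if not guardrails(answer):                    # 4. pre-response checks
        return request_approval(answer)           # 5. escalate when risky...
    return answer                                 #    ...or answer directly
```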
Stop hand-rolling brittle agent loops. AgentFlow Python gives you deterministic interfaces for planning, tool execution, memory, and approvals — with instrumentation built in.
Use short-term scratchpads and long-term stores. Control recall with budgets, filters, and recency — and keep PII out.
Ship with confidence. Hook eval suites into the run loop and stream trace spans to your observability stack.
A predictable loop: plan, call tools, verify outputs, and request approvals when risk rises.
1. Declare tools, schemas, policies, and retry limits in code or config.
2. Planner produces steps with constraints: budgets, safe mode, and stop conditions.
3. Tool calls run with typed args, streaming traces, and automatic recovery on transient failures.
4. High-risk actions pause for human review with diffable proposed outputs.
```python
# define tools with typed args + policies
agent = AgentFlow(
    model="gpt-4.1-mini",
    tools=[search_docs, create_ticket, send_slack],
    output_schema=AnswerWithCitations,
    retries=RetryPolicy(max_attempts=3, backoff="expo"),
    rate_limits={"search_docs": "10/min"},
    guardrails=Guardrails(pii_redaction=True, jailbreak_defense=True),
)

result = agent.run("Summarize incident #1842 and propose remediation steps.")
```
Opinionated starting points with real-world constraints: escalation paths, tool budgets, and deterministic outputs.
Guardrails are enforced at the runtime boundary: before tools execute and before outputs ship.
Policy rules, PII redaction, jailbreak defense, and eval tests run as part of the agent loop — not as an afterthought.
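A sketch of what "enforced at the runtime boundary" means in practice — the regex, function names, and allowlist shape are illustrative, not AgentFlow's internals:

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # one toy PII pattern

def redact(text: str) -> str:
    return SSN_RE.sub("[REDACTED]", text)

def before_tool(allowlist: set, tool: str, args: dict) -> dict:
    """Boundary 1: policy check + PII redaction before a tool executes."""
    if tool not in allowlist:
        raise PermissionError(f"tool {tool!r} denied by policy")
    return {k: redact(v) if isinstance(v, str) else v for k, v in args.items()}

def before_output(validate, answer: str) -> str:
    """Boundary 2: guardrail/schema checks before the output ships."""
    if not validate(answer):
        raise ValueError("output failed guardrail checks")
    return redact(answer)
```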
Adjust enforcement levels and watch the agent's outputs and tone change in response.
Start free, then scale guardrails, observability, and deployments with your team.
| Capability | Starter | Pro | Enterprise |
|---|---|---|---|
| Tool calling runtime | ✓ | ✓ | ✓ |
| Structured outputs (Pydantic) | — | ✓ | ✓ |
| Retries + rate limits | basic | advanced | advanced + per-tenant |
| Guardrails (policy / PII / jailbreak) | — | ✓ | ✓ + custom packs |
| Human-in-the-loop approvals | — | ✓ | ✓ + workflows |
| Tracing + export | local | OTel + webhooks | OTel + SIEM |
Answers for builders who care about reproducibility, safety, and running agents in production.
No. The runtime abstracts providers and normalizes tool-calling + streaming. Bring OpenAI, Anthropic, or self-hosted models. Swap without rewriting your tool layer.
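One way such normalization can work, shown with simplified approximations of provider payload shapes (not the exact provider schemas):

```python
def normalize_tool_call(provider: str, raw: dict) -> dict:
    """Map provider-specific tool-call payloads onto one internal shape,
    so the tool layer never sees provider details."""
    if provider == "openai":
        # simplified: OpenAI-style calls nest name/arguments under "function"
        return {"tool": raw["function"]["name"], "args": raw["function"]["arguments"]}
    if provider == "anthropic":
        # simplified: Anthropic-style tool_use blocks carry "name" and "input"
        return {"tool": raw["name"], "args": raw["input"]}
    raise ValueError(f"unknown provider: {provider}")
```

Because every provider lands on the same internal event, swapping models is a config change rather than a rewrite of the tool layer.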
Any step can be marked `requires_approval`. The agent produces a proposed action payload, diffable and auditable. You can approve, edit, or deny; the run continues with the decision recorded.
Policy rules (allow/deny), jailbreak defense prompts + classifiers, PII redaction at input/output, schema validation for tool args and model outputs, plus eval hooks for regression testing.
Yes. You control what gets recalled and what gets sent. Use retrieval budgets, redaction, and allowlists to keep sensitive context local.
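For instance, recall can be constrained by a budget, a source allowlist, and recency ordering. A sketch with hypothetical names and a deliberately naive term match:

```python
def recall(memories: list, terms: list, budget: int, allow: set) -> list:
    """Return at most `budget` matching items from allowlisted sources,
    newest first; everything else stays local and is never sent."""
    hits = [m for m in memories
            if m["source"] in allow            # allowlist filter
            and any(t in m["text"] for t in terms)]
    hits.sort(key=lambda m: m["ts"], reverse=True)  # recency decides the budget
    return hits[:budget]
```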
No. It’s a runtime with operational semantics: typed events, trace spans, failure modes, and configuration you can reason about. The goal is predictability under load — not vibes.