Framework-first agent infrastructure for Python teams

Ship reliable LLM agents in Python — with guardrails.

Production primitives for tool-calling, memory, retries, rate limits, and structured outputs — plus approvals and policy enforcement that survive real users.

Browse templates
Typed
Pydantic outputs
Safe
Policy + PII filters
Observed
Traces + eval hooks
Agent Run Timeline (mock run)
trace_id: afp_8d2f guardrails: on
  1. Plan planner

    Break task into steps, choose tools, set success criteria.

  2. Tool Call search_docs()

    Execute tools with retries, rate limits, and structured arguments.

  3. Observation tool_result

    Store observations in memory; attach citations and trace spans.

  4. Verify policy + eval

    Run guardrails, schema checks, and evaluation tests pre-response.

  5. Final approved

    Return a crisp answer — or request human approval when risky.

runtime async / sync
providers OpenAI / Anthropic / OSS
ops tracing • prompts • evals

Agent infrastructure you can actually operate

Stop hand-rolling brittle agent loops. AgentFlow Python gives you deterministic interfaces for planning, tool execution, memory, and approvals — with instrumentation built in.

Tool runtime with guardrails

Tool calls are validated, rate-limited, retried with backoff, and logged. Failures surface as typed events, not stringly-typed chaos.

  • Argument schemas + coercion
  • Retry policies per tool
  • Deterministic tool routing
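A minimal sketch of what this retry-and-validation path could look like. The names here (`call_with_retries`, `RetryPolicy`, `ToolError`) are illustrative stand-ins, not the shipped API:

```python
import random
import time
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    max_attempts: int = 3
    base_delay: float = 0.5  # seconds; doubled each attempt ("expo" backoff)


class ToolError(Exception):
    """Typed failure event instead of a raw string in the transcript."""


def call_with_retries(fn, args: dict, policy: RetryPolicy, sleep=time.sleep):
    """Coerce args against the tool's type hints, then retry transient failures."""
    for name, expected in fn.__annotations__.items():
        if name != "return" and name in args and not isinstance(args[name], expected):
            args[name] = expected(args[name])  # simple coercion, e.g. 123 -> "123"
    delay = policy.base_delay
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return fn(**args)
        except Exception as exc:
            if attempt == policy.max_attempts:
                raise ToolError(f"{fn.__name__} failed after {attempt} attempts") from exc
            sleep(delay + random.uniform(0, delay / 2))  # jittered exponential backoff
            delay *= 2
```

Injecting `sleep` keeps the backoff testable; the deterministic routing and per-tool rate limits would wrap this same call site.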

Memory you can reason about

Use short-term scratchpads and long-term stores. Control recall with budgets, filters, and recency — and keep PII out.

  • Vector + structured memory
  • PII redaction at ingestion
  • Explicit retrieval budgets
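The ingestion-time redaction and recall budget could be sketched like this. The `Memory` class and the crude regex patterns are illustrative only; a real redactor would use a proper PII detection library:

```python
import re
from collections import deque

# crude patterns for illustration; production redaction needs a real PII library
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "<phone>"),
]


class Memory:
    """Short-term scratchpad with PII redaction at ingestion and a recall budget."""

    def __init__(self, recall_budget: int = 3):
        self.items = deque()
        self.recall_budget = recall_budget

    def ingest(self, text: str) -> None:
        for pattern, token in PII_PATTERNS:
            text = pattern.sub(token, text)  # redact before anything is stored
        self.items.append(text)

    def recall(self, query: str) -> list:
        # naive keyword match, most recent first, capped by the budget
        hits = [m for m in reversed(self.items) if query.lower() in m.lower()]
        return hits[: self.recall_budget]
```

The key ordering property: redaction happens before storage, so PII never reaches the store, and `recall` can never leak it regardless of the query.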

Evals + traces as first-class

Ship with confidence. Hook eval suites into the run loop and stream trace spans to your observability stack.

  • Golden tests + regression
  • Trace spans + tool timing
  • Failure analytics
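A toy version of an eval hook running against the final answer before it ships. The check names and `EvalResult` shape are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    name: str
    passed: bool
    detail: str = ""


def run_evals(answer: str, checks: dict) -> list:
    """Run each named predicate against the final answer pre-response."""
    results = []
    for name, check in checks.items():
        ok = bool(check(answer))
        results.append(EvalResult(name, ok, "" if ok else "check failed"))
    return results


# hypothetical golden checks for a citations-required agent
GOLDEN_CHECKS = {
    "has_citation": lambda a: "[" in a and "]" in a,  # expects [doc-id] style cites
    "non_empty": lambda a: len(a.strip()) > 0,
}
```

Hooked into the run loop, a failed check can block the response, trigger a retry, or feed the failure analytics stream.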

How it Works

A predictable loop: plan, call tools, verify outputs, and request approvals when risk rises.

Define

Declare tools, schemas, policies, and retry limits in code or config.

Plan

Planner produces steps with constraints: budgets, safe mode, and stop conditions.

Execute

Tool calls run with typed args, streaming traces, and automatic recovery on transient failures.

Approve

High-risk actions pause for human review with diffable proposed outputs.
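The four phases above might compose like this. None of these names (`run_loop`, `RISKY_TOOLS`, the callback signatures) are the real API; this is a sketch of the control flow only:

```python
RISKY_TOOLS = {"send_slack", "create_ticket"}  # hypothetical high-risk actions


def run_loop(task, plan, execute, verify, request_approval):
    """Plan -> execute -> verify, pausing for human approval on risky steps."""
    observations = []
    for step in plan(task):                    # Plan: ordered steps with tool names
        if step["tool"] in RISKY_TOOLS:        # Approve: pause before side effects
            if not request_approval(step):
                observations.append({"step": step, "status": "denied"})
                continue
        result = execute(step)                 # Execute: typed args, retries inside
        observations.append({"step": step, "status": "ok", "result": result})
    return verify(observations)                # Verify: guardrails + schema checks
```

The important property is that approval gating sits between planning and execution, so a denied step never produces a side effect.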

agentflow.py
# define tools with typed args + policies
agent = AgentFlow(
  model="gpt-4.1-mini",
  tools=[search_docs, create_ticket, send_slack],
  output_schema=AnswerWithCitations,
  retries=RetryPolicy(max_attempts=3, backoff="expo"),
  rate_limits={"search_docs": "10/min"},
  guardrails=Guardrails(pii_redaction=True, jailbreak_defense=True),
)

result = agent.run("Summarize incident #1842 and propose remediation steps.")
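For context, one plausible shape for the `AnswerWithCitations` schema referenced above, sketched with stdlib dataclasses standing in for the Pydantic model (field names are assumptions):

```python
from dataclasses import dataclass


@dataclass
class Citation:
    doc_id: str
    quote: str


@dataclass
class AnswerWithCitations:
    answer: str
    citations: list

    def __post_init__(self):
        # coerce raw dicts (as a schema library would) into Citation objects
        self.citations = [
            c if isinstance(c, Citation) else Citation(**c) for c in self.citations
        ]
```

A structured output schema like this is what lets the Verify step reject a model response that lacks citations, rather than shipping free text.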

Templates you can fork today

Opinionated starting points with real-world constraints: escalation paths, tool budgets, and deterministic outputs.

Trust + safety that isn’t bolted on

Guardrails are enforced at the runtime boundary: before tools execute and before outputs ship.

Guardrails

Policy rules, PII redaction, jailbreak defense, and eval tests run as part of the agent loop — not as an afterthought.
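A sketch of what enforcement at the runtime boundary could look like, running before any tool executes. The deny-list, marker strings, and `enforce_pre_tool` name are illustrative, not the shipped policy engine:

```python
class PolicyViolation(Exception):
    """Raised at the runtime boundary; aborts the tool call before it runs."""


DENY_TOOLS = {"delete_records"}  # hypothetical deny-listed tool
JAILBREAK_MARKERS = ("ignore previous instructions", "disregard your rules")


def enforce_pre_tool(tool_name: str, user_input: str) -> None:
    """Pre-execution guardrail: policy rules first, then jailbreak screening."""
    if tool_name in DENY_TOOLS:
        raise PolicyViolation(f"tool '{tool_name}' is denied by policy")
    lowered = user_input.lower()
    if any(marker in lowered for marker in JAILBREAK_MARKERS):
        raise PolicyViolation("jailbreak marker detected in input")
```

Because the check raises inside the loop, a violation is a typed event in the trace, not a silent pass-through.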


Risk Toggle

Adjust enforcement and watch settings + tone change. At the Medium setting:

  • Tool permissions: read-only tools + ticket creation w/ approval
  • PII handling: redact emails, phones, tokens
  • Response constraints: structured output + citations required
  • Approval mode: human approval for external actions

Pricing

Start free, then scale guardrails, observability, and deployments with your team.

Starter

for prototypes
$0/mo
  • 1 workspace
  • Local tracing
  • Basic templates
  • Community support

Enterprise

for regulated ops
Custom
  • SSO + RBAC
  • On-prem / VPC deployment
  • Custom policy packs
  • Dedicated support + SLAs
Talk to us
Capability | Starter | Pro | Enterprise
Tool calling runtime | ✓ | ✓ | ✓
Structured outputs (Pydantic) | ✓ | ✓ | ✓
Retries + rate limits | basic | advanced | advanced + per-tenant
Guardrails (policy / PII / jailbreak) | ✓ | ✓ | ✓ + custom packs
Human-in-the-loop approvals | ✓ | ✓ | ✓ + workflows
Tracing + export | local | OTel + webhooks | OTel + SIEM

FAQ

Answers for builders who care about reproducibility, safety, and running agents in production.

Am I locked into one model provider?

No. The runtime abstracts providers and normalizes tool-calling + streaming. Bring OpenAI, Anthropic, or self-hosted models. Swap without rewriting your tool layer.
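As an illustration of that normalization layer, a provider-agnostic interface might look like the following. The `ChatProvider` protocol and `FakeProvider` adapter are hypothetical, not the library's actual types:

```python
from typing import Protocol


class ChatProvider(Protocol):
    """Minimal surface the runtime needs; adapters normalize each vendor SDK."""

    def complete(self, messages: list, tools: list) -> dict: ...


class FakeProvider:
    def complete(self, messages, tools):
        # a real adapter would call OpenAI/Anthropic/an OSS server here
        # and normalize the reply into this common shape
        return {"role": "assistant", "content": "stubbed", "tool_calls": []}


def run_turn(provider: ChatProvider, user_text: str) -> dict:
    return provider.complete([{"role": "user", "content": user_text}], tools=[])
```

Because the loop only depends on the protocol, swapping vendors means writing one adapter, not touching the tool layer.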

How do human approvals work?

Any step can be marked requires_approval. The agent produces a proposed action payload, diffable and auditable. You can approve, edit, or deny; the run continues with the decision recorded.
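The proposal-and-decision flow could be sketched like this, using stdlib `difflib` for the diffable payload (the function names and payload fields are illustrative):

```python
import difflib


def proposed_action(tool: str, before: str, after: str) -> dict:
    """Package a risky step as a diffable, auditable proposal for human review."""
    diff = "\n".join(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="current", tofile="proposed", lineterm=""))
    return {"tool": tool, "diff": diff, "status": "pending_approval"}


def record_decision(proposal: dict, approved: bool, reviewer: str) -> dict:
    """The run resumes only after the decision is stamped onto the proposal."""
    proposal["status"] = "approved" if approved else "denied"
    proposal["reviewer"] = reviewer
    return proposal
```

Presenting the change as a unified diff means the reviewer approves a concrete delta, not a prose description of one.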

What guardrails are included?

Policy rules (allow/deny), jailbreak defense prompts + classifiers, PII redaction at input/output, schema validation for tool args and model outputs, plus eval hooks for regression testing.

Can I keep sensitive data out of model context?

Yes. You control what gets recalled and what gets sent. Use retrieval budgets, redaction, and allowlists to keep sensitive context local.

Is this just another prompt-chaining framework?

No. It’s a runtime with operational semantics: typed events, trace spans, failure modes, and configuration you can reason about. The goal is predictability under load — not vibes.