Tool runtime with guardrails
Tool calls are validated, rate-limited, retried with backoff, and logged. Failures are typed events, not stringly-typed chaos.
- Argument schemas + coercion
- Retry policies per tool
- Deterministic tool routing
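The mechanics behind these bullets can be sketched in plain Python. This is illustrative only — `TransientError`, `call_with_retry`, and the schema dict are hypothetical names for this sketch, not AgentFlow's API:

```python
import time
from dataclasses import dataclass

class TransientError(Exception):
    """A failure worth retrying (timeouts, 429s, flaky networks)."""

@dataclass
class RetryPolicy:
    max_attempts: int = 3
    base_delay: float = 0.5  # seconds; doubled per attempt ("expo" backoff)

def call_with_retry(fn, raw_args: dict, schema: dict, policy: RetryPolicy):
    # Coerce model-provided arguments to their declared types up front,
    # so type errors surface before any side effects happen.
    args = {name: typ(raw_args[name]) for name, typ in schema.items()}
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return fn(**args)
        except TransientError:
            if attempt == policy.max_attempts:
                raise  # out of budget: surface the failure as-is
            time.sleep(policy.base_delay * 2 ** (attempt - 1))
```

A real runtime would additionally emit a typed event per attempt and route deterministically by tool name; the sketch shows only the coercion-plus-retry core.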
Production primitives for tool-calling, memory, retries, rate limits, and structured outputs — plus approvals and policy enforcement that survive real users.
1. Break task into steps, choose tools, set success criteria.
2. Execute tools with retries, rate limits, and structured arguments.
3. Store observations in memory; attach citations and trace spans.
4. Run guardrails, schema checks, and evaluation tests pre-response.
5. Return a crisp answer — or request human approval when risky.
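The steps above can be sketched as a single loop. All names here are hypothetical; real planning, synthesis, and guardrails are far richer than these stand-ins:

```python
from dataclasses import dataclass

@dataclass
class Step:
    tool: str
    args: dict

def run_loop(task, plan, tools, guardrails, request_approval):
    memory = []
    for step in plan(task):                       # 1. plan steps + choose tools
        obs = tools[step.tool](**step.args)       # 2. execute with typed args
        memory.append((step.tool, obs))           # 3. store observations
    answer = "; ".join(obs for _, obs in memory)  # naive synthesis stand-in
    if not guardrails(answer):                    # 4. pre-response checks
        return request_approval(answer)           # 5. escalate when risky...
    return answer                                 #    ...or answer directly
```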
Stop hand-rolling brittle agent loops. AgentFlow Python gives you deterministic interfaces for planning, tool execution, memory, and approvals — with instrumentation built in.
Use short-term scratchpads and long-term stores. Control recall with budgets, filters, and recency — and keep PII out.
Ship with confidence. Hook eval suites into the run loop and stream trace spans to your observability stack.
A predictable loop: plan, call tools, verify outputs, and request approvals when risk rises.
1. Declare tools, schemas, policies, and retry limits in code or config.
2. Planner produces steps with constraints: budgets, safe mode, and stop conditions.
3. Tool calls run with typed args, streaming traces, and automatic recovery on transient failures.
4. High-risk actions pause for human review with diffable proposed outputs.
```python
# define tools with typed args + policies
agent = AgentFlow(
    model="gpt-4.1-mini",
    tools=[search_docs, create_ticket, send_slack],
    output_schema=AnswerWithCitations,
    retries=RetryPolicy(max_attempts=3, backoff="expo"),
    rate_limits={"search_docs": "10/min"},
    guardrails=Guardrails(pii_redaction=True, jailbreak_defense=True),
)

result = agent.run("Summarize incident #1842 and propose remediation steps.")
```
Opinionated starting points with real-world constraints: escalation paths, tool budgets, and deterministic outputs.
Guardrails are enforced at the runtime boundary: before tools execute and before outputs ship.
Policy rules, PII redaction, jailbreak defense, and eval tests run as part of the agent loop — not as an afterthought.
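A sketch of what "enforced at the runtime boundary" means in practice — the regex, function names, and allowlist shape are illustrative, not AgentFlow's internals:

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # one toy PII pattern

def redact(text: str) -> str:
    return SSN_RE.sub("[REDACTED]", text)

def before_tool(allowlist: set, tool: str, args: dict) -> dict:
    """Boundary 1: policy check + PII redaction before a tool executes."""
    if tool not in allowlist:
        raise PermissionError(f"tool {tool!r} denied by policy")
    return {k: redact(v) if isinstance(v, str) else v for k, v in args.items()}

def before_output(validate, answer: str) -> str:
    """Boundary 2: guardrail/schema checks before the output ships."""
    if not validate(answer):
        raise ValueError("output failed guardrail checks")
    return redact(answer)
```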
Adjust enforcement levels and watch the agent's outputs and tone change in response.
Start free, then scale guardrails, observability, and deployments with your team.
| Capability | Starter | Pro | Enterprise |
|---|---|---|---|
| Tool calling runtime | ✓ | ✓ | ✓ |
| Structured outputs (Pydantic) | — | ✓ | ✓ |
| Retries + rate limits | basic | advanced | advanced + per-tenant |
| Guardrails (policy / PII / jailbreak) | — | ✓ | ✓ + custom packs |
| Human-in-the-loop approvals | — | ✓ | ✓ + workflows |
| Tracing + export | local | OTel + webhooks | OTel + SIEM |
Answers for builders who care about reproducibility, safety, and running agents in production.
No. The runtime abstracts providers and normalizes tool-calling + streaming. Bring OpenAI, Anthropic, or self-hosted models. Swap without rewriting your tool layer.
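One way such normalization can work, shown with simplified approximations of provider payload shapes (not the exact provider schemas):

```python
def normalize_tool_call(provider: str, raw: dict) -> dict:
    """Map provider-specific tool-call payloads onto one internal shape,
    so the tool layer never sees provider details."""
    if provider == "openai":
        # simplified: OpenAI-style calls nest name/arguments under "function"
        return {"tool": raw["function"]["name"], "args": raw["function"]["arguments"]}
    if provider == "anthropic":
        # simplified: Anthropic-style tool_use blocks carry "name" and "input"
        return {"tool": raw["name"], "args": raw["input"]}
    raise ValueError(f"unknown provider: {provider}")
```

Because every provider lands on the same internal event, swapping models is a config change rather than a rewrite of the tool layer.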
Any step can be marked `requires_approval`. The agent produces a proposed action payload, diffable and auditable. You can approve, edit, or deny; the run continues with the decision recorded.
Policy rules (allow/deny), jailbreak defense prompts + classifiers, PII redaction at input/output, schema validation for tool args and model outputs, plus eval hooks for regression testing.
Yes. You control what gets recalled and what gets sent. Use retrieval budgets, redaction, and allowlists to keep sensitive context local.
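For instance, recall can be constrained by a budget, a source allowlist, and recency ordering. A sketch with hypothetical names and a deliberately naive term match:

```python
def recall(memories: list, terms: list, budget: int, allow: set) -> list:
    """Return at most `budget` matching items from allowlisted sources,
    newest first; everything else stays local and is never sent."""
    hits = [m for m in memories
            if m["source"] in allow            # allowlist filter
            and any(t in m["text"] for t in terms)]
    hits.sort(key=lambda m: m["ts"], reverse=True)  # recency decides the budget
    return hits[:budget]
```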
No. It’s a runtime with operational semantics: typed events, trace spans, failure modes, and configuration you can reason about. The goal is predictability under load — not vibes.