PLATFORM · AGENT LAB

Code-first agent development without the platform tax.

Agent Lab is the engineer's surface for the same agents Flow Studio operators design. TypeScript and Python SDKs, structured prompt management, local debugging, a deterministic test harness, and a runtime that doesn't make you choose between iteration speed and production readiness.

Local first. Production identical.

Agent Lab runs entirely on your laptop. Local model providers, local fixtures, local Registry shadow, local Observe trace viewer. The development loop is fast because nothing has to round-trip a remote service.

When the agent ships, the runtime it runs against is the same runtime the SDK targeted locally. Local-vs-prod parity is not a marketing claim — it is enforced by the SDK importing the same runtime contract that Orchestrator implements.

Your engineering team does not maintain two mental models of how an agent behaves. They maintain one, and the one they maintain is the one your customers see.

Two SDKs. One contract.

TypeScript

First-class TypeScript SDK for the engineering teams that already work in Node, Bun, or Deno. Strict types on every tool definition, prompt template, and runtime hook. Source maps and stack traces work; nothing is bundled, transpiled, or hidden.

Tool definitions are inferred from Zod schemas. Prompt templates type-check against their input variables. Runtime errors carry typed error classes you can catch and route, not free-text strings.

Python

First-class Python SDK for teams whose ML, data, and integration code already lives in Python. Type-checked under `mypy --strict`. Async-first runtime, sync escape hatches for legacy code, no `asyncio` surprises.

Pydantic models everywhere a TypeScript engineer would expect Zod. Both SDKs target the same agent runtime contract, so a flow can switch SDKs over its lifetime without changing the flow definition.

What Agent Lab gives you.

Structured prompts

Prompts as versioned artifacts in Registry, edited in code or in Studio. Variables type-check against tool inputs. Prompt rollback is a registry rollback, not a deploy.

Deterministic tests

Mock the model. Mock the tools. Replay production traces. Snapshot expected outputs. Run the suite in your existing CI; failures block the merge.

Local Registry shadow

Your laptop's Registry mirrors production for the artifacts you've checked out. You see what's deployed, what's pending approval, what's drifting — without standing access to production.

Trace viewer

Every local run streams structured traces into the Observe local viewer. Tool calls, model calls, retries, handoffs, latencies — searchable, exportable, attachable to bug reports.

Pilot Agent Lab against a real engineering workflow.

Pilots scope to one engineering team, one named workflow, six to ten weeks. Lab ships into your existing source control, integrates with your CI, and respects your Registry's approval policy. We do not offer self-serve trials.

Request a pilot Back to Platform