Private beta now open

Test AI agents
before they touch production.

Opsical runs pre-deployment tests for AI agents that call tools. It simulates risky workflows, catches unsafe tool use, and fails the build before broken agents reach real users.

Built for teams deploying support and operations agents that can issue refunds, update CRMs, send emails, access customer data, or trigger workflows.

Join the beta See how it works

bash — opsical

$opsical run --agent support-agent

✓ 100 scenarios generated

✕ 3 critical failures found

CRITICALrefund_customer() called before manager approval

CRITICALinternal CRM note exposed to customer

HIGHagent looped after database timeout

Build failed: unsafe agent behavior detected.

Your agent may pass the happy path. Opsical tests the paths that break production.

The problem

Agents are being shipped like demos, not software.

A prompt, model, memory, or tool-schema change can silently change what your agent does. It may call the wrong tool, leak customer data, skip approval, or loop forever — while still looking fine in a normal demo.

Evals score answers, not actions.

Traces show what happened after the risk already happened.

Tool-using agents can change real systems.

Teams lack a repeatable release gate for agent behavior.

Define behavior

Define how your agent should behave. Test every release.

Define rules, tools, mock data, and expected outcomes. Opsical turns them into repeatable tests your agent must pass before release.

opsical.yamlyaml

agent: support-agent

rules:
  - Never issue refunds without explicit user confirmation
  - Never reveal internal CRM notes
  - Escalate angry customers to a human
  - Stop after 5 failed tool calls

tools:
  - get_order
  - refund_customer
  - send_email
  - update_crm

How it works

A test harness built for stateful, tool-using agents.

Generate scenarios

Opsical creates realistic and adversarial test cases from your agent's tools, rules, and expected behavior.

Run in a sandbox

Your agent interacts with simulated APIs, CRMs, databases, emails, refunds, and workflows — not production systems.

Fail unsafe builds

Catch approval violations, data leaks, loops, prompt injection failures, and behavior regressions before deploy.

Coverage

What Opsical tests.

A behavioral test suite for the things that actually break in production.

Tool call correctness

Approval rule violations

Data leakage

Prompt injection resistance

Infinite loops and retry failures

Unsafe writes to CRMs, databases, or payment systems

Behavioral regressions after prompt, model, memory, or tool changes

Integrations

Built first for agent teams using real frameworks.

Starting with LangGraph, OpenAI Agents SDK, and custom Python agents. More frameworks coming next.

LangGraphbetaOpenAI Agents SDKbetaCustom Python agentsbetaCrewAIplannedVercel AI SDKplannedAutoGenplanned

More frameworks coming next. Tell us what you use.

Live walkthrough

Watch Opsical catch a hidden agent failure.

A support agent passes the happy path, then fails under adversarial scenarios.

View interactive demo

Use cases

Built for agents that use real tools.

Starting here

Start with support agents that take real actions

Refunds. CRM updates. Email replies. Escalations. Customer data access. These are the agents where one bad tool call creates immediate damage.

Later expansion

Coming soon

Sales agents

Lead writes, CRM updates, follow-up emails.

Coming soon

Internal operations

Tickets, runbooks, multi-step approvals.

Coming soon

Finance agents

Invoicing, payouts, vendor actions.

Coming soon

Database agents

Reads, writes, schema-aware mutations.

Coming soon

Workflow agents

Cross-tool automations and approvals.

Coming soon

Coding agents

Repo changes, PRs, test runs, deploys.

CI / CD

Block unsafe agent releases in CI.

Add Opsical to your pipeline and fail the build when a prompt, model, memory, or tool change introduces dangerous behavior.

.github/workflows/agent-tests.ymlyaml

- name: Run Opsical agent tests
  run: opsical run --fail-on-critical

Why Opsical

More than traces. More than evals.

Tracing shows what happened after a run. Evals score what the model said. Opsical tests what the agent is allowed to do before it ships.

Capability

Tracing tools

LLM eval tools

Opsical

Shows what happened after a run

Scores output quality

Tests tool calls and approval rules

Generates adversarial workflows

Catches regressions before deploy

Runs in CI as a build gate

Security

No production access required.

Opsical runs agents against simulated tools and mock data, so teams can test dangerous workflows without touching real users, real CRMs, or real payment systems.

Read-only by default. Simulated CRMs, databases, email, and payment APIs. Your production stays untouched.

Don't wait for your agent to fail in production.

Join the Opsical beta and start testing unsafe behavior before it reaches users.

Test AI agentsbefore they touch production.

Agents are being shipped like demos, not software.

Define how your agent should behave. Test every release.

A test harness built for stateful, tool-using agents.

Generate scenarios

Run in a sandbox

Fail unsafe builds

What Opsical tests.

Built first for agent teams using real frameworks.

Watch Opsical catch a hidden agent failure.

Built for agents that use real tools.

Start with support agents that take real actions

Sales agents

Internal operations

Finance agents

Database agents

Workflow agents

Coding agents

Block unsafe agent releases in CI.

More than traces. More than evals.

No production access required.

Don't wait for your agent to fail in production.

Test AI agents
before they touch production.