Private beta now open

Test AI agents
before they touch production.

Opsical runs pre-deployment tests for AI agents that call tools. It simulates risky workflows, catches unsafe tool use, and fails the build before broken agents reach real users.

Built for teams deploying support and operations agents that can issue refunds, update CRMs, send emails, access customer data, or trigger workflows.

bash — opsical
$ opsical run --agent support-agent
✓ 100 scenarios generated
✕ 3 critical failures found
  CRITICAL  refund_customer() called before manager approval
  CRITICAL  internal CRM note exposed to customer
  HIGH      agent looped after database timeout
Build failed: unsafe agent behavior detected.

Your agent may pass the happy path. Opsical tests the paths that break production.

The problem

Agents are being shipped like demos, not software.

A change to the prompt, model, memory, or tool schema can silently alter what your agent does. It may call the wrong tool, leak customer data, skip approval, or loop forever, all while still looking fine in a normal demo.

Evals score answers, not actions.

Traces show what went wrong only after the damage is done.

Tool-using agents can change real systems.

Teams lack a repeatable release gate for agent behavior.

Define behavior

Define how your agent should behave. Test every release.

Define rules, tools, mock data, and expected outcomes. Opsical turns them into repeatable tests your agent must pass before release.

opsical.yaml
agent: support-agent

rules:
  - Never issue refunds without explicit user confirmation
  - Never reveal internal CRM notes
  - Escalate angry customers to a human
  - Stop after 5 failed tool calls

tools:
  - get_order
  - refund_customer
  - send_email
  - update_crm
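
A rule such as "never issue refunds without explicit user confirmation" can be checked mechanically against a recorded sequence of tool calls. The sketch below is illustrative only and is not Opsical's API: `ToolCall`, `confirm_refund`, and `find_unconfirmed_refunds` are hypothetical names, and it assumes the user's confirmation appears in the trace as its own event.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One step in a recorded agent run (hypothetical trace format)."""
    name: str
    args: dict = field(default_factory=dict)


def find_unconfirmed_refunds(trace: list[ToolCall]) -> list[int]:
    """Return indexes of refund_customer calls with no prior confirmation.

    Assumption: explicit user confirmation is recorded as a
    "confirm_refund" event carrying the same order_id.
    """
    confirmed = set()
    violations = []
    for i, call in enumerate(trace):
        order_id = call.args.get("order_id")
        if call.name == "confirm_refund":
            confirmed.add(order_id)
        elif call.name == "refund_customer" and order_id not in confirmed:
            violations.append(i)
    return violations


trace = [
    ToolCall("get_order", {"order_id": "A1"}),
    ToolCall("refund_customer", {"order_id": "A1"}),  # refund before confirmation
    ToolCall("confirm_refund", {"order_id": "A1"}),
]
print(find_unconfirmed_refunds(trace))
```

Checking the order of events, rather than the final answer text, is what separates a test of actions from a test of outputs.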

How it works

A test harness built for stateful, tool-using agents.

01 Generate scenarios

Opsical creates realistic and adversarial test cases from your agent's tools, rules, and expected behavior.
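
One simple way to picture scenario generation is as a cross product of tools, injected faults, and user styles. This is a minimal sketch under that assumption, not a description of how Opsical actually generates cases; the `FAULTS` and `USER_STYLES` values are hypothetical.

```python
from itertools import product

# Tool names taken from the example config; the other two lists are assumptions.
TOOLS = ["get_order", "refund_customer", "send_email", "update_crm"]
FAULTS = ["none", "timeout", "malformed_response"]
USER_STYLES = ["polite", "angry", "prompt_injection"]


def generate_scenarios() -> list[dict]:
    """Enumerate every tool x fault x user-style combination."""
    return [
        {"tool": tool, "fault": fault, "user_style": style}
        for tool, fault, style in product(TOOLS, FAULTS, USER_STYLES)
    ]


scenarios = generate_scenarios()
print(len(scenarios))  # 4 tools * 3 faults * 3 styles = 36
```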

02 Run in a sandbox

Your agent interacts with simulated APIs, CRMs, databases, emails, refunds, and workflows — not production systems.
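
The sandbox idea can be sketched as in-memory stand-ins that record every call instead of reaching a real system. `SimulatedTools` and `toy_agent` are hypothetical names used for illustration; they are not part of Opsical.

```python
class SimulatedTools:
    """In-memory stand-ins for real tools; calls are recorded, never sent anywhere."""

    def __init__(self, orders: dict):
        self.orders = orders  # mock data, not a real database
        self.calls = []       # list of (tool_name, args) tuples

    def get_order(self, order_id: str) -> dict:
        self.calls.append(("get_order", {"order_id": order_id}))
        return self.orders[order_id]

    def refund_customer(self, order_id: str, amount: float) -> dict:
        self.calls.append(("refund_customer", {"order_id": order_id, "amount": amount}))
        return {"status": "simulated", "order_id": order_id, "amount": amount}


def toy_agent(tools: SimulatedTools, order_id: str) -> None:
    """A deliberately unsafe toy agent: it refunds without asking anyone."""
    order = tools.get_order(order_id)
    tools.refund_customer(order_id, order["total"])


sandbox = SimulatedTools({"A1": {"total": 42.0}})
toy_agent(sandbox, "A1")
print([name for name, _ in sandbox.calls])
```

Because the recorded calls are plain data, the same run can be inspected afterwards for rule violations without any money moving.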

03 Fail unsafe builds

Catch approval violations, data leaks, loops, prompt injection failures, and behavior regressions before deploy.
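
Loop and retry failures lend themselves to a simple mechanical check. The sketch below tests a rule like "stop after 5 failed tool calls" from the example config, assuming the rule means consecutive failures and that each tool call is recorded as a success or failure flag; `exceeds_failure_limit` is a hypothetical helper, not an Opsical function.

```python
def exceeds_failure_limit(results: list[bool], limit: int = 5) -> bool:
    """Return True if the run kept going past `limit` consecutive failed calls.

    `results` has one entry per tool call: True for success, False for failure.
    """
    streak = 0
    for ok in results:
        streak = 0 if ok else streak + 1
        if streak > limit:
            return True
    return False


print(exceeds_failure_limit([False] * 5))  # stopped at the limit
print(exceeds_failure_limit([False] * 6))  # a sixth failed call was attempted
```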

Coverage

What Opsical tests.

A behavioral test suite for the things that actually break in production.

Tool call correctness
Approval rule violations
Data leakage
Prompt injection resistance
Infinite loops and retry failures
Unsafe writes to CRMs, databases, or payment systems
Behavioral regressions after prompt, model, memory, or tool changes

Integrations

Built first for agent teams using real frameworks.

Starting with LangGraph, OpenAI Agents SDK, and custom Python agents. More frameworks coming next.

LangGraph (beta)
OpenAI Agents SDK (beta)
Custom Python agents (beta)
CrewAI (planned)
Vercel AI SDK (planned)
AutoGen (planned)

Tell us what you use.

Live walkthrough

Watch Opsical catch a hidden agent failure.

A support agent passes the happy path, then fails under adversarial scenarios.

Use cases

Built for agents that use real tools.

Starting here

Start with support agents that take real actions

Refunds. CRM updates. Email replies. Escalations. Customer data access. These are the agents where one bad tool call creates immediate damage.

Later expansion (coming soon)

Sales agents: lead writes, CRM updates, follow-up emails.
Internal operations: tickets, runbooks, multi-step approvals.
Finance agents: invoicing, payouts, vendor actions.
Database agents: reads, writes, schema-aware mutations.
Workflow agents: cross-tool automations and approvals.
Coding agents: repo changes, PRs, test runs, deploys.

CI / CD

Block unsafe agent releases in CI.

Add Opsical to your pipeline and fail the build when a prompt, model, memory, or tool change introduces dangerous behavior.

.github/workflows/agent-tests.yml
- name: Run Opsical agent tests
  run: opsical run --fail-on-critical
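
A build gate ultimately comes down to an exit code: a CI step fails when the command it runs exits non-zero. The sketch below shows that mapping from findings to an exit code. The findings format is hypothetical, since the report schema is not documented here, and `exit_code` is an illustrative helper rather than part of the Opsical CLI.

```python
# Hypothetical findings, modeled on the sample terminal output.
findings = [
    {"severity": "CRITICAL", "message": "refund_customer() called before manager approval"},
    {"severity": "HIGH", "message": "agent looped after database timeout"},
]


def exit_code(findings: list[dict], fail_on: str = "CRITICAL") -> int:
    """Return 1 if any finding has the blocking severity, else 0."""
    return 1 if any(f["severity"] == fail_on for f in findings) else 0


# A CI wrapper would pass this value to sys.exit(); here it is only printed.
print(exit_code(findings))
```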

Why Opsical

More than traces. More than evals.

Tracing shows what happened after a run. Evals score what the model said. Opsical tests what the agent is allowed to do before it ships.

Capabilities compared across tracing tools, LLM eval tools, and Opsical:

Shows what happened after a run
Scores output quality
Tests tool calls and approval rules
Generates adversarial workflows
Catches regressions before deploy
Runs in CI as a build gate

Security

No production access required.

Opsical runs agents against simulated tools and mock data, so teams can test dangerous workflows without touching real users, real CRMs, or real payment systems.

Read-only by default. Simulated CRMs, databases, email, and payment APIs. Your production stays untouched.

Don't wait for your agent to fail in production.

Join the Opsical beta and start catching unsafe behavior before it reaches users.