Test AI agents
before they touch production.
Opsical runs pre-deployment tests for AI agents that call tools. It simulates risky workflows, catches unsafe tool use, and fails the build before broken agents reach real users.
Built for teams deploying support and operations agents that can issue refunds, update CRMs, send emails, access customer data, or trigger workflows.
Your agent may pass the happy path. Opsical tests the paths that break production.
Agents are being shipped like demos, not software.
A prompt, model, memory, or tool-schema change can silently change what your agent does. It may call the wrong tool, leak customer data, skip approval, or loop forever — while still looking fine in a normal demo.
Evals score answers, not actions.
Traces show what happened after the risk already happened.
Tool-using agents can change real systems.
Teams lack a repeatable release gate for agent behavior.
Define how your agent should behave. Test every release.
Define rules, tools, mock data, and expected outcomes. Opsical turns them into repeatable tests your agent must pass before release.
agent: support-agent rules: - Never issue refunds without explicit user confirmation - Never reveal internal CRM notes - Escalate angry customers to a human - Stop after 5 failed tool calls tools: - get_order - refund_customer - send_email - update_crm
A test harness built for stateful, tool-using agents.
Generate scenarios
Opsical creates realistic and adversarial test cases from your agent's tools, rules, and expected behavior.
Run in a sandbox
Your agent interacts with simulated APIs, CRMs, databases, emails, refunds, and workflows — not production systems.
Fail unsafe builds
Catch approval violations, data leaks, loops, prompt injection failures, and behavior regressions before deploy.
What Opsical tests.
A behavioral test suite for the things that actually break in production.
Built first for agent teams using real frameworks.
Starting with LangGraph, OpenAI Agents SDK, and custom Python agents. More frameworks coming next.
More frameworks coming next. Tell us what you use.
Watch Opsical catch a hidden agent failure.
A support agent passes the happy path, then fails under adversarial scenarios.
Built for agents that use real tools.
Start with support agents that take real actions
Refunds. CRM updates. Email replies. Escalations. Customer data access. These are the agents where one bad tool call creates immediate damage.
Sales agents
Lead writes, CRM updates, follow-up emails.
Internal operations
Tickets, runbooks, multi-step approvals.
Finance agents
Invoicing, payouts, vendor actions.
Database agents
Reads, writes, schema-aware mutations.
Workflow agents
Cross-tool automations and approvals.
Coding agents
Repo changes, PRs, test runs, deploys.
Block unsafe agent releases in CI.
Add Opsical to your pipeline and fail the build when a prompt, model, memory, or tool change introduces dangerous behavior.
- name: Run Opsical agent tests run: opsical run --fail-on-critical
More than traces. More than evals.
Tracing shows what happened after a run. Evals score what the model said. Opsical tests what the agent is allowed to do before it ships.
No production access required.
Opsical runs agents against simulated tools and mock data, so teams can test dangerous workflows without touching real users, real CRMs, or real payment systems.
Read-only by default. Simulated CRMs, databases, email, and payment APIs. Your production stays untouched.
Don't wait for your agent to fail in production.
Join the Opsical beta and start testing unsafe behavior before it reaches users.