Documentation
Financial guardrails for AI agents. Find hidden cost waste, prove reliability, and optimize margins.
Our Mission
Enterprises are bleeding cash on unpredictable agentic workflows. HoneycombOS is the financial guardrail that caps token waste, proves agent reliability, and optimizes cost-per-outcome — not cost-per-token. We show you what you're wasting for free. We fix it and share the savings.
Get up and running in under 5 minutes. Register, grab your API key, and run your first evaluation.
Register
```shell
curl -X POST https://api.honeycombos.ai/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com", "name": "Your Company"}'
```

Get your API key
Your API key is returned in the registration response. It starts with hcos_.
```shell
# Your key: hcos_abc123...
```
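If you register from a script, you can capture the key directly. A minimal sketch using plain shell string trimming — note that the `api_key` field name is an assumption about the response shape, not confirmed by the API reference:

```shell
# Sketch: pull the key out of a registration response without extra tooling.
# The "api_key" field name is an assumption about the response JSON.
response='{"api_key": "hcos_abc123", "name": "Your Company"}'
key="${response#*\"api_key\": \"}"  # drop everything through the field name
key="${key%%\"*}"                   # drop everything after the key value
echo "$key"
```

With jq installed, `jq -r '.api_key'` does the same in one step (again assuming that field name).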
Submit a contract
```shell
curl -X POST https://api.honeycombos.ai/v1/evaluation/contracts \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "your-tenant-id",
    "contract": {
      "agent_id": "my-agent",
      "agent_name": "My Agent",
      "endpoint": "https://my-app.com/api/agent",
      "inputs": [
        {"name": "intent", "type": "enum", "values": ["question", "complaint", "request"]},
        {"name": "urgency", "type": "enum", "values": ["low", "medium", "high"]}
      ],
      "quality_rules": [
        {
          "name": "high_urgency_flagged",
          "rule": "if urgency == high then requires_review == true",
          "severity": "critical"
        }
      ]
    }
  }'
```

Run evaluation
```shell
curl -X POST https://api.honeycombos.ai/v1/evaluation/run \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"contract_id": "returned-contract-id"}'
```

The Model Context Protocol (MCP) is the primary integration method. Connect your IDE to HoneycombOS in one config block.
IDE Configuration
Claude Desktop (Easiest)
Go to Settings → MCP Servers → Add Custom MCP:
Name: HoneycombOS
URL: https://honeycombos-agenteval-production.up.railway.app/mcp/sse

That's it. 10 HoneycombOS tools will appear in your conversations. Say “evaluate my agent” to get started.
Claude Code (config file)
Add to `~/.claude/settings.json` or a project-level `.claude/settings.json`:
```json
{
  "mcpServers": {
    "honeycombos": {
      "url": "https://honeycombos-agenteval-production.up.railway.app/mcp/sse"
    }
  }
}
```

Available MCP Tools
| Tool | Description |
|---|---|
| honeycombos_register_agent | Register an agent endpoint for evaluation |
| honeycombos_define_contract | Auto-generate evaluation contract from your code |
| honeycombos_evaluate | Run permutation matrix against your agent |
| honeycombos_get_results | Get evaluation results and failure report |
| honeycombos_compare_models | Compare agent quality across models |
| honeycombos_check_regression | Check if changes caused quality regression |
| honeycombos_estimate_cost | Estimate evaluation cost before running |
| honeycombos_optimize_prompt | Automated prompt optimization (autoresearch) |
| honeycombos_get_optimization_status | Check optimization progress |
| honeycombos_get_recommendations | Get actionable improvement recommendations |
Base URL: `https://api.honeycombos.ai`. All requests require an `Authorization: Bearer hcos_...` header (except registration).
Common operations via cURL. Replace hcos_abc123... with your API key.
List your contracts
```shell
curl -s https://api.honeycombos.ai/v1/evaluation/contracts \
  -H "Authorization: Bearer hcos_abc123..." | jq
```
Estimate evaluation cost
```shell
curl -X POST https://api.honeycombos.ai/v1/evaluation/estimate \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"contract_id": "your-contract-id"}'
```

Compare models
```shell
curl -X POST https://api.honeycombos.ai/v1/evaluation/compare \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "contract_id": "your-contract-id",
    "models": ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"]
  }'
```

Run regression check
```shell
curl -X POST https://api.honeycombos.ai/v1/evaluation/regression/check \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "contract_id": "your-contract-id",
    "baseline_run_id": "previous-run-id"
  }'
```

Start autoresearch
```shell
curl -X POST https://api.honeycombos.ai/v1/evaluation/autoresearch \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "contract_id": "your-contract-id",
    "max_iterations": 20
  }'
```

Check usage
```shell
curl -s https://api.honeycombos.ai/auth/usage \
  -H "Authorization: Bearer hcos_abc123..." | jq
```
Detailed setup instructions for each supported IDE. After configuration, the HoneycombOS tools will appear in your AI assistant's tool list.
Claude Code / Claude Desktop
- Open `~/.claude/settings.json` (global) or `.claude/settings.json` (per project)
- Add the MCP server config (see MCP Integration above)
- Restart Claude Code or reload the window
- Type “evaluate my agent” to verify the tools are available
Cursor
- Create `.cursor/mcp.json` in your project root
- Add the MCP server config
- Restart Cursor
- Tools appear in the Composer panel
VS Code (GitHub Copilot)
- Create `.vscode/mcp.json` in your project root
- Add the server config (note: uses a `servers` key, not `mcpServers`)
- Reload the VS Code window
- Available via Copilot Chat in Agent mode
Windsurf / Codex
- Open `~/.codeium/windsurf/mcp_config.json`
- Add the MCP server config
- Restart Windsurf
- For Codex, add the same config to your Codex settings
Core concepts behind HoneycombOS evaluation and optimization.
Agent Interface Contract
A YAML/JSON schema describing your agent's inputs, outputs, and quality rules. HoneycombOS generates the evaluation test suite from this. Think of it as a test spec for your AI agent.
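As a sketch, the quickstart contract from above rendered as YAML (same fields as the JSON example; the YAML layout itself is an assumption, not taken from the API reference):

```yaml
agent_id: my-agent
agent_name: My Agent
endpoint: https://my-app.com/api/agent
inputs:
  - name: intent
    type: enum
    values: [question, complaint, request]
  - name: urgency
    type: enum
    values: [low, medium, high]
quality_rules:
  - name: high_urgency_flagged
    rule: if urgency == high then requires_review == true
    severity: critical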
Permutation Matrix
Every combination of input values tested exhaustively. Not sampling — full enumeration. An agent with 5 input axes of 4 values each yields 4^5 = 1,024 test cases. This is how you find the edge cases sampling misses.
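The arithmetic is easy to check. A short sketch enumerating the quickstart contract's two 3-value axes, then computing the 5-axis count mentioned above:

```shell
# Full enumeration of the quickstart contract's two enum axes:
# intent (3 values) x urgency (3 values) = 9 test cases.
count=0
for intent in question complaint request; do
  for urgency in low medium high; do
    count=$((count + 1))
  done
done
echo "2-axis matrix: $count cases"

# Growth is multiplicative: 5 axes with 4 values each is 4^5 cases.
total=1
for axis_size in 4 4 4 4 4; do
  total=$((total * axis_size))
done
echo "5-axis matrix: $total cases"
```

Adding one more 4-value axis multiplies the matrix by 4, which is why the cost estimate endpoint is worth calling before a run.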
Quality Rules
Assertions about what your agent must do. “If urgency is high, must flag for review.” Rules have severity levels: critical, warning, and info. Critical failures block deploys.
Autoresearch
Automated prompt optimization. Analyzes failures, hypothesizes fixes, re-runs the matrix, keeps improvements. Typically converges in 10–20 iterations. Available on Pro plans.
Regression Gate
A 6-layer check comparing current vs baseline evaluation. Blocks deploys if quality drops. Can be integrated into CI/CD via webhooks. Catches regressions before they reach production.
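In CI, the gate can be a short script that fails the build when the check reports a drop. A minimal sketch, using a canned response so it runs standalone — the `regression_detected` field name is an assumption about the response shape, not confirmed by the API reference:

```shell
# CI gate sketch: fail the build when the regression check reports a drop.
# In CI, $response would come from the regression endpoint, e.g.:
#   response=$(curl -s -X POST https://api.honeycombos.ai/v1/evaluation/regression/check ...)
# A canned response is used here; "regression_detected" is an assumed field name.
response='{"regression_detected": false, "baseline_run_id": "previous-run-id"}'

case "$response" in
  *'"regression_detected": true'*) gate_status=fail ;;
  *) gate_status=pass ;;
esac

echo "gate: $gate_status"
if [ "$gate_status" = "fail" ]; then
  exit 1  # non-zero exit blocks the deploy step
fi
```

Most CI systems treat any non-zero exit as a failed step, so this is all the wiring a basic gate needs.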
We only get paid when you save money. Free to audit your AI spend. Performance-based pricing on optimization.
“Show me the waste”
- Full cost analysis & CFO Readout
- Reliability report (pass/fail per agent)
- Model comparison (cost vs quality)
- Token burn rate analysis
- 3 agents, 10 evaluations/day
- Regression checks
“Fix it, share the upside”
- Everything in Free
- Unlimited agents & runs
- Automated prompt optimization
- Documented savings with proof
- Model routing recommendations
- CI/CD regression gates
- You only pay if we save you money
Example: $20K/mo AI spend → we find $8K waste → you pay 20% of savings ($1,600/mo), net save $6,400/mo