Documentation
Financial guardrails for AI agents. Find hidden cost waste, prove reliability, and optimize margins.
Our Mission
Enterprises are bleeding cash on unpredictable agentic workflows. HoneycombOS is the financial guardrail that caps token waste, proves agent reliability, and optimizes cost-per-outcome — not cost-per-token. We show you what you're wasting for free. We fix it and share the savings.
Get up and running in under 5 minutes. Register, grab your API key, and run your first evaluation.
Register
```shell
curl -X POST https://api.honeycombos.ai/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com", "name": "Your Company"}'
```

Get your API key
Your API key is returned in the registration response. It starts with hcos_.
```shell
# Your key: hcos_abc123...
```
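If you register from a script, you can capture the key directly. A minimal sketch using plain shell string trimming — note that the `api_key` field name is an assumption about the response shape, not confirmed by the API reference:

```shell
# Sketch: pull the key out of a registration response without extra tooling.
# The "api_key" field name is an assumption about the response JSON.
response='{"api_key": "hcos_abc123", "name": "Your Company"}'
key="${response#*\"api_key\": \"}"  # drop everything through the field name
key="${key%%\"*}"                   # drop everything after the key value
echo "$key"
```

With jq installed, `jq -r '.api_key'` does the same in one step (again assuming that field name).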
Submit a contract
```shell
curl -X POST https://api.honeycombos.ai/v1/evaluation/contracts \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "your-tenant-id",
    "contract": {
      "agent_id": "my-agent",
      "agent_name": "My Agent",
      "endpoint": "https://my-app.com/api/agent",
      "inputs": [
        {"name": "intent", "type": "enum", "values": ["question", "complaint", "request"]},
        {"name": "urgency", "type": "enum", "values": ["low", "medium", "high"]}
      ],
      "quality_rules": [
        {
          "name": "high_urgency_flagged",
          "rule": "if urgency == high then requires_review == true",
          "severity": "critical"
        }
      ]
    }
  }'
```

Run evaluation
```shell
curl -X POST https://api.honeycombos.ai/v1/evaluation/run \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"contract_id": "returned-contract-id"}'
```

The Model Context Protocol (MCP) is the primary integration method. Connect your IDE to HoneycombOS in one config block.
IDE Configuration
Claude Desktop (Easiest)
Go to Settings → MCP Servers → Add Custom MCP:
Name: HoneycombOS
URL: https://honeycombos-agenteval-production.up.railway.app/mcp/sse

That's it. 10 HoneycombOS tools will appear in your conversations. Say “evaluate my agent” to get started.
Claude Code (config file)
Add to `~/.claude/settings.json` or a project-level `.claude/settings.json`:
```json
{
  "mcpServers": {
    "honeycombos": {
      "url": "https://honeycombos-agenteval-production.up.railway.app/mcp/sse"
    }
  }
}
```

Available MCP Tools
| Tool | Description |
|---|---|
| honeycombos_register_agent | Register an agent endpoint for evaluation |
| honeycombos_define_contract | Auto-generate evaluation contract from your code |
| honeycombos_evaluate | Run permutation matrix against your agent |
| honeycombos_get_results | Get evaluation results and failure report |
| honeycombos_compare_models | Compare agent quality across models |
| honeycombos_check_regression | Check if changes caused quality regression |
| honeycombos_estimate_cost | Estimate evaluation cost before running |
| honeycombos_optimize_prompt | Automated prompt optimization (autoresearch) |
| honeycombos_get_optimization_status | Check optimization progress |
| honeycombos_get_recommendations | Get actionable improvement recommendations |
Base URL: `https://api.honeycombos.ai`. All requests require an `Authorization: Bearer hcos_...` header (except registration).
Common operations via cURL. Replace hcos_abc123... with your API key.
List your contracts
```shell
curl -s https://api.honeycombos.ai/v1/evaluation/contracts \
  -H "Authorization: Bearer hcos_abc123..." | jq
```
Estimate evaluation cost
```shell
curl -X POST https://api.honeycombos.ai/v1/evaluation/estimate \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"contract_id": "your-contract-id"}'
```

Compare models
```shell
curl -X POST https://api.honeycombos.ai/v1/evaluation/compare \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "contract_id": "your-contract-id",
    "models": ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"]
  }'
```

Run regression check
```shell
curl -X POST https://api.honeycombos.ai/v1/evaluation/regression/check \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "contract_id": "your-contract-id",
    "baseline_run_id": "previous-run-id"
  }'
```

Start autoresearch
```shell
curl -X POST https://api.honeycombos.ai/v1/evaluation/autoresearch \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "contract_id": "your-contract-id",
    "max_iterations": 20
  }'
```

Check usage
```shell
curl -s https://api.honeycombos.ai/auth/usage \
  -H "Authorization: Bearer hcos_abc123..." | jq
```
Detailed setup instructions for each supported IDE. After configuration, the HoneycombOS tools will appear in your AI assistant's tool list.
Claude Code / Claude Desktop
- Open `~/.claude/settings.json` (global) or `.claude/settings.json` (per project)
- Add the MCP server config (see MCP Integration above)
- Restart Claude Code or reload the window
- Type “evaluate my agent” to verify the tools are available
Cursor
- Create `.cursor/mcp.json` in your project root
- Add the MCP server config
- Restart Cursor
- Tools appear in the Composer panel
VS Code (GitHub Copilot)
- Create `.vscode/mcp.json` in your project root
- Add the server config (note: uses a `servers` key, not `mcpServers`)
- Reload the VS Code window
- Available via Copilot Chat in Agent mode
Windsurf / Codex
- Open `~/.codeium/windsurf/mcp_config.json`
- Add the MCP server config
- Restart Windsurf
- For Codex, add the same config to your Codex settings
Core concepts behind HoneycombOS evaluation and optimization.
Agent Interface Contract
A YAML/JSON schema describing your agent's inputs, outputs, and quality rules. HoneycombOS generates the evaluation test suite from this. Think of it as a test spec for your AI agent.
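As a sketch, the quickstart contract from above rendered as YAML (same fields as the JSON example; the YAML layout itself is an assumption, not taken from the API reference):

```yaml
agent_id: my-agent
agent_name: My Agent
endpoint: https://my-app.com/api/agent
inputs:
  - name: intent
    type: enum
    values: [question, complaint, request]
  - name: urgency
    type: enum
    values: [low, medium, high]
quality_rules:
  - name: high_urgency_flagged
    rule: if urgency == high then requires_review == true
    severity: critical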
Permutation Matrix
Every combination of input values tested exhaustively. Not sampling — full enumeration. An agent with 5 input axes of 4 values each yields 4^5 = 1,024 test cases. This is how you find the edge cases sampling misses.
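The arithmetic is easy to check. A short sketch enumerating the quickstart contract's two 3-value axes, then computing the 5-axis count mentioned above:

```shell
# Full enumeration of the quickstart contract's two enum axes:
# intent (3 values) x urgency (3 values) = 9 test cases.
count=0
for intent in question complaint request; do
  for urgency in low medium high; do
    count=$((count + 1))
  done
done
echo "2-axis matrix: $count cases"

# Growth is multiplicative: 5 axes with 4 values each is 4^5 cases.
total=1
for axis_size in 4 4 4 4 4; do
  total=$((total * axis_size))
done
echo "5-axis matrix: $total cases"
```

Adding one more 4-value axis multiplies the matrix by 4, which is why the cost estimate endpoint is worth calling before a run.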
Quality Rules
Assertions about what your agent must do. “If urgency is high, must flag for review.” Rules have severity levels: critical, warning, and info. Critical failures block deploys.
Autoresearch
Automated prompt optimization. Analyzes failures, hypothesizes fixes, re-runs the matrix, keeps improvements. Typically converges in 10–20 iterations. Available on Pro plans.
Regression Gate
A 6-layer check comparing current vs baseline evaluation. Blocks deploys if quality drops. Can be integrated into CI/CD via webhooks. Catches regressions before they reach production.
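In CI, the gate can be a short script that fails the build when the check reports a drop. A minimal sketch, using a canned response so it runs standalone — the `regression_detected` field name is an assumption about the response shape, not confirmed by the API reference:

```shell
# CI gate sketch: fail the build when the regression check reports a drop.
# In CI, $response would come from the regression endpoint, e.g.:
#   response=$(curl -s -X POST https://api.honeycombos.ai/v1/evaluation/regression/check ...)
# A canned response is used here; "regression_detected" is an assumed field name.
response='{"regression_detected": false, "baseline_run_id": "previous-run-id"}'

case "$response" in
  *'"regression_detected": true'*) gate_status=fail ;;
  *) gate_status=pass ;;
esac

echo "gate: $gate_status"
if [ "$gate_status" = "fail" ]; then
  exit 1  # non-zero exit blocks the deploy step
fi
```

Most CI systems treat any non-zero exit as a failed step, so this is all the wiring a basic gate needs.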
We only get paid when you save money. Free to audit your AI spend. Performance-based pricing on optimization.
“Show me the waste”
- Full cost analysis & CFO Readout
- Reliability report (pass/fail per agent)
- Model comparison (cost vs quality)
- Token burn rate analysis
- 3 agents, 10 evaluations/day
- Regression checks
“Fix it, share the upside”
- Everything in Free
- Unlimited agents & runs
- Automated prompt optimization
- Documented savings with proof
- Model routing recommendations
- CI/CD regression gates
- You only pay if we save you money
Example: $20K/mo AI spend → we find $8K waste → you pay 20% of savings ($1,600/mo), net save $6,400/mo