HoneycombOS

Documentation

Financial guardrails for AI agents. Find hidden cost waste, prove reliability, and optimize margins.

Our Mission

Enterprises are bleeding cash on unpredictable agentic workflows. HoneycombOS is the financial guardrail that caps token waste, proves agent reliability, and optimizes cost-per-outcome rather than cost-per-token. We show you what you're wasting, for free. Then we fix it and share the savings.

Get up and running in under 5 minutes. Register, grab your API key, and run your first evaluation.

1. Register

bash
curl -X POST https://api.honeycombos.ai/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com", "name": "Your Company"}'

2. Get your API key

Your API key is returned in the registration response. It starts with hcos_.

text
# Your key: hcos_abc123...

3. Submit a contract

bash
curl -X POST https://api.honeycombos.ai/v1/evaluation/contracts \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "your-tenant-id",
    "contract": {
      "agent_id": "my-agent",
      "agent_name": "My Agent",
      "endpoint": "https://my-app.com/api/agent",
      "inputs": [
        {"name": "intent", "type": "enum", "values": ["question", "complaint", "request"]},
        {"name": "urgency", "type": "enum", "values": ["low", "medium", "high"]}
      ],
      "quality_rules": [
        {
          "name": "high_urgency_flagged",
          "rule": "if urgency == high then requires_review == true",
          "severity": "critical"
        }
      ]
    }
  }'

4. Run evaluation

bash
curl -X POST https://api.honeycombos.ai/v1/evaluation/run \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"contract_id": "returned-contract-id"}'

The Model Context Protocol (MCP) is the primary integration method. Connect your IDE to HoneycombOS in one config block.

IDE Configuration

Claude Desktop (Easiest)

Go to Settings → MCP Servers → Add Custom MCP:

Name: HoneycombOS
URL: https://honeycombos-agenteval-production.up.railway.app/mcp/sse
OAuth: Leave blank

That's it. 10 HoneycombOS tools will appear in your conversations. Say “evaluate my agent” to get started.

Claude Code (config file)

Add to ~/.claude/settings.json (global) or .claude/settings.json (per project):

json
{
  "mcpServers": {
    "honeycombos": {
      "url": "https://honeycombos-agenteval-production.up.railway.app/mcp/sse"
    }
  }
}

Available MCP Tools

Free Tools

Tool                                 Description
honeycombos_register_agent           Register an agent endpoint for evaluation
honeycombos_define_contract          Auto-generate evaluation contract from your code
honeycombos_evaluate                 Run permutation matrix against your agent
honeycombos_get_results              Get evaluation results and failure report
honeycombos_compare_models           Compare agent quality across models
honeycombos_check_regression         Check if changes caused quality regression
honeycombos_estimate_cost            Estimate evaluation cost before running

Pro Tools

Tool                                 Description
honeycombos_optimize_prompt          Automated prompt optimization (autoresearch)
honeycombos_get_optimization_status  Check optimization progress
honeycombos_get_recommendations      Get actionable improvement recommendations

Base URL: https://api.honeycombos.ai. All requests require an Authorization: Bearer hcos_... header (except registration).
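
The base URL and auth header above can be wrapped in a small shell function so they are not repeated on every call. This is a minimal sketch, not something HoneycombOS ships: the function name `hcos_api` and the environment variables `HCOS_API_KEY` and `HCOS_DRY_RUN` are names invented here for illustration.

```bash
# Hypothetical helper; hcos_api, HCOS_API_KEY and HCOS_DRY_RUN are
# illustrative names, not part of the HoneycombOS API.
HCOS_BASE_URL="https://api.honeycombos.ai"

# hcos_api METHOD PATH [JSON_BODY]
hcos_api() {
  local method="$1"
  local path="$2"
  local body="${3:-}"

  if [ -n "${HCOS_DRY_RUN:-}" ]; then
    # Print the request line instead of calling the API.
    echo "$method $HCOS_BASE_URL$path"
    return 0
  fi

  if [ -n "$body" ]; then
    curl -sS -X "$method" "$HCOS_BASE_URL$path" \
      -H "Authorization: Bearer ${HCOS_API_KEY:?set HCOS_API_KEY first}" \
      -H "Content-Type: application/json" \
      -d "$body"
  else
    curl -sS -X "$method" "$HCOS_BASE_URL$path" \
      -H "Authorization: Bearer ${HCOS_API_KEY:?set HCOS_API_KEY first}"
  fi
}

# Example calls (need a real key in HCOS_API_KEY):
#   hcos_api GET  /v1/evaluation/contracts
#   hcos_api POST /v1/evaluation/run '{"contract_id": "your-contract-id"}'
```

Setting `HCOS_DRY_RUN=1` prints the method and URL without sending anything, which is handy for checking a script before spending evaluation quota.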

Auth

POST  /auth/register      Create a new account
POST  /auth/api-key       Generate a new API key
GET   /auth/usage         Get current usage stats
GET   /auth/verify-key    Verify an API key is valid

Contracts

POST  /v1/evaluation/contracts         Create a new evaluation contract
GET   /v1/evaluation/contracts         List all contracts
GET   /v1/evaluation/contracts/{id}    Get contract by ID

Evaluation

POST  /v1/evaluation/run                 Start an evaluation run
GET   /v1/evaluation/runs                List all evaluation runs
GET   /v1/evaluation/runs/{id}           Get run details
GET   /v1/evaluation/runs/{id}/report    Get full evaluation report
POST  /v1/evaluation/compare             Compare multiple agent configurations
POST  /v1/evaluation/estimate            Estimate cost before running

Autoresearch

POST  /v1/evaluation/autoresearch              Start automated prompt optimization
GET   /v1/evaluation/autoresearch/{id}         Get optimization status
POST  /v1/evaluation/autoresearch/{id}/stop    Stop an optimization run

Regression

POST  /v1/evaluation/regression/check      Run regression check vs baseline
POST  /v1/evaluation/regression/webhook    Configure regression webhook

Common operations via cURL. Replace hcos_abc123... with your API key.

List your contracts

bash
curl -s https://api.honeycombos.ai/v1/evaluation/contracts \
  -H "Authorization: Bearer hcos_abc123..." | jq

Estimate evaluation cost

bash
curl -X POST https://api.honeycombos.ai/v1/evaluation/estimate \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"contract_id": "your-contract-id"}'

Compare models

bash
curl -X POST https://api.honeycombos.ai/v1/evaluation/compare \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "contract_id": "your-contract-id",
    "models": ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"]
  }'

Run regression check

bash
curl -X POST https://api.honeycombos.ai/v1/evaluation/regression/check \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "contract_id": "your-contract-id",
    "baseline_run_id": "previous-run-id"
  }'

Start autoresearch

bash
curl -X POST https://api.honeycombos.ai/v1/evaluation/autoresearch \
  -H "Authorization: Bearer hcos_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "contract_id": "your-contract-id",
    "max_iterations": 20
  }'

Check usage

bash
curl -s https://api.honeycombos.ai/auth/usage \
  -H "Authorization: Bearer hcos_abc123..." | jq

Detailed setup instructions for each supported IDE. After configuration, the HoneycombOS tools will appear in your AI assistant's tool list.

Claude Code / Claude Desktop

  1. Open ~/.claude/settings.json (global) or .claude/settings.json (per project)
  2. Add the MCP server config (see MCP Integration above)
  3. Restart Claude Code or reload the window
  4. Type “evaluate my agent” to verify the tools are available

Cursor

  1. Create .cursor/mcp.json in your project root
  2. Add the MCP server config
  3. Restart Cursor
  4. Tools appear in the Composer panel

VS Code (GitHub Copilot)

  1. Create .vscode/mcp.json in your project root
  2. Add the server config (note: uses servers key, not mcpServers)
  3. Reload VS Code window
  4. Available via Copilot Chat in Agent mode
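
Because VS Code expects a `servers` key rather than `mcpServers`, a concrete file may help. The sketch below writes `.vscode/mcp.json` using the same server URL as the Claude Code config. The `"type": "sse"` field is an assumption based on VS Code's MCP configuration format, and the function name `write_vscode_mcp_config` is hypothetical.

```bash
# write_vscode_mcp_config PROJECT_DIR
# Writes PROJECT_DIR/.vscode/mcp.json and refuses to overwrite an existing file.
# Hypothetical helper; the "type" field is an assumption about VS Code's format.
write_vscode_mcp_config() {
  local dir="$1/.vscode"
  local file="$dir/mcp.json"

  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi

  mkdir -p "$dir"
  cat > "$file" <<'EOF'
{
  "servers": {
    "honeycombos": {
      "type": "sse",
      "url": "https://honeycombos-agenteval-production.up.railway.app/mcp/sse"
    }
  }
}
EOF
}

# From the project root:
#   write_vscode_mcp_config .
```

If the file already exists, merge the `honeycombos` entry into its `servers` object by hand instead.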

Windsurf / Codex

  1. Open ~/.codeium/windsurf/mcp_config.json
  2. Add the MCP server config
  3. Restart Windsurf
  4. For Codex, add the same config to your Codex settings

Core concepts behind HoneycombOS evaluation and optimization.

Agent Interface Contract

A YAML/JSON schema describing your agent's inputs, outputs, and quality rules. HoneycombOS generates the evaluation test suite from this. Think of it as a test spec for your AI agent.

Permutation Matrix

Every combination of input values is tested exhaustively: full enumeration, not sampling. An agent with 5 input axes of 4 values each yields 4^5 = 1,024 test cases. This is how you find the edge cases that sampling misses.
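
To make the enumeration concrete, here is a small local sketch that lists every combination of the two enum inputs from the Quick Start contract (`intent` and `urgency`) and computes the matrix size as the product of the value counts per axis. It only illustrates the counting; it is not how HoneycombOS builds or runs the matrix, and `matrix_size` is a name made up for this example.

```bash
# matrix_size N1 N2 ... -> product of the value counts, one argument per axis
matrix_size() {
  local total=1
  local n
  for n in "$@"; do
    total=$((total * n))
  done
  echo "$total"
}

# Full enumeration of the Quick Start contract's two enum inputs.
intents="question complaint request"
urgencies="low medium high"

count=0
for intent in $intents; do
  for urgency in $urgencies; do
    count=$((count + 1))
    echo "case $count: intent=$intent urgency=$urgency"
  done
done

echo "enumerated: $count"                           # 3 x 3 = 9
echo "five axes of four: $(matrix_size 4 4 4 4 4)"  # 4^5 = 1024
```

The size grows multiplicatively with each axis, so estimating cost before running a large contract is worthwhile.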

Quality Rules

Assertions about what your agent must do. “If urgency is high, must flag for review.” Rules have severity levels: critical, warning, and info. Critical failures block deploys.
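
As an illustration of what such an assertion means, the sketch below checks the Quick Start rule `if urgency == high then requires_review == true` against a single case. It is a local stand-in for the idea only: the function name is hypothetical, and HoneycombOS's actual rule engine may evaluate rules quite differently.

```bash
# check_high_urgency_flagged URGENCY REQUIRES_REVIEW
# Returns 0 when the rule holds for this case, 1 when it is violated.
# Hypothetical helper mirroring the Quick Start rule, for illustration only.
check_high_urgency_flagged() {
  local urgency="$1"
  local requires_review="$2"

  # The rule only constrains high-urgency cases.
  if [ "$urgency" = "high" ] && [ "$requires_review" != "true" ]; then
    return 1
  fi
  return 0
}

# A high-urgency case that was not flagged violates the rule.
if check_high_urgency_flagged high false; then
  echo "pass"
else
  echo "critical failure: high_urgency_flagged"
fi
```

Low- and medium-urgency cases pass regardless of `requires_review`, because the rule's condition does not apply to them.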

Autoresearch

Automated prompt optimization. Analyzes failures, hypothesizes fixes, re-runs the matrix, keeps improvements. Typically converges in 10–20 iterations. Available on Pro plans.

Regression Gate

A 6-layer check comparing the current evaluation against a baseline. It blocks deploys if quality drops, catching regressions before they reach production, and can be integrated into CI/CD via webhooks.
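
The gating idea can be sketched as a CI step that exits non-zero when quality drops. The six layers are not described here, so the example below assumes a single, simplified comparison of passed-case counts between a baseline and a current run over the same matrix. The function `regression_gate` and its inputs are hypothetical.

```bash
# regression_gate BASELINE_PASSED CURRENT_PASSED
# Returns 0 (allow deploy) if the current run passes at least as many cases
# as the baseline, 1 (block deploy) otherwise.
# Simplified, single-layer illustration; not the HoneycombOS 6-layer check.
regression_gate() {
  local baseline="$1"
  local current="$2"

  if [ "$current" -lt "$baseline" ]; then
    echo "regression: $current cases passed, baseline was $baseline" >&2
    return 1
  fi
  return 0
}

# In a CI script, a failing gate would stop the pipeline:
#   regression_gate "$BASELINE_PASSED" "$CURRENT_PASSED" || exit 1
```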


We only get paid when you save money. Free to audit your AI spend. Performance-based pricing on optimization.

Free Audit
$0

“Show me the waste”

  • Full cost analysis & CFO Readout
  • Reliability report (pass/fail per agent)
  • Model comparison (cost vs quality)
  • Token burn rate analysis
  • 3 agents, 10 evaluations/day
  • Regression checks

Performance (Win-Win)
% of savings

“Fix it, share the upside”

  • Everything in Free
  • Unlimited agents & runs
  • Automated prompt optimization
  • Documented savings with proof
  • Model routing recommendations
  • CI/CD regression gates
  • You only pay if we save you money

Example: $20K/mo AI spend → we find $8K/mo in waste → you pay 20% of the savings ($1,600/mo) and keep a net saving of $6,400/mo.
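
The arithmetic in the example can be reproduced with a short sketch. The 20% rate is taken from the example above and may not be the rate for every plan. The function names `savings_fee` and `net_savings` are made up for illustration.

```bash
# savings_fee WASTE_FOUND FEE_PERCENT -> monthly fee in whole dollars
savings_fee() {
  echo $(( $1 * $2 / 100 ))
}

# net_savings WASTE_FOUND FEE_PERCENT -> what you keep after the fee
net_savings() {
  local fee
  fee="$(savings_fee "$1" "$2")"
  echo $(( $1 - fee ))
}

# $8,000/mo of waste found, 20% performance fee (figures from the example).
printf 'fee: $%s/mo\n' "$(savings_fee 8000 20)"   # fee: $1600/mo
printf 'net: $%s/mo\n' "$(net_savings 8000 20)"   # net: $6400/mo
```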