Automix Router

The Automix router optimizes for cost: it sends every query to a cheap model first and escalates to a premium model only when the confidence of the initial response falls below a configured threshold.

How It Works

1. Initial query: send the query to the cheap model.
2. Confidence check: evaluate the confidence score of the response.
3. Escalate if needed: if the confidence is below the threshold, re-query with the premium model.
4. Return: return the first response that meets the confidence threshold.
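
The steps above can be sketched as a small loop. This is an illustrative sketch, not the router's actual implementation: the `AutomixSketch` class, the `ModelFn` interface (a callable returning a response and a confidence score), and the behavior once `max_escalations` is exhausted (return the latest response) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical model interface: takes a query and returns
# (response_text, confidence), with confidence assumed to lie in [0, 1].
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class AutomixSketch:
    cheap_model: ModelFn
    premium_model: ModelFn
    confidence_threshold: float = 0.7
    max_escalations: int = 1

    def route(self, query: str) -> Tuple[str, str]:
        """Return (response, tier), where tier is 'cheap' or 'premium'."""
        # Step 1: always start with the cheap model.
        response, confidence = self.cheap_model(query)
        tier = "cheap"
        escalations = 0

        # Steps 2-3: escalate while confidence is too low and budget remains.
        while (confidence < self.confidence_threshold
               and escalations < self.max_escalations):
            response, confidence = self.premium_model(query)
            tier = "premium"
            escalations += 1

        # Step 4: return the first confident response. Assumption: if the
        # escalation budget runs out first, the latest response is returned.
        return response, tier
```

With `max_escalations = 1`, a query costs at most one cheap call plus one premium call.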

Confidence Scoring

Confidence is assessed based on:

- Self-reported confidence in the response
- Presence of hedging language ("I'm not sure", "might be")
- Token-level entropy of the response
- Tool call success rate
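
One way the four signals could be combined into a single score is sketched below. How Automix actually weights them is not specified here, so the equal weighting, the `max_entropy` normalization constant, the function names, and the treatment of a response with no tool calls (counted as fully successful) are assumptions chosen only to make the example concrete.

```python
import math
from typing import Sequence

# The two phrases named above; a real list would presumably be longer.
HEDGING_PHRASES = ("i'm not sure", "might be")


def mean_token_entropy(token_distributions: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (in nats) over per-token probability distributions."""
    if not token_distributions:
        return 0.0
    total = 0.0
    for dist in token_distributions:
        total += -sum(p * math.log(p) for p in dist if p > 0)
    return total / len(token_distributions)


def confidence_score(
    text: str,
    self_reported: float,
    token_distributions: Sequence[Sequence[float]],
    tool_calls_ok: int,
    tool_calls_total: int,
    max_entropy: float = 2.0,  # assumed normalization constant
) -> float:
    """Combine the four signals into a score in [0, 1] (equal weights assumed)."""
    lowered = text.lower()
    hedging_signal = 0.0 if any(p in lowered for p in HEDGING_PHRASES) else 1.0

    # Lower entropy means a more peaked token distribution, so higher confidence.
    entropy = mean_token_entropy(token_distributions)
    entropy_signal = 1.0 - min(entropy / max_entropy, 1.0)

    # Assumption: a response with no tool calls is not penalized.
    tool_signal = tool_calls_ok / tool_calls_total if tool_calls_total else 1.0

    signals = (self_reported, hedging_signal, entropy_signal, tool_signal)
    return sum(signals) / len(signals)
```

The resulting score would then be compared against `confidence_threshold` to decide whether to escalate.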

Configuration

```toml
[router]
strategy = "automix"

[router.automix]
enabled = true
confidence_threshold = 0.7
cheap_model = "anthropic/claude-haiku"
premium_model = "anthropic/claude-opus-4-6"
max_escalations = 1
```

Cost Savings

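In typical usage, Automix resolves 60-80% of queries with the cheap model alone. Only the remaining 20-40% incur an additional premium-model call, which is where the cost savings come from, while complex queries still receive a premium-quality response.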
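
The effect on cost can be estimated with simple arithmetic. The sketch below assumes `max_escalations = 1` and that an escalated query pays for both the cheap call and the premium call; the function names are hypothetical, and any per-query costs passed in are placeholders rather than real prices.

```python
def expected_cost_per_query(cheap_cost: float, premium_cost: float,
                            cheap_share: float) -> float:
    """Expected cost when `cheap_share` of queries are resolved by the cheap model.

    Every query pays for the cheap call; escalated queries also pay for
    the premium call.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    escalation_rate = 1.0 - cheap_share
    return cheap_cost + escalation_rate * premium_cost


def savings_vs_always_premium(cheap_cost: float, premium_cost: float,
                              cheap_share: float) -> float:
    """Fractional saving compared with sending every query to the premium model."""
    expected = expected_cost_per_query(cheap_cost, premium_cost, cheap_share)
    return 1.0 - expected / premium_cost
```

Under this model the saving is `cheap_share - cheap_cost / premium_cost`, so Automix pays off only when the share of queries resolved by the cheap model exceeds the cheap-to-premium cost ratio.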
