Code-heavy
Use frontier or code-specialist models on purpose
Do not mix repo agents, autocomplete and product copilots under one lane if the latency and spend profile is different.
This page answers practical questions first: which lane is cheap enough to route, which one is safe for repo work, and which rows still make sense if local hosting matters.
Models compared
5 providers
Open-weight lane
Local or router-friendly
Sub-$1 input
Budget-friendly rows
Long-context options
1M-scale context lanes
Code-heavy
Do not mix repo agents, autocomplete and product copilots under one lane if the latency and spend profile is different.
Budgeted production
Low-cost rows are best when they screen, classify or draft before a more expensive final pass.
Local-first
If privacy, cost ceilings or edge use matter, keep a real self-host lane visible from the start.
LLM technical matrix
A curated vendor-doc snapshot for comparing context, spend, deployment posture and operational fit.
| Model | Context | I/O price | Deployment | Best for | Caution |
|---|---|---|---|---|---|
| GPT-5.4 OpenAI Frontier reasoning Frontier for coding and reasoning Moderate Closed weights | 272k std / 1.05M extended Text + image in | $2.50 Input / 1M $15.00Output / 1M | Managed API / Codex | Large repos, agent tasks and long-context reasoning | Output cost climbs quickly in long sessions Official source: OpenAI pricing |
| GPT-5.4 mini OpenAI High-throughput generalist Balanced for subagents Medium-low Closed weights | 400k Text + image in | $0.75 Input / 1M $4.50Output / 1M | Managed API / Codex | Subagents, pipelines and budgeted automation | Less headroom than the frontier model on complex tasks Official source: OpenAI model note |
| Claude Sonnet 4 Anthropic Code review and planning Strong for review and long plans Moderate Closed weights | 200k base / 1M beta Text + image in | $3.00 Input / 1M $15.00Output / 1M | Claude API / Claude Code | Code review, long docs and memory-heavy orchestration | Long-context mode needs spend controls Official source: Anthropic pricing |
| Claude Haiku 3.5 Anthropic Fast operational lane Fast for triage and drafts Low Closed weights | 200k Text + image in | $0.80 Input / 1M $4.00Output / 1M | Claude API | Classification, internal copilots and low-cost guardrails | Not the strongest final pass for deep reasoning Official source: Anthropic pricing |
| Gemini 2.5 Pro Google Long-context multipurpose Strong on code with huge context Moderate Closed weights | 1,048,576 Text + image + video + audio | $1.25-$2.50 Input / 1M $10.00-$15.00Output / 1M | Gemini API / Vertex | Large repos, heavy docs and multimodal analysis | Pricing steps up beyond 200k input tokens Official source: Gemini pricing |
| Gemini 2.5 Flash-Lite Google Cheap high-volume lane Efficient for throughput Low Closed weights | 1,048,576 Text + image + video + audio | $0.10 Input / 1M $0.40Output / 1M | Gemini API / Vertex | Routing, classification and scale jobs | Should not be the final layer for delicate decisions Official source: Gemini pricing |
| Mistral Large 3 Mistral Enterprise generalist Strong generalist with flexible hosting Moderate Open weight option | 256k Text + image in | $0.50 Input / 1M $1.50Output / 1M | API / private cloud / self-host | Stacks needing a European option and deployment control | The ecosystem is smaller than OpenAI or Anthropic Official source: Mistral docs |
| Codestral Mistral Code specialist Coding specialist Low-medium Closed weights | 256k Code + text | $0.30 Input / 1M $0.90Output / 1M | API / private deploy | Autocomplete, FIM and pure programming tasks | Not the best fit as a product generalist Official source: Mistral docs |
| Ministral 3 8B Mistral Local-first small model Lightweight for edge and small teams Low Open weight option | 256k Text | $0.10 Input / 1M $0.10Output / 1M | Local / edge / API | On-device, edge and low-cost internal assistants | Quality drops sooner than frontier models Official source: Mistral docs |
| DeepSeek V3.2 DeepSeek Cheap open-weight generalist Very efficient for first-pass work Medium-low Open weight option | 128k Text | $0.028 hit / $0.28 miss Input / 1M $0.42Output / 1M | API / self-host / router | Low-cost analysis, routing and drafts before final QA | Enterprise teams should add fallbacks and output controls Official source: DeepSeek pricing |
OpenAI
Frontier reasoning
Large repos, agent tasks and long-context reasoning
Output cost climbs quickly in long sessions
Official sourceOpenAI
High-throughput generalist
Subagents, pipelines and budgeted automation
Less headroom than the frontier model on complex tasks
Official sourceAnthropic
Code review and planning
Code review, long docs and memory-heavy orchestration
Long-context mode needs spend controls
Official sourceAnthropic
Fast operational lane
Classification, internal copilots and low-cost guardrails
Not the strongest final pass for deep reasoning
Official sourceLong-context multipurpose
Large repos, heavy docs and multimodal analysis
Pricing steps up beyond 200k input tokens
Official sourceCheap high-volume lane
Routing, classification and scale jobs
Should not be the final layer for delicate decisions
Official sourceMistral
Enterprise generalist
Stacks needing a European option and deployment control
The ecosystem is smaller than OpenAI or Anthropic
Official sourceMistral
Code specialist
Autocomplete, FIM and pure programming tasks
Not the best fit as a product generalist
Official sourceMistral
Local-first small model
On-device, edge and low-cost internal assistants
Quality drops sooner than frontier models
Official sourceDeepSeek
Cheap open-weight generalist
Low-cost analysis, routing and drafts before final QA
Enterprise teams should add fallbacks and output controls
Official sourceRoute
Start at the routing layer if you still need to decide between vendor, scenario or workflow.
Route
Cut the vendor lane before you over-index on individual rows.
Route
Move from raw specs to scenario-first model picks.
Route
Jump into operating playbooks once the model lane is already narrow enough.