LLM matrix

Compare model lanes like an operator, not like a leaderboard fan.

This page answers practical questions first: which lane is cheap enough to route, which one is safe for repo work, and which rows still make sense if local hosting matters.

Models compared: 5 providers
Open-weight lane: local or router-friendly
Sub-$1 input: budget-friendly rows
Long-context options: 1M-scale context lanes

Code-heavy

Use frontier or code-specialist models on purpose

Do not put repo agents, autocomplete and product copilots in one lane when their latency and spend profiles differ.
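One way to keep those workloads apart is to give each lane its own config entry, with its own model and its own budgets. This is a minimal sketch under stated assumptions: the `Lane` type, the lane names and the latency and spend ceilings are hypothetical, and the model picks only reuse rows from the matrix on this page for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    """A routing lane with its own model and budgets (illustrative only)."""
    name: str
    model: str
    max_latency_ms: int       # hypothetical latency ceiling for this lane
    max_usd_per_call: float   # hypothetical spend ceiling per call


# Hypothetical lane table: each workload gets its own entry instead of
# sharing one model and one budget. Model names come from the matrix rows.
LANES = {
    "repo_agent": Lane("repo_agent", "GPT-5.4", 120_000, 2.00),
    "autocomplete": Lane("autocomplete", "Codestral", 300, 0.002),
    "product_copilot": Lane("product_copilot", "Claude Haiku 3.5", 3_000, 0.02),
}


def lane_for(workload: str) -> Lane:
    """Return the lane for a workload, failing loudly rather than
    silently falling back to another lane's model and budget."""
    if workload not in LANES:
        raise ValueError(f"no lane configured for workload {workload!r}")
    return LANES[workload]


if __name__ == "__main__":
    for lane in LANES.values():
        print(f"{lane.name}: {lane.model}, "
              f"<= {lane.max_latency_ms} ms, <= ${lane.max_usd_per_call} per call")
```

Raising on an unknown workload is a deliberate choice: a silent default lane is exactly how autocomplete traffic ends up on a frontier model's bill.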

Budgeted production

Cheap input is a routing decision, not a universal winner

Low-cost rows are best when they screen, classify or draft before a more expensive final pass.
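A minimal sketch of that screen-then-escalate pattern, assuming two hypothetical callables: `cheap_screen` stands in for a low-cost row such as Gemini 2.5 Flash-Lite and returns a label with a confidence, and `final_pass` stands in for a frontier row. The `route` function, the `"routine"` label and the 0.8 threshold are illustrative assumptions, not a vendor API; the stubs exist only so the sketch runs offline.

```python
from typing import Callable, Tuple

# A screen returns (label, confidence); a final pass returns an answer.
ScreenFn = Callable[[str], Tuple[str, float]]
FinalFn = Callable[[str], str]


def route(prompt: str, cheap_screen: ScreenFn, final_pass: FinalFn,
          threshold: float = 0.8) -> Tuple[str, str]:
    """Screen with the cheap lane first and escalate to the expensive
    final pass only when the screen is unsure or flags the request.

    Returns (lane_used, output)."""
    label, confidence = cheap_screen(prompt)
    if label == "routine" and confidence >= threshold:
        # Cheap lane is confident: keep its classification, skip the final pass.
        return "cheap", label
    return "final", final_pass(prompt)


# Offline stubs so the sketch runs; real code would call provider SDKs here.
def stub_screen(prompt: str) -> Tuple[str, float]:
    return ("routine", 0.95) if len(prompt) < 40 else ("complex", 0.60)


def stub_final(prompt: str) -> str:
    return "final answer for: " + prompt[:20]


if __name__ == "__main__":
    print(route("Tag this ticket: login fails", stub_screen, stub_final))
    print(route("Review this 40-file refactor for race conditions", stub_screen, stub_final))
```

With these stubs the short ticket stays in the cheap lane and the long review request escalates. A low-confidence screen also escalates, even when its label is "routine".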

Local-first

Open-weight rows matter when hosting is part of the product decision

If privacy, cost ceilings or edge use matter, keep a real self-host lane visible from the start.
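To keep the self-host lane visible in routing code as well as on this page, a router can filter the matrix for open-weight rows that list local or self-hosted deployment. The sketch below assumes the rows are transcribed by hand from this page's snapshot: prices are USD per 1M tokens, DeepSeek V3.2 is entered at its cache-miss input rate and Gemini 2.5 Pro at its lower tier. `Row`, `ROWS` and `self_host_candidates` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    model: str
    open_weight: bool
    input_usd: float    # USD per 1M input tokens (DeepSeek: cache-miss rate)
    output_usd: float   # USD per 1M output tokens
    deploy: str


# Transcribed by hand from the matrix on this page; a snapshot, not live prices.
# Gemini 2.5 Pro is entered at its lower (prompts up to 200k) tier.
ROWS = [
    Row("GPT-5.4", False, 2.50, 15.00, "Managed API / Codex"),
    Row("GPT-5.4 mini", False, 0.75, 4.50, "Managed API / Codex"),
    Row("Claude Sonnet 4", False, 3.00, 15.00, "Claude API / Claude Code"),
    Row("Claude Haiku 3.5", False, 0.80, 4.00, "Claude API"),
    Row("Gemini 2.5 Pro", False, 1.25, 10.00, "Gemini API / Vertex"),
    Row("Gemini 2.5 Flash-Lite", False, 0.10, 0.40, "Gemini API / Vertex"),
    Row("Mistral Large 3", True, 0.50, 1.50, "API / private cloud / self-host"),
    Row("Codestral", False, 0.30, 0.90, "API / private deploy"),
    Row("Ministral 3 8B", True, 0.10, 0.10, "Local / edge / API"),
    Row("DeepSeek V3.2", True, 0.28, 0.42, "API / self-host / router"),
]


def self_host_candidates(rows: List[Row], max_input_usd: Optional[float] = None) -> List[str]:
    """Open-weight rows whose deployment column mentions self-hosting or local use,
    optionally capped by an input-price ceiling."""
    picks = []
    for r in rows:
        deploy = r.deploy.lower()
        hostable = "self-host" in deploy or "local" in deploy
        if not (r.open_weight and hostable):
            continue
        if max_input_usd is not None and r.input_usd > max_input_usd:
            continue
        picks.append(r.model)
    return picks


if __name__ == "__main__":
    print(self_host_candidates(ROWS))
    print(self_host_candidates(ROWS, max_input_usd=0.30))
```

The same rows reproduce the summary counts on this page: 3 open-weight rows and 7 rows under $1 input. Codestral's "private deploy" is excluded here because the matrix lists it as closed weights.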

Matrix

Current LLM operating lanes

Official docs snapshot

LLM technical matrix

A curated snapshot of vendor documentation for comparing context windows, spend, deployment posture and operational fit. Prices are in USD per 1M tokens.

Models: 10
Open-weight friendly: 3
Under $1 input: 7

Model	Provider	Weights	Profile	Context	Modalities	Input ($/1M tokens)	Output ($/1M tokens)	Deployment	Best for	Caution	Official source
GPT-5.4	OpenAI	Closed	Frontier reasoning; frontier for coding and reasoning; moderate	272k standard / 1.05M extended	Text + image in	$2.50	$15.00	Managed API / Codex	Large repos, agent tasks and long-context reasoning	Output cost climbs quickly in long sessions	OpenAI pricing
GPT-5.4 mini	OpenAI	Closed	High-throughput generalist; balanced for subagents; medium-low	400k	Text + image in	$0.75	$4.50	Managed API / Codex	Subagents, pipelines and budgeted automation	Less headroom than the frontier model on complex tasks	OpenAI model note
Claude Sonnet 4	Anthropic	Closed	Code review and planning; strong for review and long plans; moderate	200k base / 1M beta	Text + image in	$3.00	$15.00	Claude API / Claude Code	Code review, long docs and memory-heavy orchestration	Long-context mode needs spend controls	Anthropic pricing
Claude Haiku 3.5	Anthropic	Closed	Fast operational lane; fast for triage and drafts; low	200k	Text + image in	$0.80	$4.00	Claude API	Classification, internal copilots and low-cost guardrails	Not the strongest final pass for deep reasoning	Anthropic pricing
Gemini 2.5 Pro	Google	Closed	Long-context multipurpose; strong on code with huge context; moderate	1,048,576	Text + image + video + audio	$1.25 (prompts up to 200k) / $2.50 (over 200k)	$10.00 (prompts up to 200k) / $15.00 (over 200k)	Gemini API / Vertex	Large repos, heavy docs and multimodal analysis	Pricing steps up beyond 200k input tokens	Gemini pricing
Gemini 2.5 Flash-Lite	Google	Closed	Cheap high-volume lane; efficient for throughput; low	1,048,576	Text + image + video + audio	$0.10	$0.40	Gemini API / Vertex	Routing, classification and scale jobs	Should not be the final layer for delicate decisions	Gemini pricing
Mistral Large 3	Mistral	Open-weight option	Enterprise generalist; strong generalist with flexible hosting; moderate	256k	Text + image in	$0.50	$1.50	API / private cloud / self-host	Stacks that need a European option and deployment control	Smaller ecosystem than OpenAI or Anthropic	Mistral docs
Codestral	Mistral	Closed	Code specialist; low-medium	256k	Code + text	$0.30	$0.90	API / private deploy	Autocomplete, FIM (fill-in-the-middle) and pure programming tasks	Not the best fit as a product generalist	Mistral docs
Ministral 3 8B	Mistral	Open-weight option	Local-first small model; lightweight for edge and small teams; low	256k	Text	$0.10	$0.10	Local / edge / API	On-device, edge and low-cost internal assistants	Quality drops off sooner than with frontier models	Mistral docs
DeepSeek V3.2	DeepSeek	Open-weight option	Cheap open-weight generalist; very efficient for first-pass work; medium-low	128k	Text	$0.028 (cache hit) / $0.28 (cache miss)	$0.42	API / self-host / router	Low-cost analysis, routing and drafts before final QA	Enterprise teams should add fallbacks and output controls	DeepSeek pricing
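Because the price columns are per 1M tokens, per-request cost is input tokens divided by 1,000,000 times the input rate, plus the same for output. The sketch below applies that to flat-priced rows and to the two rows whose input pricing is not flat: Gemini 2.5 Pro's step above 200k prompt tokens and DeepSeek V3.2's cache hit/miss split. The function names are hypothetical, the rates are copied from this snapshot and may have changed, and "200k" is assumed to mean 200,000 prompt tokens.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float, output_per_m: float) -> float:
    """Flat per-1M-token pricing: cost = tokens / 1e6 * rate, input plus output."""
    return input_tokens / 1e6 * input_per_m + output_tokens / 1e6 * output_per_m


def gemini_25_pro_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Tiered pricing from the matrix: a prompt over 200k tokens (assumed 200,000)
    is billed at the higher input AND output rates."""
    if input_tokens > 200_000:
        return request_cost_usd(input_tokens, output_tokens, 2.50, 15.00)
    return request_cost_usd(input_tokens, output_tokens, 1.25, 10.00)


def deepseek_v32_cost_usd(input_tokens: int, output_tokens: int,
                          cache_hit_ratio: float) -> float:
    """Cache-aware input pricing from the matrix: $0.028/1M on cache hits,
    $0.28/1M on misses, and $0.42/1M for output."""
    hit = input_tokens * cache_hit_ratio
    miss = input_tokens - hit
    return hit / 1e6 * 0.028 + miss / 1e6 * 0.28 + output_tokens / 1e6 * 0.42


if __name__ == "__main__":
    print(round(request_cost_usd(100_000, 10_000, 2.50, 15.00), 4))   # GPT-5.4
    print(round(gemini_25_pro_cost_usd(300_000, 10_000), 4))
    print(round(deepseek_v32_cost_usd(100_000, 5_000, 0.8), 5))
```

For example, 100k input and 10k output tokens on GPT-5.4 come to $0.25 + $0.15 = $0.40. The same request on Gemini 2.5 Pro costs $0.225, but a 300k-token prompt crosses the step and costs $0.75 + $0.15 = $0.90.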

Route

LLM route

Start at the routing layer if you still need to decide between vendor, scenario or workflow.

Provider compare

Narrow the vendor lane before you over-index on individual rows.

Model fit radar

Move from raw specs to scenario-first model picks.

Workflow recipes

Jump into operating playbooks once the model lane is already narrow enough.
