LLM matrix

Compare model lanes like an operator, not like a leaderboard fan.

This page answers practical questions first: which lane is cheap enough to route, which one is safe for repo work, and which rows still make sense if local hosting matters.

Models compared: 10 (5 providers)
Open-weight lane: 3 (local or router-friendly)
Sub-$1 input: 7 (budget-friendly rows)
Long-context options: 4 (1M-scale context lanes)

Code-heavy

Use frontier or code-specialist models on purpose

Do not put repo agents, autocomplete and product copilots in one lane when their latency and spend profiles differ.
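One way to keep these workloads apart is an explicit workload-to-lane table. The sketch below is a minimal illustration, not a recommended configuration: the model names come from the matrix on this page, while the workload keys, the assignments and the `pick_lane` helper are hypothetical.

```python
# Hypothetical workload-to-lane table. Model names come from the matrix;
# the workload keys and the assignments are illustrative assumptions.
LANES = {
    "repo_agent": "GPT-5.4",                # listed for large repos and agent tasks
    "autocomplete": "Codestral",            # listed for autocomplete and FIM
    "product_copilot": "Claude Haiku 3.5",  # listed for internal copilots
}


def pick_lane(workload: str) -> str:
    """Return the lane for a workload, refusing to fall back to a default."""
    try:
        return LANES[workload]
    except KeyError:
        raise ValueError(f"no lane configured for workload {workload!r}") from None
```

Raising on an unknown workload, rather than defaulting to one model, forces each new workload to get its own latency and spend decision.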

Budgeted production

Cheap input is a routing decision, not a universal winner

Low-cost rows are best when they screen, classify or draft before a more expensive final pass.
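The screen-then-escalate pattern can be sketched as a two-stage cascade. This is a minimal sketch under stated assumptions: `cheap` and `expensive` are placeholder callables standing in for whatever client you use, the `Draft` type is hypothetical, and the 0.8 threshold is an arbitrary example value. It also assumes the cheap lane can report a usable confidence score.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Draft:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the cheap lane (assumed)


def cascade(
    prompt: str,
    cheap: Callable[[str], Draft],
    expensive: Callable[[str, str], str],
    threshold: float = 0.8,  # illustrative cut-off, tune per workload
) -> Tuple[str, str]:
    """Return (answer, lane_used), escalating only when the draft is unsure."""
    draft = cheap(prompt)
    if draft.confidence >= threshold:
        return draft.text, "cheap"
    # The expensive lane sees the prompt plus the cheap draft to refine.
    return expensive(prompt, draft.text), "expensive"
```

In this shape the expensive model is paid for only on the requests the cheap lane could not settle.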

Local-first

Open-weight rows matter when hosting is part of the product decision

If privacy, cost ceilings or edge use matter, keep a real self-host lane visible from the start.
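To keep the self-host lane visible, the matrix rows can be filtered on the weights flag. The sketch below copies the model names, weights status and deployment strings from the matrix on this page. The `Row` type and `self_host_lane` helper are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    model: str
    open_weight: bool
    deployment: str


# Values copied from the matrix on this page.
ROWS = [
    Row("GPT-5.4", False, "Managed API / Codex"),
    Row("GPT-5.4 mini", False, "Managed API / Codex"),
    Row("Claude Sonnet 4", False, "Claude API / Claude Code"),
    Row("Claude Haiku 3.5", False, "Claude API"),
    Row("Gemini 2.5 Pro", False, "Gemini API / Vertex"),
    Row("Gemini 2.5 Flash-Lite", False, "Gemini API / Vertex"),
    Row("Mistral Large 3", True, "API / private cloud / self-host"),
    Row("Codestral", False, "API / private deploy"),
    Row("Ministral 3 8B", True, "Local / edge / API"),
    Row("DeepSeek V3.2", True, "API / self-host / router"),
]


def self_host_lane(rows: List[Row]) -> List[str]:
    """Return the models whose weights are listed as open."""
    return [row.model for row in rows if row.open_weight]


print(self_host_lane(ROWS))
```

The filter returns the three open-weight rows: Mistral Large 3, Ministral 3 8B and DeepSeek V3.2.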

Matrix

Current LLM operating lanes

Official docs snapshot

LLM technical matrix

A curated vendor-doc snapshot for comparing context, spend, deployment posture and operational fit.

| Model | Lane | Context | I/O price (per 1M tokens) | Deployment | Best for | Caution | Official source |
|---|---|---|---|---|---|---|---|
| GPT-5.4 (OpenAI) | Frontier reasoning; frontier for coding and reasoning; Moderate; closed weights | 272k std / 1.05M extended; text + image in | $2.50 input / $15.00 output | Managed API / Codex | Large repos, agent tasks and long-context reasoning | Output cost climbs quickly in long sessions | OpenAI pricing |
| GPT-5.4 mini (OpenAI) | High-throughput generalist; balanced for subagents; Medium-low; closed weights | 400k; text + image in | $0.75 input / $4.50 output | Managed API / Codex | Subagents, pipelines and budgeted automation | Less headroom than the frontier model on complex tasks | OpenAI model note |
| Claude Sonnet 4 (Anthropic) | Code review and planning; strong for review and long plans; Moderate; closed weights | 200k base / 1M beta; text + image in | $3.00 input / $15.00 output | Claude API / Claude Code | Code review, long docs and memory-heavy orchestration | Long-context mode needs spend controls | Anthropic pricing |
| Claude Haiku 3.5 (Anthropic) | Fast operational lane; fast for triage and drafts; Low; closed weights | 200k; text + image in | $0.80 input / $4.00 output | Claude API | Classification, internal copilots and low-cost guardrails | Not the strongest final pass for deep reasoning | Anthropic pricing |
| Gemini 2.5 Pro (Google) | Long-context multipurpose; strong on code with huge context; Moderate; closed weights | 1,048,576; text + image + video + audio | $1.25-$2.50 input / $10.00-$15.00 output | Gemini API / Vertex | Large repos, heavy docs and multimodal analysis | Pricing steps up beyond 200k input tokens | Gemini pricing |
| Gemini 2.5 Flash-Lite (Google) | Cheap high-volume lane; efficient for throughput; Low; closed weights | 1,048,576; text + image + video + audio | $0.10 input / $0.40 output | Gemini API / Vertex | Routing, classification and scale jobs | Should not be the final layer for delicate decisions | Gemini pricing |
| Mistral Large 3 (Mistral) | Enterprise generalist; strong generalist with flexible hosting; Moderate; open weight option | 256k; text + image in | $0.50 input / $1.50 output | API / private cloud / self-host | Stacks needing a European option and deployment control | The ecosystem is smaller than OpenAI or Anthropic | Mistral docs |
| Codestral (Mistral) | Code specialist; coding specialist; Low-medium; closed weights | 256k; code + text | $0.30 input / $0.90 output | API / private deploy | Autocomplete, FIM and pure programming tasks | Not the best fit as a product generalist | Mistral docs |
| Ministral 3 8B (Mistral) | Local-first small model; lightweight for edge and small teams; Low; open weight option | 256k; text | $0.10 input / $0.10 output | Local / edge / API | On-device, edge and low-cost internal assistants | Quality drops sooner than frontier models | Mistral docs |
| DeepSeek V3.2 (DeepSeek) | Cheap open-weight generalist; very efficient for first-pass work; Medium-low; open weight option | 128k; text | $0.028 cache hit / $0.28 cache miss input; $0.42 output | API / self-host / router | Low-cost analysis, routing and drafts before final QA | Enterprise teams should add fallbacks and output controls | DeepSeek pricing |
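The per-1M-token prices in the matrix convert to a per-request cost with simple arithmetic. The sketch below is a rough estimator, not a billing reference. The function names are hypothetical. The Gemini 2.5 Pro function assumes the lower prices apply at or below 200k input tokens and that the whole request bills at the higher rate above it. The DeepSeek function assumes input cost blends linearly with the cache hit ratio.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD for one request, given prices per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def gemini_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Gemini 2.5 Pro row: assumes the whole request moves to the higher
    tier once input exceeds 200k tokens (an assumption, check vendor docs)."""
    if input_tokens > 200_000:
        return request_cost(input_tokens, output_tokens, 2.50, 15.00)
    return request_cost(input_tokens, output_tokens, 1.25, 10.00)


def deepseek_input_price(cache_hit_ratio: float) -> float:
    """DeepSeek V3.2 row: blended input price per 1M tokens, assuming a
    linear mix of the cache-hit and cache-miss prices."""
    return cache_hit_ratio * 0.028 + (1.0 - cache_hit_ratio) * 0.28


# GPT-5.4 row: 100k input tokens and 10k output tokens.
print(f"${request_cost(100_000, 10_000, 2.50, 15.00):.2f}")
```

With these numbers the GPT-5.4 example comes to $0.40, of which $0.15 is output. This is why long sessions with heavy output deserve their own budget line.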


Route

LLM route: Start at the routing layer if you still need to decide between vendor, scenario or workflow.

Provider compare: Narrow down the vendor lane before you over-index on individual rows.

Model fit radar: Move from raw specs to scenario-first model picks.

Workflow recipes: Jump into operating playbooks once the model lane is narrow enough.