Decision directory

This directory is for choosing a stack with less noise, not for browsing hype.

Start here when the real question is model fit, deployment posture or which route to open next. The point is to cut decisions early and keep only the surfaces that help.

Visual decision flow

Open with a cinematic layer, then cut by matrix and route.

This section keeps model, provider and orchestration decisions in one visual pass before jumping into profiles.

Matrix and provider posture first
Model lane Matrix and provider posture first
Orchestration before framework shopping
Agent lane Orchestration before framework shopping
Recipes, prompts and hardware in sequence
Ops lane Recipes, prompts and hardware in sequence
💡

How to use this directory

Use the matrix first when you are still deciding vendors, then jump into prompts, agents, hardware or tool profiles once the option set is smaller.

Start with the operating question, not the model brand

The fastest route is usually matrix -> route -> profile. That keeps evaluation, prompting and deployment in the right order.

Prompt systems

Profiles live

2

Current tool notes

Matrix rows

10

5 providers

Open-weight lane

3

Router or self-host options

Low-cost input

7

Rows under $1 / 1M input

Reading flow for the directory

  1. 1

    Use case

    Start with the job and the hosting limits

    Decide if the work needs frontier reasoning, high throughput, local privacy or a code specialist.

  2. 2

    Comparison

    Use the matrix to cut the vendor set

    Context, spend and deployment posture narrow the option set faster than raw hype.

  3. 3

    Action

    Then jump into the route that matches the problem

    Go into agents, prompts, hardware or a live profile only after the model lane is clear.

LLM route

Choose the right LLM layer first

Open the LLM route when you still need to decide between matrix, provider compare, model fit or workflow recipes.

Matrix

Compare models before you test

Start with context, deployment and cost posture instead of jumping between vendor ads.

Providers

Compare vendors before you compare rows

Use provider posture, deployment and openness to cut the market before model-level testing.

Model fit

Choose the best model lane by task

Use scenario-first picks when the matrix is too technical and you need a faster recommendation.

Agent board

Decide the stack pattern before the framework

Compare pipelines, memory, browser automation and multi-agent lanes before implementation begins.

Recipes

Move from choosing to operating

Open practical workflow recipes for coding review, retrieval, browser flows and local-first setups.

Agents

Agent frameworks and orchestration

Move into memory, validation and orchestration once the model lane is clear.

Prompts

Prompt systems

Use reusable prompt systems when the problem is workflow quality, not model shopping.

Hardware

Local AI and workstation choices

See NPUs, local rigs and edge setups before buying hardware blindly.

Inference

Inference hardware guide

Choose between API-first, NPUs, single-GPU boxes and private serving nodes with fewer hidden costs.

Open source

Open-weight stacks

Jump into self-host and router-friendly stacks when flexibility matters more than brand.

Matrix first

Use a technical filter before you read opinions

Vendor snapshot

This preview is local and based on vendor docs. It is here to decide faster between context, spend, hosting and operational fit.

Snapshot status

Source mode

Curated local dataset with official vendor links. No external JSON or generated file dependencies.

Snapshot date

2026-03

Primary use

Vendor selection, routing and hosting decisions before implementation work.

Technical comparison

A preview of the local AI matrix before the dedicated surface goes deeper.

LLM technical matrix preview

Use this preview to separate frontier, cheap, local-first and coding-specialist lanes before testing.

Models 6
Open-weight friendly 0
Under $1 input 3
Model Context I/O price Deployment Best for Caution
GPT-5.4 OpenAI

Frontier reasoning

Frontier for coding and reasoning Moderate Closed weights
272k std / 1.05M extended

Text + image in

$2.50

Input / 1M

$15.00

Output / 1M

Managed API / Codex Large repos, agent tasks and long-context reasoning

Output cost climbs quickly in long sessions

Official source: OpenAI pricing
GPT-5.4 mini OpenAI

High-throughput generalist

Balanced for subagents Medium-low Closed weights
400k

Text + image in

$0.75

Input / 1M

$4.50

Output / 1M

Managed API / Codex Subagents, pipelines and budgeted automation

Less headroom than the frontier model on complex tasks

Official source: OpenAI model note
Claude Sonnet 4 Anthropic

Code review and planning

Strong for review and long plans Moderate Closed weights
200k base / 1M beta

Text + image in

$3.00

Input / 1M

$15.00

Output / 1M

Claude API / Claude Code Code review, long docs and memory-heavy orchestration

Long-context mode needs spend controls

Official source: Anthropic pricing
Claude Haiku 3.5 Anthropic

Fast operational lane

Fast for triage and drafts Low Closed weights
200k

Text + image in

$0.80

Input / 1M

$4.00

Output / 1M

Claude API Classification, internal copilots and low-cost guardrails

Not the strongest final pass for deep reasoning

Official source: Anthropic pricing
Gemini 2.5 Pro Google

Long-context multipurpose

Strong on code with huge context Moderate Closed weights
1,048,576

Text + image + video + audio

$1.25-$2.50

Input / 1M

$10.00-$15.00

Output / 1M

Gemini API / Vertex Large repos, heavy docs and multimodal analysis

Pricing steps up beyond 200k input tokens

Official source: Gemini pricing
Gemini 2.5 Flash-Lite Google

Cheap high-volume lane

Efficient for throughput Low Closed weights
1,048,576

Text + image + video + audio

$0.10

Input / 1M

$0.40

Output / 1M

Gemini API / Vertex Routing, classification and scale jobs

Should not be the final layer for delicate decisions

Official source: Gemini pricing

OpenAI

GPT-5.4

Closed

Frontier reasoning

Context
272k std / 1.05M extended
Input
$2.50
Output
$15.00
Deploy
Managed API / Codex

Large repos, agent tasks and long-context reasoning

Output cost climbs quickly in long sessions

Official source

OpenAI

GPT-5.4 mini

Closed

High-throughput generalist

Context
400k
Input
$0.75
Output
$4.50
Deploy
Managed API / Codex

Subagents, pipelines and budgeted automation

Less headroom than the frontier model on complex tasks

Official source

Anthropic

Claude Sonnet 4

Closed

Code review and planning

Context
200k base / 1M beta
Input
$3.00
Output
$15.00
Deploy
Claude API / Claude Code

Code review, long docs and memory-heavy orchestration

Long-context mode needs spend controls

Official source

Anthropic

Claude Haiku 3.5

Closed

Fast operational lane

Context
200k
Input
$0.80
Output
$4.00
Deploy
Claude API

Classification, internal copilots and low-cost guardrails

Not the strongest final pass for deep reasoning

Official source

Google

Gemini 2.5 Pro

Closed

Long-context multipurpose

Context
1,048,576
Input
$1.25-$2.50
Output
$10.00-$15.00
Deploy
Gemini API / Vertex

Large repos, heavy docs and multimodal analysis

Pricing steps up beyond 200k input tokens

Official source

Google

Gemini 2.5 Flash-Lite

Closed

Cheap high-volume lane

Context
1,048,576
Input
$0.10
Output
$0.40
Deploy
Gemini API / Vertex

Routing, classification and scale jobs

Should not be the final layer for delicate decisions

Official source

Inference and deployment

Choose the hosting lane before the vendor debate gets expensive

If the next question is already hardware, budget and bottlenecks, move into the dedicated inference guide instead of staying in this lighter routing layer.

Managed API lane

Best when time-to-market beats hosting control and the team needs vendor tooling, evals and support.

Private inference lane

Use when policy, cost ceilings or data residence force a tighter hosting perimeter.

Edge and on-device lane

Use when latency, offline use or device privacy is more valuable than benchmark leadership.

Open-source stack

Keep open-weight options visible as a real lane, not just a fallback

Open-weight frontier

Use this lane when you still need strong reasoning but want routing or self-host options.

Coding specialists

Separate code-focused models from generalists so autocomplete and repo tasks do not distort stack choices.

Small local operators

Keep a small-model lane for edge, internal helpers and private pilots that should not default to cloud.

Profiles and curated jumps

Keep the catalog honest: a few live notes, plus the routes that matter