Decision directory

This directory is for choosing a stack with less noise, not for browsing hype.

Start here when the real question is model fit, deployment posture or which route to open next. The point is to cut decisions early and keep only the surfaces that help.

Visual decision flow

Open with a cinematic layer, then cut by matrix and route.

This section keeps model, provider and orchestration decisions in one visual pass before jumping into profiles.

Open matrix Open LLM route

Model lane Matrix and provider posture first

Agent lane Orchestration before framework shopping

Ops lane Recipes, prompts and hardware in sequence

💡

How to use this directory

Use the matrix first when you are still deciding vendors, then jump into prompts, agents, hardware or tool profiles once the option set is smaller.

Profiles live

Current tool notes

Matrix rows

5 providers

Open-weight lane

Router or self-host options

Low-cost input

Rows under $1 / 1M input

Reading flow for the directory

1

Use case

Start with the job and the hosting limits

Decide if the work needs frontier reasoning, high throughput, local privacy or a code specialist.
2

Comparison

Use the matrix to cut the vendor set

Context, spend and deployment posture narrow the option set faster than raw hype.
3

Action

Then jump into the route that matches the problem

Go into agents, prompts, hardware or a live profile only after the model lane is clear.

LLM route

Choose the right LLM layer first

Open the LLM route when you still need to decide between matrix, provider compare, model fit or workflow recipes.

Open LLM route

Matrix

Compare models before you test

Start with context, deployment and cost posture instead of jumping between vendor ads.

Open matrix

Providers

Compare vendors before you compare rows

Use provider posture, deployment and openness to cut the market before model-level testing.

Open providers

Model fit

Choose the best model lane by task

Use scenario-first picks when the matrix is too technical and you need a faster recommendation.

Open radar

Agent board

Decide the stack pattern before the framework

Compare pipelines, memory, browser automation and multi-agent lanes before implementation begins.

Open board

Recipes

Move from choosing to operating

Open practical workflow recipes for coding review, retrieval, browser flows and local-first setups.

Open recipes

Agents

Agent frameworks and orchestration

Move into memory, validation and orchestration once the model lane is clear.

Open agent route

Prompts

Prompt systems

Use reusable prompt systems when the problem is workflow quality, not model shopping.

Open prompt desk

Hardware

Local AI and workstation choices

See NPUs, local rigs and edge setups before buying hardware blindly.

Open hardware route

Inference

Inference hardware guide

Choose between API-first, NPUs, single-GPU boxes and private serving nodes with fewer hidden costs.

Open guide

Open source

Open-weight stacks

Jump into self-host and router-friendly stacks when flexibility matters more than brand.

Open open-source lane

Matrix first

Use a technical filter before you read opinions

Vendor snapshot

This preview is local and based on vendor docs. It is here to decide faster between context, spend, hosting and operational fit.

LLM route Open full matrix Provider compare Model fit radar Agent stack board Workflow recipes Inference guide Agent frameworks Hardware route

Snapshot status

Source mode

Curated local dataset with official vendor links. No external JSON or generated file dependencies.

Snapshot date

2026-03

Primary use

Vendor selection, routing and hosting decisions before implementation work.

Technical comparison

A preview of the local AI matrix before the dedicated surface goes deeper.

LLM technical matrix preview

Use this preview to separate frontier, cheap, local-first and coding-specialist lanes before testing.

Models 6

Open-weight friendly 0

Under $1 input 3

Model	Context	I/O price	Deployment	Best for	Caution
GPT-5.4 OpenAI Frontier reasoning Frontier for coding and reasoning Moderate Closed weights	272k std / 1.05M extended Text + image in	$2.50 Input / 1M $15.00 Output / 1M	Managed API / Codex	Large repos, agent tasks and long-context reasoning	Output cost climbs quickly in long sessions Official source: OpenAI pricing
GPT-5.4 mini OpenAI High-throughput generalist Balanced for subagents Medium-low Closed weights	400k Text + image in	$0.75 Input / 1M $4.50 Output / 1M	Managed API / Codex	Subagents, pipelines and budgeted automation	Less headroom than the frontier model on complex tasks Official source: OpenAI model note
Claude Sonnet 4 Anthropic Code review and planning Strong for review and long plans Moderate Closed weights	200k base / 1M beta Text + image in	$3.00 Input / 1M $15.00 Output / 1M	Claude API / Claude Code	Code review, long docs and memory-heavy orchestration	Long-context mode needs spend controls Official source: Anthropic pricing
Claude Haiku 3.5 Anthropic Fast operational lane Fast for triage and drafts Low Closed weights	200k Text + image in	$0.80 Input / 1M $4.00 Output / 1M	Claude API	Classification, internal copilots and low-cost guardrails	Not the strongest final pass for deep reasoning Official source: Anthropic pricing
Gemini 2.5 Pro Google Long-context multipurpose Strong on code with huge context Moderate Closed weights	1,048,576 Text + image + video + audio	$1.25-$2.50 Input / 1M $10.00-$15.00 Output / 1M	Gemini API / Vertex	Large repos, heavy docs and multimodal analysis	Pricing steps up beyond 200k input tokens Official source: Gemini pricing
Gemini 2.5 Flash-Lite Google Cheap high-volume lane Efficient for throughput Low Closed weights	1,048,576 Text + image + video + audio	$0.10 Input / 1M $0.40 Output / 1M	Gemini API / Vertex	Routing, classification and scale jobs	Should not be the final layer for delicate decisions Official source: Gemini pricing

OpenAI

GPT-5.4

Closed

Frontier reasoning

Context: 272k std / 1.05M extended
Input: $2.50
Output: $15.00
Deploy: Managed API / Codex

Large repos, agent tasks and long-context reasoning

Output cost climbs quickly in long sessions

Official source

OpenAI

GPT-5.4 mini

Closed

High-throughput generalist

Context: 400k
Input: $0.75
Output: $4.50
Deploy: Managed API / Codex

Subagents, pipelines and budgeted automation

Less headroom than the frontier model on complex tasks

Official source

Anthropic

Claude Sonnet 4

Closed

Code review and planning

Context: 200k base / 1M beta
Input: $3.00
Output: $15.00
Deploy: Claude API / Claude Code

Code review, long docs and memory-heavy orchestration

Long-context mode needs spend controls

Official source

Anthropic

Claude Haiku 3.5

Closed

Fast operational lane

Context: 200k
Input: $0.80
Output: $4.00
Deploy: Claude API

Classification, internal copilots and low-cost guardrails

Not the strongest final pass for deep reasoning

Official source

Google

Gemini 2.5 Pro

Closed

Long-context multipurpose

Context: 1,048,576
Input: $1.25-$2.50
Output: $10.00-$15.00
Deploy: Gemini API / Vertex

Large repos, heavy docs and multimodal analysis

Pricing steps up beyond 200k input tokens

Official source

Google

Gemini 2.5 Flash-Lite

Closed

Cheap high-volume lane

Context: 1,048,576
Input: $0.10
Output: $0.40
Deploy: Gemini API / Vertex

Routing, classification and scale jobs

Should not be the final layer for delicate decisions

Official source

Inference and deployment

Choose the hosting lane before the vendor debate gets expensive

If the next question is already hardware, budget and bottlenecks, move into the dedicated inference guide instead of staying in this lighter routing layer.

Open inference hardware guide