This directory is for choosing a stack with less noise, not for browsing hype.
Start here when the real question is model fit, deployment posture or which route to open next. The point is to cut decisions early and keep only the surfaces that help.
Visual decision flow
Open with a cinematic layer, then cut by matrix and route.
This section keeps model, provider and orchestration decisions in one visual pass before jumping into profiles.
How to use this directory
Profiles live
Current tool notes
Matrix rows
5 providers
Open-weight lane
Router or self-host options
Low-cost input
Rows under $1 / 1M input
Reading flow for the directory
- 1
Use case
Start with the job and the hosting limits
Decide if the work needs frontier reasoning, high throughput, local privacy or a code specialist.
- 2
Comparison
Use the matrix to cut the vendor set
Context, spend and deployment posture narrow the option set faster than raw hype.
- 3
Action
Then jump into the route that matches the problem
Go into agents, prompts, hardware or a live profile only after the model lane is clear.
LLM route
Choose the right LLM layer first
Open the LLM route when you still need to decide between matrix, provider compare, model fit or workflow recipes.
Matrix
Compare models before you test
Start with context, deployment and cost posture instead of jumping between vendor ads.
Providers
Compare vendors before you compare rows
Use provider posture, deployment and openness to cut the market before model-level testing.
Model fit
Choose the best model lane by task
Use scenario-first picks when the matrix is too technical and you need a faster recommendation.
Agent board
Decide the stack pattern before the framework
Compare pipelines, memory, browser automation and multi-agent lanes before implementation begins.
Recipes
Move from choosing to operating
Open practical workflow recipes for coding review, retrieval, browser flows and local-first setups.
Agents
Agent frameworks and orchestration
Move into memory, validation and orchestration once the model lane is clear.
Prompts
Prompt systems
Use reusable prompt systems when the problem is workflow quality, not model shopping.
Hardware
Local AI and workstation choices
See NPUs, local rigs and edge setups before buying hardware blindly.
Inference
Inference hardware guide
Choose between API-first, NPUs, single-GPU boxes and private serving nodes with fewer hidden costs.
Open source
Open-weight stacks
Jump into self-host and router-friendly stacks when flexibility matters more than brand.
Use a technical filter before you read opinions
This preview is local and based on vendor docs. It is here to decide faster between context, spend, hosting and operational fit.
Curated local dataset with official vendor links. No external JSON or generated file dependencies.
2026-03
Vendor selection, routing and hosting decisions before implementation work.
Technical comparison
A preview of the local AI matrix before the dedicated surface goes deeper.
LLM technical matrix preview
Use this preview to separate frontier, cheap, local-first and coding-specialist lanes before testing.
| Model | Context | I/O price | Deployment | Best for | Caution |
|---|---|---|---|---|---|
| GPT-5.4 OpenAI Frontier reasoning Frontier for coding and reasoning Moderate Closed weights | 272k std / 1.05M extended Text + image in | $2.50 Input / 1M $15.00Output / 1M | Managed API / Codex | Large repos, agent tasks and long-context reasoning | Output cost climbs quickly in long sessions Official source: OpenAI pricing |
| GPT-5.4 mini OpenAI High-throughput generalist Balanced for subagents Medium-low Closed weights | 400k Text + image in | $0.75 Input / 1M $4.50Output / 1M | Managed API / Codex | Subagents, pipelines and budgeted automation | Less headroom than the frontier model on complex tasks Official source: OpenAI model note |
| Claude Sonnet 4 Anthropic Code review and planning Strong for review and long plans Moderate Closed weights | 200k base / 1M beta Text + image in | $3.00 Input / 1M $15.00Output / 1M | Claude API / Claude Code | Code review, long docs and memory-heavy orchestration | Long-context mode needs spend controls Official source: Anthropic pricing |
| Claude Haiku 3.5 Anthropic Fast operational lane Fast for triage and drafts Low Closed weights | 200k Text + image in | $0.80 Input / 1M $4.00Output / 1M | Claude API | Classification, internal copilots and low-cost guardrails | Not the strongest final pass for deep reasoning Official source: Anthropic pricing |
| Gemini 2.5 Pro Google Long-context multipurpose Strong on code with huge context Moderate Closed weights | 1,048,576 Text + image + video + audio | $1.25-$2.50 Input / 1M $10.00-$15.00Output / 1M | Gemini API / Vertex | Large repos, heavy docs and multimodal analysis | Pricing steps up beyond 200k input tokens Official source: Gemini pricing |
| Gemini 2.5 Flash-Lite Google Cheap high-volume lane Efficient for throughput Low Closed weights | 1,048,576 Text + image + video + audio | $0.10 Input / 1M $0.40Output / 1M | Gemini API / Vertex | Routing, classification and scale jobs | Should not be the final layer for delicate decisions Official source: Gemini pricing |
OpenAI
GPT-5.4
Frontier reasoning
- Context
- 272k std / 1.05M extended
- Input
- $2.50
- Output
- $15.00
- Deploy
- Managed API / Codex
Large repos, agent tasks and long-context reasoning
Output cost climbs quickly in long sessions
Official sourceOpenAI
GPT-5.4 mini
High-throughput generalist
- Context
- 400k
- Input
- $0.75
- Output
- $4.50
- Deploy
- Managed API / Codex
Subagents, pipelines and budgeted automation
Less headroom than the frontier model on complex tasks
Official sourceAnthropic
Claude Sonnet 4
Code review and planning
- Context
- 200k base / 1M beta
- Input
- $3.00
- Output
- $15.00
- Deploy
- Claude API / Claude Code
Code review, long docs and memory-heavy orchestration
Long-context mode needs spend controls
Official sourceAnthropic
Claude Haiku 3.5
Fast operational lane
- Context
- 200k
- Input
- $0.80
- Output
- $4.00
- Deploy
- Claude API
Classification, internal copilots and low-cost guardrails
Not the strongest final pass for deep reasoning
Official sourceGemini 2.5 Pro
Long-context multipurpose
- Context
- 1,048,576
- Input
- $1.25-$2.50
- Output
- $10.00-$15.00
- Deploy
- Gemini API / Vertex
Large repos, heavy docs and multimodal analysis
Pricing steps up beyond 200k input tokens
Official sourceGemini 2.5 Flash-Lite
Cheap high-volume lane
- Context
- 1,048,576
- Input
- $0.10
- Output
- $0.40
- Deploy
- Gemini API / Vertex
Routing, classification and scale jobs
Should not be the final layer for delicate decisions
Official sourceInference and deployment
Choose the hosting lane before the vendor debate gets expensive
If the next question is already hardware, budget and bottlenecks, move into the dedicated inference guide instead of staying in this lighter routing layer.
Managed API lane
Best when time-to-market beats hosting control and the team needs vendor tooling, evals and support.
Private inference lane
Use when policy, cost ceilings or data residence force a tighter hosting perimeter.
Edge and on-device lane
Use when latency, offline use or device privacy is more valuable than benchmark leadership.
Open-source stack
Keep open-weight options visible as a real lane, not just a fallback
Open-weight frontier
Use this lane when you still need strong reasoning but want routing or self-host options.
Coding specialists
Separate code-focused models from generalists so autocomplete and repo tasks do not distort stack choices.
Small local operators
Keep a small-model lane for edge, internal helpers and private pilots that should not default to cloud.
Search in current profiles and routes
Search by topic or category.
8 visible results
Profiles and curated jumps
Keep the catalog honest: a few live notes, plus the routes that matter
Cursor: IDE agentico para equipos de producto
Guia practica para usar Cursor en equipos que necesitan velocidad sin perder control de calidad en codigo y arquitectura.
DeepSeek-V3: razonamiento abierto en produccion
Analisis operativo de DeepSeek-V3 para equipos que quieren bajar coste por token sin perder calidad en tareas de razonamiento.
LLM technical comparison
Use the matrix preview when you need context and cost before opinion.
Provider compare
Compare vendors by deployment posture, openness and modality before narrowing models.
Model fit radar
Use task-first picks when the next question is which model to start with.
Agent stack board
Compare pipelines, memory, browser and multi-agent patterns before framework shopping.
Workflow recipes
Run practical operating patterns once model and provider choice are already narrowed.
Inference hardware guide
Decide between API-first, retrieval nodes, workstations and private serving before buying hardware.