Benchmark lab

Benchmark pages should help you choose a stack, not just admire charts.

Every comparison in this section is framed around operational decisions: local vs API, risk vs speed, and whether the workflow still holds under real team usage.

Benchmarks published: 1

Methodology: fixed (latency, cost, stability)

Update rhythm: weekly on active stacks

Bias control: human editorial verification

Benchmark rubric

Layer	Metric	Why it matters
Latency	Time to first token (TTFT) and p95 latency	Determines whether an AI workflow feels operationally useful
Cost	Per run and per million tokens	Prevents demo economics from leaking into production
Stability	Error rate and retry pressure	Shows what breaks when real load arrives
Governance	Privacy and routing	Determines whether the stack is viable for sensitive work
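
As a rough illustration of how the quantitative rows of the rubric could be computed from raw run logs, here is a minimal Python sketch. The Run record, its field names and the summarize helper are assumptions made for this example, not the lab's actual tooling. It uses the nearest-rank definition of p95 and treats "retry pressure" as average retries per run, which is one possible reading of that metric.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    """One benchmark request (hypothetical record layout)."""
    ttft_s: float    # time to first token, in seconds
    tokens: int      # total tokens billed for the run
    cost_usd: float  # cost of the run in USD
    retries: int     # retries needed before the run finished
    ok: bool         # whether the run ultimately succeeded


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(runs: List[Run]) -> dict:
    """Aggregate latency, cost and stability metrics over a batch of runs."""
    n = len(runs)
    total_tokens = sum(r.tokens for r in runs)
    total_cost = sum(r.cost_usd for r in runs)
    return {
        "ttft_p95_s": p95([r.ttft_s for r in runs]),
        "cost_per_run_usd": total_cost / n,
        "cost_per_million_tokens_usd": total_cost / total_tokens * 1_000_000,
        "error_rate": sum(not r.ok for r in runs) / n,
        "retries_per_run": sum(r.retries for r in runs) / n,
    }
```

Governance (privacy and routing) is left out on purpose: it is a qualitative check rather than something that falls out of run logs.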

Move from the scorecard into the next decision desk

Benchmarks should stay connected to the directory, prompt library and archive so the decision does not stop at the chart.


llmlatencia

Benchmark: local LLM vs API in real-world latency

A technical comparison of latency, cost and stability between local inference and API inference in product teams.