Benchmark lab

Benchmark pages should help you choose a stack, not just admire charts.

Every comparison in this section is framed around operational decisions: local vs API, risk vs speed, and whether the workflow still holds under real team usage.

Benchmarks: 1 published

Methodology: Fixed (latency, cost, stability)

Update rhythm: Weekly, on active stacks

Bias control: Human editorial verification

Benchmark rubric

| Layer | Metric | Why it matters |
| --- | --- | --- |
| Latency | TTFT and p95 | Determines whether an AI workflow can feel operationally useful |
| Cost | Per run and per million tokens | Prevents demo economics from leaking into production |
| Stability | Error rate and retry pressure | Shows what breaks when real load arrives |
| Governance | Privacy and routing | Defines if the stack is viable for sensitive work |
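
The measurable rows of the rubric (latency, cost, stability) can be sketched as a small summary over logged runs. This is a minimal illustration, not the methodology used for the published benchmark: the `Run` record, the `summarize` function, the nearest-rank p95 and the single blended price per million tokens are all assumptions made for the example. Governance is left out because it is a policy question, not a computed metric.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One logged benchmark run (hypothetical record layout)."""
    ttft_ms: float       # time to first token, in milliseconds
    input_tokens: int
    output_tokens: int
    retries: int         # extra attempts needed before the run ended
    failed: bool         # True if the run never produced a usable answer


def p95(values):
    """95th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("p95 needs at least one value")
    ordered = sorted(values)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(runs, usd_per_million_tokens):
    """Aggregate latency, cost and stability figures for a list of runs.

    Assumes one blended price for input and output tokens, which is a
    simplification; real price lists often distinguish the two.
    """
    if not runs:
        raise ValueError("summarize needs at least one run")
    n = len(runs)
    total_tokens = sum(r.input_tokens + r.output_tokens for r in runs)
    total_cost = total_tokens / 1_000_000 * usd_per_million_tokens
    return {
        "ttft_p95_ms": p95([r.ttft_ms for r in runs]),
        "cost_per_run_usd": total_cost / n,
        "error_rate": sum(1 for r in runs if r.failed) / n,
        "retries_per_run": sum(r.retries for r in runs) / n,
    }
```

Feeding the same function runs from a local stack and from an API stack gives two directly comparable rows, which is the comparison the rubric is built for.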

Move from the scorecard into the next decision desk

Benchmarks should stay connected to the directory, prompt library and archive so the decision does not stop at the chart.

llmlatencia

Benchmark: Local LLM vs API in real-world latency

A technical comparison of latency, cost, and stability between local inference and API inference in product teams.