Hardware route

Use this route to orient hardware decisions before you turn them into capex.

Most teams do not need a rack on day one. They need a fast read on privacy, daily demand and the bottleneck that will actually hurt first.

Use the dedicated inference guide when the next question is budget, bottleneck or API-vs-local.

This route stays intentionally light. The deeper layer is where budget bands, quiet desks, retrieval nodes and private serving decisions become explicit.

Inference hardware guide
💡

Boundary

Treat this page as the hardware routing layer. Use it to orient the conversation, then jump into the dedicated inference guide for budget bands, bottlenecks and real buy-vs-API calls.

Reference lanes

5 ↑

API-first to private node

Quick read

3 •

budget, bottleneck, privacy

VRAM bands

3 ↑

NPU / 16-24GB / 48-80GB

Shared serving

1 •

Only when demand is stable

Selection logic

Start from privacy, workload shape and daily usage.

Orientation tier

Stay API-first or use a very light local lane while the traffic shape is still uncertain.

Workstation tier

Move to a GPU or high-memory desk only when local inference is already a daily habit.

Serving tier

Shared private nodes only make sense after demand, privacy and support needs are already proven.

What usually breaks

  • Teams overbuy GPU before they understand whether the important layer is retrieval, chat or multimodal.
  • NPU laptops get sold as universal answers when they are only good for the lightest local lanes.
  • Ops cost gets ignored until the box turns into an internal API that nobody owns.

Hardware route snapshot

Profile Best use Budget band Local fit Watch-out
API-first Frontier reasoning and bursty demand Low capex Best before buying local Variable spend and provider lock-in
NPU laptop Mobile privacy and lightweight local help Low-mid Tiny local models Bandwidth and thermal limits
CPU + RAM retrieval node Embeddings and rerank Low-mid Retrieval-heavy stacks Poor generation throughput
16-24GB workstation Daily local prototyping Mid 7B-14B class VRAM ceiling and desk noise
48-80GB private node Shared internal API High Serious internal serving Ops overhead
hardwaredecision

Inference hardware guide

The practical decision layer for API-first, quiet desks, retrieval boxes and private serving nodes.

agentsserving

Agent stack board

Use it when hardware demand is being justified by agents, browser workers or orchestration.

workflowsops

Workflow recipes

Move into repeatable operating flows once the serving posture is already narrowed.