Hardware route

Use this route to orient hardware decisions before you turn them into capex.

Most teams do not need a rack on day one. They need a fast read on privacy, daily demand and the bottleneck that will actually hurt first.

Boundary

Treat this page as the hardware routing layer: use it to frame the conversation, then move to the dedicated inference guide for budget bands, bottlenecks and concrete buy-vs-API decisions.

Reference lanes: 5, from API-first to private node
Quick read: 3 (budget, bottleneck, privacy)
VRAM bands: 3 (NPU / 16-24GB / 48-80GB)
Shared serving: 1 (only when demand is stable)

Selection logic

Start from privacy, workload shape and daily usage.

Orientation tier

Stay API-first or use a very light local lane while the traffic shape is still uncertain.

Workstation tier

Move to a GPU or high-memory desk only when local inference is already a daily habit.

Serving tier

Shared private nodes only make sense after demand, privacy and support needs are already proven.
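The three tiers above can be read as a simple decision order: check the serving conditions first, then the workstation condition, and fall back to the orientation tier. The sketch below is a hypothetical encoding of that order; the `Signals` fields and `pick_tier` function are illustrative names, not part of any tool, and each boolean stands in for what is really a team judgment call.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ORIENTATION = "orientation"  # API-first or a very light local lane
    WORKSTATION = "workstation"  # GPU or high-memory desk
    SERVING = "serving"          # shared private node


@dataclass
class Signals:
    # Hypothetical inputs; each one summarises a judgment the team has to make.
    privacy_required: bool     # must the data stay on hardware you control?
    traffic_shape_known: bool  # is the workload (retrieval, chat, multimodal) understood?
    daily_local_use: bool      # is local inference already a daily habit?
    demand_stable: bool        # has shared demand held steady over time?
    support_owner: bool        # does someone own the box as an internal service?


def pick_tier(s: Signals) -> Tier:
    # Serving tier: demand, privacy and support needs must all be proven.
    if s.demand_stable and s.privacy_required and s.support_owner:
        return Tier.SERVING
    # Workstation tier: local inference is a daily habit on a known workload.
    if s.daily_local_use and s.traffic_shape_known:
        return Tier.WORKSTATION
    # Everything else stays API-first or on a light local lane.
    return Tier.ORIENTATION


prototype_team = Signals(privacy_required=True, traffic_shape_known=True,
                         daily_local_use=True, demand_stable=False, support_owner=False)
print(pick_tier(prototype_team))
```

Checking the serving conditions first keeps the default conservative: a team only lands in a higher tier when every condition for it holds, otherwise it stays on the cheaper lane.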

What usually breaks

Teams overbuy GPUs before they know whether the layer that matters most is retrieval, chat or multimodal.
NPU laptops get sold as universal answers when they are only good for the lightest local lanes.
Ops cost gets ignored until the box turns into an internal API that nobody owns.
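The ops-cost failure is easier to spot with a rough break-even estimate: purchase cost divided by the monthly saving over API spend, where the saving already subtracts power, hosting and staff time. This is a minimal sketch under that assumption; `break_even_months` is a hypothetical helper and the figures in the example are placeholders, not real prices.

```python
import math


def break_even_months(capex: float, api_monthly: float, local_monthly_running: float) -> float:
    """Months until owned hardware pays back its purchase price.

    local_monthly_running should include power, hosting and the staff time
    needed to keep the box running. Returns math.inf when running costs
    consume the whole saving, i.e. the hardware never pays for itself.
    """
    monthly_savings = api_monthly - local_monthly_running
    if monthly_savings <= 0:
        return math.inf
    return capex / monthly_savings


# Placeholder figures in any currency: 12,000 capex, 1,500/month API spend.
print(break_even_months(12_000, 1_500, 300))    # running cost excludes most ops time
print(break_even_months(12_000, 1_500, 1_500))  # ops time counted honestly
```

The second call shows how payback can vanish entirely once someone's time to own the box is counted.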

Hardware route snapshot

Profile	Best use	Budget band	Local fit	Watch-out
API-first	Frontier reasoning and bursty demand	Low capex	Best before buying local	Variable spend and provider lock-in
NPU laptop	Mobile privacy and lightweight local help	Low-mid	Tiny local models	Bandwidth and thermal limits
CPU + RAM retrieval node	Embeddings and rerank	Low-mid	Retrieval-heavy stacks	Poor generation throughput
16-24GB workstation	Daily local prototyping	Mid	7B-14B class	VRAM ceiling and desk noise
48-80GB private node	Shared internal API	High	Serious internal serving	Ops overhead
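The "Local fit" column follows from a common rule of thumb: weight memory is roughly parameter count times bytes per weight, plus headroom for the KV cache, activations and runtime buffers. The sketch below assumes a flat 20% overhead and decimal gigabytes; both are assumptions, and real usage shifts with context length, batch size and the inference runtime.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int, overhead: float = 0.2) -> float:
    """Rough VRAM estimate in decimal GB: model weights plus a flat overhead fraction.

    The 20% default overhead is an assumption, not a measured figure.
    """
    # 1e9 parameters * (bits / 8) bytes each = (bits / 8) GB per billion parameters.
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * (1 + overhead)


def fits(params_billion: float, bits_per_weight: int, vram_gb: float) -> bool:
    """True if the rough estimate stays within the card's memory."""
    return estimate_vram_gb(params_billion, bits_per_weight) <= vram_gb


for params, bits in [(7, 16), (14, 8), (14, 16), (70, 4)]:
    print(f"{params}B @ {bits}-bit ~ {estimate_vram_gb(params, bits):.1f} GB")
```

Under these assumptions a 14B model fits a 24GB card at 8-bit (about 16.8 GB) but not at 16-bit (about 33.6 GB), which is why the 16-24GB row tops out around the 14B class, while a 4-bit 70B model (about 42 GB) needs the 48-80GB band.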

hardware, decision

Inference hardware guide

The practical decision layer for API-first setups, quiet desks, retrieval boxes and private serving nodes.

agents, serving

Agent stack board

Use it when hardware demand is being justified by agents, browser workers or orchestration.

workflows, ops

Workflow recipes

Move on to repeatable operating flows once the serving posture has been narrowed down.