Infrastructure¶
Inference engines, serving stacks, quantisation tools, vector databases, and deployment infrastructure for AI/LLM workloads.
Contents¶
| Tool | What it does |
|---|---|
| Ollama | Local LLM inference server with a CLI and REST API for running open-weight models |
| LiteLLM | Proxy that exposes many LLM providers behind a single OpenAI-compatible API |
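As a concrete illustration of the local-inference side of the table, the sketch below builds a request payload for Ollama's REST API (default port 11434, `/api/generate` endpoint). The model tag `llama3` is an assumption; substitute whatever model you have pulled locally.

```python
import json

# Ollama listens on localhost:11434 by default; /api/generate is its
# single-turn completion endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3",             # assumed model tag; use any pulled model
    "prompt": "Why is the sky blue?",
    "stream": False,               # one JSON response instead of a stream
}

body = json.dumps(payload)
# An actual call (requires a running Ollama server) would look like:
#   import urllib.request
#   req = urllib.request.Request(
#       OLLAMA_URL, data=body.encode(),
#       headers={"Content-Type": "application/json"})
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
print(body)
```

The same request shape works through a unified proxy layer: a router in front of the server just swaps the URL and model string while the payload stays provider-agnostic.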
Sub-categories¶
- Inference engines — vLLM, TGI, llama.cpp, MLX, etc.
- Vector databases — Pinecone, Weaviate, Milvus, Qdrant, etc.
- Serving & routing — load balancers, model routers, API gateways
- Quantisation & optimisation — GGUF, GPTQ, AWQ, etc.
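To give the quantisation category some intuition, here is a back-of-the-envelope estimate of weight-only memory at different bit widths. It is a rough lower bound: it ignores KV cache, activations, and per-format overhead such as GGUF block scales.

```python
def approx_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate in GB: params * bits / 8 / 1e9.

    Ignores KV cache, activations, and quantisation-format overhead,
    so real usage will be somewhat higher.
    """
    return n_params * bits_per_weight / 8 / 1e9


seven_b = 7e9                                 # a 7B-parameter model
fp16_gb = approx_weight_gb(seven_b, 16)       # ~14 GB at fp16
q4_gb = approx_weight_gb(seven_b, 4)          # ~3.5 GB at 4-bit (e.g. Q4 GGUF)
print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
```

This is why 4-bit formats like GGUF Q4 or AWQ are popular for consumer GPUs: they cut weight memory roughly 4x versus fp16 at a modest quality cost.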