
Infrastructure

Inference engines, serving stacks, quantisation tools, vector databases, and deployment infrastructure for AI/LLM workloads.

Contents

Tool      What it does
Ollama    Local LLM inference server
LiteLLM   Unified LLM API proxy
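To make the table concrete, here is a minimal sketch of the two tools together, assuming litellm is installed and an Ollama server is running on its default port (11434); the model name is illustrative:

    # LiteLLM exposes one OpenAI-style completion() call that can route
    # to many backends; here it targets a model served locally by Ollama.
    from litellm import completion

    response = completion(
        model="ollama/llama3",              # illustrative local model
        messages=[{"role": "user", "content": "Hello!"}],
        api_base="http://localhost:11434",  # Ollama's default local endpoint
    )
    print(response.choices[0].message.content)

Changing only the model string points the same call at a hosted provider instead, which is what "unified LLM API proxy" means in practice.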

Sub-categories

  • Inference engines — vLLM, TGI, llama.cpp, MLX, etc.
  • Vector databases — Pinecone, Weaviate, Milvus, Qdrant, etc. (see the sketch after this list)
  • Serving & routing — Load balancers, model routers, API gateways
  • Quantisation & optimisation — GGUF, GPTQ, AWQ, etc.
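As a sketch of the vector-database category, the snippet below indexes two vectors and runs a nearest-neighbour query with Qdrant's Python client, using an in-memory instance so no server is needed; the collection name, vector size, and payloads are illustrative:

    # Store embeddings and retrieve the closest match by cosine similarity.
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    client = QdrantClient(":memory:")  # in-process instance, good for demos
    client.create_collection(
        collection_name="docs",
        vectors_config=VectorParams(size=4, distance=Distance.COSINE),
    )
    client.upsert(
        collection_name="docs",
        points=[
            PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"text": "GPUs"}),
            PointStruct(id=2, vector=[0.9, 0.1, 0.0, 0.2], payload={"text": "CPUs"}),
        ],
    )
    hits = client.search(collection_name="docs", query_vector=[0.1, 0.2, 0.3, 0.4], limit=1)
    print(hits[0].payload)  # payload of the nearest stored vector

In production the same client API talks to a remote Qdrant server; the in-memory mode just keeps the example self-contained.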