Open-source inference server and production cluster for embeddings, reranking, and extraction.
85+ models. Three functions. From laptop to Kubernetes. All Apache 2.0.
Docs | Quickstart | API Reference | Models
SIE is an open-source inference engine that serves embeddings, reranking, and entity extraction through a single unified API. It replaces the patchwork of separate model servers with one system that handles 85+ models across dense, sparse, multi-vector, vision, and cross-encoder architectures.
- 85+ pre-configured models, hot-swappable, all quality-verified against MTEB in CI
- Serves multiple models simultaneously with on-demand loading and LRU eviction
- Ships the full production stack: load-balancing gateway, KEDA autoscaling, Grafana dashboards, Terraform for GKE/EKS
- Integrates with LangChain, LlamaIndex, Haystack, DSPy, CrewAI, Chroma, Qdrant, and Weaviate
- OpenAI-compatible
/v1/embeddingsendpoint for drop-in migration
SIE is a Docker container; your code calls it over HTTP. Start the container, install the SDK, run the example.
1. Run the engine
# macOS (Apple Silicon)
docker run --platform linux/amd64 -p 8080:8080 -v sie-hf-cache:/app/.cache/huggingface ghcr.io/superlinked/sie-server:latest-cpu-default
# Linux, CPU
docker run -p 8080:8080 -v sie-hf-cache:/app/.cache/huggingface ghcr.io/superlinked/sie-server:latest-cpu-default
# Linux, NVIDIA GPU
docker run --gpus all -p 8080:8080 -v sie-hf-cache:/app/.cache/huggingface ghcr.io/superlinked/sie-server:latest-cuda12-defaultConfirm it is up:
curl http://localhost:8080/readyz # expect: ok2. Use SIE from Python or TypeScript
pip install sie-sdk # Python
pnpm add @superlinked/sie-sdk # TypeScriptThe entire API is three functions: encode, score, extract.
from sie_sdk import SIEClient
from sie_sdk.types import Item
client = SIEClient("http://localhost:8080")
# First call to each model downloads weights from Hugging Face (seconds for
# these tinies, longer for larger models). After that, calls are warm in ms.
# Encode: dense embeddings (all-MiniLM-L6-v2, ~90 MB)
result = client.encode("sentence-transformers/all-MiniLM-L6-v2", Item(text="Hello world"))
print(result["dense"].shape) # (384,)
# Score: rerank documents by relevance (ms-marco MiniLM, ~80 MB)
scores = client.score(
"cross-encoder/ms-marco-MiniLM-L-6-v2",
Item(text="What is machine learning?"),
[Item(text="ML learns from data."), Item(text="The weather is sunny.")]
)
print(scores["scores"])
# [{'item_id': 'item-0', 'score': -7.1, 'rank': 0},
# {'item_id': 'item-1', 'score': -11.048, 'rank': 1}]
# (cross-encoder logits; relative order is what matters, not the absolute value)
# Extract: zero-shot named entity recognition, no training data
result = client.extract(
"urchade/gliner_multi-v2.1",
Item(text="Tim Cook is the CEO of Apple."),
labels=["person", "organization"]
)
print(result["entities"])
# [{'text': 'Tim Cook', 'label': 'person', 'score': 0.991},
# {'text': 'Apple', 'label': 'organization', 'score': 0.978}]For the equivalent TypeScript example, see the TypeScript SDK docs. For more, see the full quickstart guide and SDK reference.
The same code works against a production cluster. SIE ships a load-balancing gateway, KEDA autoscaling (scale to zero), Grafana dashboards, and Terraform modules for GKE and EKS. Not just the server, the whole stack. All Apache 2.0.
helm upgrade --install sie-cluster oci://ghcr.io/superlinked/charts/sie-cluster \
--namespace sie --create-namespace \
--set hfToken.create=true \
--set hfToken.value=YOUR_HF_TOKEN \
-f deploy/helm/sie-cluster/values-{gke|aws}.yamlSee the deployment guide.
Telemetry: SIE collects anonymous usage data (version, OS, architecture, GPU type) to understand adoption. No IP addresses, hostnames, or request data are collected. Disable with
SIE_TELEMETRY_DISABLED=1orDO_NOT_TRACK=1.
85+ models: Stella v5, BGE-M3, SPLADE v3, SigLIP, ColQwen2.5, BGE-reranker, GLiNER, Florence-2, and more.
Dense, sparse, multi-vector, vision, rerankers, extractors. All pre-configured. All quality-verified against MTEB in CI.
Pass the full Hugging Face model ID to the SDK (e.g. sentence-transformers/all-MiniLM-L6-v2, NovaSearch/stella_en_400M_v5); see the catalog for the complete list.
Integrations: LangChain, LlamaIndex, Haystack, DSPy, CrewAI, Chroma, Qdrant, Weaviate.
Notebooks: Quickstarts and walkthroughs
Examples: End-to-end project gallery
Why we built SIE: The motivation, told at AI Engineer Europe 2026.
superlinked.com/docs | Apache 2.0