SIE: Superlinked Inference Engine

Open-source inference server and production cluster for embeddings, reranking, and extraction.

85+ models. Three functions. From laptop to Kubernetes. All Apache 2.0.

Docs | Quickstart | API Reference | Models

About

SIE is an open-source inference engine that serves embeddings, reranking, and entity extraction through a single unified API. It replaces the patchwork of separate model servers with one system that handles 85+ models across dense, sparse, multi-vector, vision, and cross-encoder architectures.

85+ pre-configured models, hot-swappable, all quality-verified against MTEB in CI
Serves multiple models simultaneously with on-demand loading and LRU eviction
Ships the full production stack: load-balancing gateway, KEDA autoscaling, Grafana dashboards, Terraform for GKE/EKS
Integrates with LangChain, LlamaIndex, Haystack, DSPy, CrewAI, Chroma, Qdrant, and Weaviate
OpenAI-compatible /v1/embeddings endpoint for drop-in migration

Quickstart

SIE is a Docker container; your code calls it over HTTP. Start the container, install the SDK, run the example.

1. Run the engine

# macOS (Apple Silicon)
docker run --platform linux/amd64 -p 8080:8080 -v sie-hf-cache:/app/.cache/huggingface ghcr.io/superlinked/sie-server:latest-cpu-default

# Linux, CPU
docker run -p 8080:8080 -v sie-hf-cache:/app/.cache/huggingface ghcr.io/superlinked/sie-server:latest-cpu-default

# Linux, NVIDIA GPU
docker run --gpus all -p 8080:8080 -v sie-hf-cache:/app/.cache/huggingface ghcr.io/superlinked/sie-server:latest-cuda12-default

Confirm it is up:

curl http://localhost:8080/readyz   # expect: ok

2. Use SIE from Python or TypeScript

pip install sie-sdk           # Python
pnpm add @superlinked/sie-sdk # TypeScript

The entire API is three functions: encode, score, extract.

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")
# First call to each model downloads weights from Hugging Face (seconds for
# these tinies, longer for larger models). After that, calls are warm in ms.

# Encode: dense embeddings (all-MiniLM-L6-v2, ~90 MB)
result = client.encode("sentence-transformers/all-MiniLM-L6-v2", Item(text="Hello world"))
print(result["dense"].shape)  # (384,)

# Score: rerank documents by relevance (ms-marco MiniLM, ~80 MB)
scores = client.score(
    "cross-encoder/ms-marco-MiniLM-L-6-v2",
    Item(text="What is machine learning?"),
    [Item(text="ML learns from data."), Item(text="The weather is sunny.")]
)
print(scores["scores"])
# [{'item_id': 'item-0', 'score': -7.1,    'rank': 0},
#  {'item_id': 'item-1', 'score': -11.048, 'rank': 1}]
# (cross-encoder logits; relative order is what matters, not the absolute value)

# Extract: zero-shot named entity recognition, no training data
result = client.extract(
    "urchade/gliner_multi-v2.1",
    Item(text="Tim Cook is the CEO of Apple."),
    labels=["person", "organization"]
)
print(result["entities"])
# [{'text': 'Tim Cook', 'label': 'person',       'score': 0.991},
#  {'text': 'Apple',    'label': 'organization', 'score': 0.978}]

For the equivalent TypeScript example, see the TypeScript SDK docs. For more, see the full quickstart guide and SDK reference.

Production

The same code works against a production cluster. SIE ships a load-balancing gateway, KEDA autoscaling (scale to zero), Grafana dashboards, and Terraform modules for GKE and EKS. Not just the server, the whole stack. All Apache 2.0.

helm upgrade --install sie-cluster oci://ghcr.io/superlinked/charts/sie-cluster \
  --namespace sie --create-namespace \
  --set hfToken.create=true \
  --set hfToken.value=YOUR_HF_TOKEN \
  -f deploy/helm/sie-cluster/values-{gke|aws}.yaml

See the deployment guide.

Telemetry: SIE collects anonymous usage data (version, OS, architecture, GPU type) to understand adoption. No IP addresses, hostnames, or request data are collected. Disable with SIE_TELEMETRY_DISABLED=1 or DO_NOT_TRACK=1.

Explore

85+ models: Stella v5, BGE-M3, SPLADE v3, SigLIP, ColQwen2.5, BGE-reranker, GLiNER, Florence-2, and more. Dense, sparse, multi-vector, vision, rerankers, extractors. All pre-configured. All quality-verified against MTEB in CI. Pass the full Hugging Face model ID to the SDK (e.g. sentence-transformers/all-MiniLM-L6-v2, NovaSearch/stella_en_400M_v5); see the catalog for the complete list.

Integrations: LangChain, LlamaIndex, Haystack, DSPy, CrewAI, Chroma, Qdrant, Weaviate.

Notebooks: Quickstarts and walkthroughs

Examples: End-to-end project gallery

Why we built SIE: The motivation, told at AI Engineer Europe 2026.

superlinked.com/docs | Apache 2.0

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
deploy		deploy
examples		examples
integrations		integrations
notebooks		notebooks
packages		packages
.gitignore		.gitignore
.npmrc		.npmrc
CHANGELOG.md		CHANGELOG.md
COMPATIBILITY.md		COMPATIBILITY.md
LICENSE		LICENSE
README.md		README.md
bun.lock		bun.lock
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
pnpm-workspace.yaml		pnpm-workspace.yaml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SIE: Superlinked Inference Engine

About

Quickstart

Production

Explore

About

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SIE: Superlinked Inference Engine

About

Quickstart

Production

Explore

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors

Uh oh!

Languages