
AI Evaluation Hub

Control plane providing an API layer to orchestrate LLM evaluations, benchmarks, and model profiling across evaluation frameworks and tools

EvalHub

Open source evaluation orchestration for AI systems.

EvalHub is a platform for running systematic evaluations of models, agents, and AI systems across multiple frameworks, without locking you into any single one.

Run evaluations against any registered benchmark, whether built-in or one you create yourself, track experiments in MLflow, and store immutable results as OCI artefacts.

It works locally for development and scales on Kubernetes for production.
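Immutable OCI artefacts rely on content addressing. An OCI blob is identified by a digest of its bytes (for example `sha256:<hex>`), so any change to stored results produces a different identifier. The sketch below illustrates only that idea. The helper name `oci_style_digest` and the canonical-JSON serialisation are assumptions for illustration, not EvalHub's actual artefact format.

```python
import hashlib
import json


def oci_style_digest(results: dict) -> str:
    """Hypothetical helper: content digest of an evaluation result in OCI 'algorithm:hex' form."""
    # Serialise deterministically (sorted keys, no extra whitespace) so that
    # logically identical results always produce identical bytes.
    blob = json.dumps(results, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # The digest depends only on content, so a stored result cannot change
    # without its identifier changing too.
    return "sha256:" + hashlib.sha256(blob).hexdigest()
```

Two result dictionaries with the same keys and values in a different order produce the same digest. Changing any score produces a new one.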

Repositories

| Repository | Description |
| --- | --- |
| eval-hub | Go REST API server: evaluation orchestration, provider registry, benchmark discovery, collection management |
| eval-hub-sdk | Python SDK: async/sync clients, adapter framework (BYOF), CLI tools, MCP server for agent integration |
| eval-hub-contrib | Community-contributed framework adapters (LightEval, GuideLLM, MTEB, and more) |
| eval-hub.github.io | Documentation site: architecture, guides, SDK reference |

Key capabilities

  • Versioned REST API (v1) with OpenAPI specification and interactive docs
  • Provider registry with benchmark discovery and category filtering
  • Benchmark collections with weighted scoring and compliance requirements
  • Bring Your Own Framework (BYOF): implement a single method to add any evaluation framework
  • Kubernetes-native job orchestration with resource isolation
  • MLflow integration for experiment tracking, lineage, and result comparison
  • OCI artefact persistence for reproducible, immutable evaluation results
  • Multi-provider batching, which groups compatible benchmarks to reduce execution time
  • Prometheus metrics and OpenTelemetry tracing
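Benchmark collections combine several benchmark scores into one figure. This README does not spell out the aggregation rule. The following minimal sketch assumes a weighted arithmetic mean, and it assumes that a compliance requirement means a per-benchmark minimum score. `BenchmarkResult`, `weighted_collection_score` and `meets_requirements` are hypothetical names, not part of the EvalHub SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    # Hypothetical record: one benchmark's score and its weight in the collection.
    name: str
    score: float
    weight: float


def weighted_collection_score(results: list[BenchmarkResult]) -> float:
    """Assumed aggregation rule: weighted arithmetic mean of benchmark scores."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("collection weights must sum to a positive value")
    return sum(r.score * r.weight for r in results) / total_weight


def meets_requirements(results: list[BenchmarkResult], minimums: dict[str, float]) -> bool:
    """Hypothetical compliance check: every named benchmark must reach its minimum.

    A benchmark missing from the results counts as failing its requirement.
    """
    by_name = {r.name: r.score for r in results}
    return all(name in by_name and by_name[name] >= floor for name, floor in minimums.items())
```

With illustrative scores of 0.70 (weight 2), 0.55 (weight 1) and 0.40 (weight 1), the collection score is (1.40 + 0.55 + 0.40) / 4 = 0.5875. A weighted mean lets a collection emphasise the benchmarks that matter most for a use case without dropping the others.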

Supported adapters

| Adapter | What it evaluates |
| --- | --- |
| lm-eval-harness | 167 benchmarks across 12 categories (reasoning, math, science, safety, ...) |
| LightEval | Accuracy, normalised accuracy, exact match |
| GuideLLM | Time to first token (TTFT), inter-token latency (ITL), throughput, latency |
| MTEB | Semantic similarity, retrieval, classification |
| Garak | OWASP Top 10, vulnerability scanning, safety probes |
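The GuideLLM performance metrics can be derived from the arrival times of tokens in a streamed response. This minimal sketch makes three assumptions: timestamps are in seconds, there is a single request, and throughput is output tokens divided by end-to-end latency. GuideLLM's own definitions and its aggregation across requests may differ. `latency_metrics` is a hypothetical helper, not a GuideLLM or EvalHub API.

```python
from statistics import mean


def latency_metrics(request_start: float, token_times: list[float]) -> dict[str, float]:
    """Hypothetical helper: per-request TTFT, mean ITL, latency and throughput."""
    if not token_times:
        raise ValueError("need at least one token timestamp")
    # Time to first token: delay between sending the request and the first token.
    ttft = token_times[0] - request_start
    # Inter-token latency: mean gap between consecutive tokens after the first.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = mean(gaps) if gaps else 0.0
    # End-to-end latency: request start to the last token.
    latency = token_times[-1] - request_start
    # Assumed throughput definition: output tokens per second of end-to-end latency.
    throughput = len(token_times) / latency if latency > 0 else float("inf")
    return {"ttft": ttft, "itl": itl, "latency": latency, "tokens_per_s": throughput}
```

For example, `latency_metrics(0.0, [0.5, 0.6, 0.7, 0.9])` gives a TTFT of 0.5 s, a mean ITL of about 0.133 s and a latency of 0.9 s.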

Documentation

Full documentation is available at eval-hub.github.io.

Licence

Apache 2.0 — see LICENCE.
