Enterprise-Grade Observability for Modern Cloud Infrastructure
100% OpenTelemetry Compliant • Built with DDD/CQRS • Production-Ready • Apache 2.0 Licensed
- What is TelemetryFlow?
- Product Ecosystem
- High-Level Architecture
- Platform Capabilities
- Telemetry Signals
- Infrastructure Monitoring
- Database Monitoring
- Enterprise Features
- AI Intelligence
- Technology Stack
- Data Architecture
- Component Registry System
- Deployment
- Quick Start
- Repository Map
- Contributing
TelemetryFlow is an enterprise-grade, open-source observability platform that provides unified telemetry collection, storage, analysis, and visualization. It is 100% OpenTelemetry Protocol (OTLP) compliant and is designed as an open-source alternative to commercial solutions such as Datadog, New Relic, and Dynatrace.
| Problem | TelemetryFlow Solution |
|---|---|
| Fragmented Tooling | Unifies metrics, logs, traces, and exemplars into a single platform |
| Vendor Lock-in | 100% OTLP-compliant — works with any OpenTelemetry SDK or Collector |
| Multi-Tenancy Complexity | Hierarchical isolation: Region → Organization → Workspace → Tenant |
| High Cost | Self-hosted, eliminating per-GB pricing of commercial solutions |
| Compliance Requirements | Built-in audit logging, GDPR compliance, regional data segregation |
| Monitoring Silos | Consolidates Prometheus, kube-state-metrics, node-exporter into one agent |
TelemetryFlow is a modular ecosystem with 10+ specialized repositories, each purpose-built for a specific observability function:
graph TB
subgraph SDKs["Language SDKs"]
PYSDK["Python SDK<br/>telemetryflow-python-sdk"]
GOSDK["Go SDK<br/>telemetryflow-go-sdk"]
end
subgraph Collection["Data Collection"]
AGENT["TFO Agent<br/>telemetryflow-agent<br/>Replaces: Prometheus, KSM,<br/>node-exporter, FluentBit"]
COLLECTOR["TFO Collector<br/>telemetryflow-collector<br/>OCB Native, v1/v2 endpoints"]
end
subgraph Platform["Platform Core"]
MONO["Platform Monolith<br/>telemetryflow-platform<br/>NestJS + Vue 3"]
VIZ["TFO-Viz<br/>telemetryflow-viz<br/>Standalone Dashboard"]
end
subgraph AI["AI Layer"]
GOMCP["Go MCP Server<br/>telemetryflow-go-mcp"]
PYMCP["Python MCP Server<br/>telemetryflow-python-mcp"]
end
subgraph Docs["Documentation"]
OVERVIEW["Overview Docs<br/>telemetryflow-overview"]
PRODUCT["Product Docs<br/>telemetryflow-product"]
end
SDKs -->|"OTLP"| Collection
Collection -->|"OTLP v1/v2"| Platform
Collection -->|"OTLP"| VIZ
Platform -->|"MCP"| AI
Docs -.->|"Reference"| Platform
style SDKs fill:#e8f5e9,stroke:#2e7d32,color:#000
style Collection fill:#e3f2fd,stroke:#1565c0,color:#000
style Platform fill:#fff3e0,stroke:#e65100,color:#000
style AI fill:#f3e5f5,stroke:#6a1b9a,color:#000
style Docs fill:#f5f5f5,stroke:#616161,color:#000
| Repository | Language | Description |
|---|---|---|
| telemetryflow-platform | TypeScript (NestJS + Vue 3) | Core platform — backend API, frontend dashboard, dual database |
| telemetryflow-agent | Go 1.26 | Infrastructure agent — replaces Prometheus, KSM, node-exporter, FluentBit |
| telemetryflow-collector | Go 1.26 | OCB-native OTLP collector with TFO custom components |
| telemetryflow-python-sdk | Python 3.12+ | Python SDK for instrumenting applications |
| telemetryflow-go-sdk | Go 1.24+ | Go SDK for instrumenting applications |
| telemetryflow-viz | TypeScript (Vue 3) | Standalone observability visualization dashboard |
| telemetryflow-go-mcp | Go | MCP server for Claude AI integration |
| telemetryflow-python-mcp | Python | MCP server for Claude AI integration |
| telemetryflow-overview | Markdown | Comprehensive platform documentation |
| telemetryflow-product | Markdown | Product summary and features documentation |
flowchart TB
subgraph Sources["Telemetry Sources"]
APP1["Applications<br/>(Python/Go/Node)"]
K8S["Kubernetes<br/>Cluster"]
VM["VMs &<br/>Bare Metal"]
DB["Databases<br/>(MySQL, PostgreSQL,<br/>MongoDB, etc.)"]
EXT["External<br/>Services"]
end
subgraph SDKs["Instrumentation Layer"]
PSDK["Python SDK"]
GSDK["Go SDK"]
OTEL["OTEL SDKs<br/>(Any Language)"]
end
subgraph Collection["Collection Layer"]
AGENT["TFO Agent v1.2.0<br/>Node Exporter + K8s<br/>+ cAdvisor + DB + eBPF"]
TFOC["TFO Collector v1.2.1<br/>OCB Native<br/>v1/v2 Endpoints"]
end
subgraph Ingestion["Ingestion Layer"]
OTLP_EP["OTLP Endpoints<br/>/v1/metrics<br/>/v1/logs<br/>/v1/traces"]
AUTH["API Key Auth<br/>Argon2id Hash"]
QUEUE["BullMQ Queues<br/>otlp-ingestion (10)<br/>telemetry-processing (10)<br/>domain-events (5)"]
end
subgraph Storage["Storage Layer"]
PG["PostgreSQL 16<br/>IAM, Config, Entities<br/>Multi-tenant State"]
CH["ClickHouse 23+<br/>Metrics, Logs, Traces<br/>Materialized Views<br/>TTL Rollups"]
RD["Redis 7+<br/>L1/L2 Cache<br/>BullMQ Queues<br/>DB 0: Cache, DB 1: Queue"]
end
subgraph Messaging["Event Bus"]
NATS["NATS<br/>Domain Events<br/>Cross-Module Communication"]
end
subgraph Presentation["Presentation Layer"]
BE["NestJS Backend<br/>DDD/CQRS<br/>REST API /api/v2/"]
FE["Vue 3 Frontend<br/>Pinia + Naive UI<br/>ECharts Visualizations"]
MCP["MCP Servers<br/>Claude AI Integration"]
end
Sources --> SDKs
Sources --> Collection
SDKs -->|"OTLP"| Collection
Collection -->|"OTLP v1/v2"| Ingestion
Ingestion --> Storage
Ingestion --> Messaging
Storage --> BE
Messaging --> BE
BE --> FE
BE --> MCP
style Sources fill:#e8eaf6,stroke:#283593,color:#000
style SDKs fill:#e8f5e9,stroke:#2e7d32,color:#000
style Collection fill:#e3f2fd,stroke:#1565c0,color:#000
style Ingestion fill:#fff3e0,stroke:#e65100,color:#000
style Storage fill:#fce4ec,stroke:#880e4f,color:#000
style Messaging fill:#f3e5f5,stroke:#6a1b9a,color:#000
style Presentation fill:#e0f2f1,stroke:#004d40,color:#000
The platform backend follows Domain-Driven Design with strict layer separation — Domain, Application, Infrastructure, and Presentation:
graph LR
subgraph Core["Core Modules"]
AUTH["Auth"]
IAM["IAM"]
TEN["Tenancy"]
CACHE["Cache"]
end
subgraph Telemetry["Telemetry Modules"]
MET["Metrics"]
LOGS["Logs"]
TRC["Traces"]
EXM["Exemplars"]
COR["Correlations"]
end
subgraph Monitoring["Monitoring Modules"]
AGT["Agent"]
K8S["Kubernetes"]
VM_M["VM"]
UPT["Uptime"]
STP["Status Page"]
SVM["Service Map"]
NWM["Network Map"]
DBM["DB Monitoring"]
end
subgraph Platform["Platform Modules"]
DSH["Dashboard"]
ALR["Alerting"]
RET["Retention"]
SUB["Subscription"]
APK["API Keys"]
NOT["Notification"]
SSO["SSO"]
AUD["Audit"]
end
subgraph Intelligence["Intelligence"]
AI["AI Intelligence"]
LLM["LLM"]
QRY["Query (TFQL)"]
DM["Data Masking"]
end
subgraph Reporting["Reporting"]
RPT["Reporting"]
end
style Core fill:#e8f5e9,stroke:#2e7d32,color:#000
style Telemetry fill:#e3f2fd,stroke:#1565c0,color:#000
style Monitoring fill:#fff3e0,stroke:#e65100,color:#000
style Platform fill:#fce4ec,stroke:#880e4f,color:#000
style Intelligence fill:#f3e5f5,stroke:#6a1b9a,color:#000
style Reporting fill:#e0f7fa,stroke:#00695c,color:#000
Each module follows the same internal architecture:
graph TB
subgraph Module["Module (e.g., Kubernetes)"]
PRE["Presentation Layer<br/>Controllers, DTOs, Guards"]
APP["Application Layer<br/>Commands, Queries, Handlers"]
DOM["Domain Layer<br/>Aggregates, Entities,<br/>Value Objects, Events,<br/>Repository Interfaces"]
INF["Infrastructure Layer<br/>TypeORM Repos,<br/>Persistence, Messaging"]
end
PRE --> APP
APP --> DOM
INF -.->|"implements"| DOM
style PRE fill:#e3f2fd,stroke:#1565c0,color:#000
style APP fill:#e8f5e9,stroke:#2e7d32,color:#000
style DOM fill:#fff3e0,stroke:#e65100,color:#000
style INF fill:#f3e5f5,stroke:#6a1b9a,color:#000
All telemetry signals flow through a unified OTLP ingestion pipeline:
sequenceDiagram
participant SRC as Telemetry Source
participant COL as TFO Collector
participant API as Platform API
participant AUTH as API Key Auth
participant Q as BullMQ Queue
participant W as Queue Worker
participant CH as ClickHouse
SRC->>COL: OTLP Export
COL->>API: POST /v1/metrics (or /v1/logs, /v1/traces)
API->>AUTH: Validate API Key (Argon2id)
AUTH-->>API: Authorized
API->>Q: Enqueue Job (async)
API-->>COL: 202 Accepted
Q->>W: Process Job
W->>W: Batch 10K rows
W->>CH: INSERT with MV rollup
Note over CH: raw → 1m → 1h → 1d cascade
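The accept-then-process flow in the sequence above can be sketched in a few lines. This is a minimal in-process illustration under stated assumptions, not the platform's implementation: `queue.Queue` stands in for BullMQ, a plain list stands in for ClickHouse, and the names `handle_otlp_request`, `drain_queue`, and `VALID_KEYS` are hypothetical. Key verification is reduced to a set lookup here, whereas the platform verifies Argon2id hashes.

```python
import queue

BATCH_SIZE = 10_000          # rows per insert, as in the sequence above
VALID_KEYS = {"demo-key"}    # hypothetical; the platform verifies Argon2id hashes

jobs: "queue.Queue[list[dict]]" = queue.Queue()   # stands in for BullMQ
inserted_batches: list[list[dict]] = []           # stands in for ClickHouse


def handle_otlp_request(api_key: str, rows: list[dict]) -> int:
    """Authenticate, enqueue, and return an HTTP status without waiting for storage."""
    if api_key not in VALID_KEYS:
        return 401
    jobs.put(rows)
    return 202


def drain_queue(batch_size: int = BATCH_SIZE) -> None:
    """Worker side: pull queued jobs and write them in fixed-size batches."""
    buffer: list[dict] = []
    while not jobs.empty():
        buffer.extend(jobs.get())
        while len(buffer) >= batch_size:
            inserted_batches.append(buffer[:batch_size])
            buffer = buffer[batch_size:]
    if buffer:  # flush the final partial batch
        inserted_batches.append(buffer)
```

Returning 202 before the write completes keeps exporter latency independent of storage latency, at the cost of the client not learning about later insert failures.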
- Storage: ClickHouse time-series with pre-aggregation materialized views
- Types: Gauges, Counters, Histograms, Summaries
- Aggregation: sum, avg, min, max, percentiles (p50, p90, p95, p99)
- Rollup Cascade: raw → 1m → 1h → 1d (automatic via materialized views)
- Exemplars: Metric-to-trace correlation for contextual debugging
- Structured logging with full-text search across all attributes
- Severity levels: DEBUG, INFO, WARN, ERROR, FATAL
- Trace context propagation (traceId, spanId linking)
- Real-time streaming via WebSocket
- High-cardinality attribute indexing
- Distributed tracing with waterfall span visualization
- Service dependency mapping from span relationships
- Critical path analysis identifying bottlenecks
- Trace-log correlation for unified debugging
- Span attribute search with flexible filtering
- Correlations: Links traces → logs → metrics for unified incident investigation
- Exemplars: Attach exemplar trace IDs to metric data points for contextual drill-down
- TTL: 7d (exemplars) → 30d (logs/traces) → 90d (metrics/audit/uptime)
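To make the drill-down concrete, here is a small sketch of how an exemplar's trace ID can join a metric data point to its spans and logs. The record shapes and the function name `correlate` are illustrative assumptions, not the platform's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricPoint:
    name: str
    value: float
    exemplar_trace_id: Optional[str] = None   # trace sampled when the point was recorded


@dataclass
class Span:
    trace_id: str
    span_id: str
    name: str


@dataclass
class LogRecord:
    trace_id: str
    severity: str
    body: str


def correlate(point: MetricPoint, spans: list[Span], logs: list[LogRecord]) -> dict:
    """Return the spans and logs that share the exemplar's trace ID."""
    tid = point.exemplar_trace_id
    if tid is None:
        return {"spans": [], "logs": []}
    return {
        "spans": [s for s in spans if s.trace_id == tid],
        "logs": [rec for rec in logs if rec.trace_id == tid],
    }
```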
The TFO Agent is a Go-based agent that replaces multiple traditional monitoring tools:
graph TB
subgraph Replaced["Replaces These Tools"]
PROM["Prometheus"]
KSM["kube-state-metrics"]
NE["node-exporter"]
FB["FluentBit"]
MS["metrics-server"]
CAD["cAdvisor"]
end
subgraph Agent["TFO Agent v1.2.0 (Go 1.26)"]
NE_MOD["Node Exporter Module<br/>CPU, Memory, DiskIO,<br/>Filesystem, Network, Load"]
K8S_MOD["Kubernetes Module<br/>Nodes, Pods, Deployments,<br/>Services, HPA, PDB, Events"]
CAD_MOD["cAdvisor Module<br/>Container CPU, Memory,<br/>Network, Filesystem"]
LOG_MOD["Log Collector<br/>Pod Logs, Node Logs,<br/>Kubelet, Containerd"]
DB_MOD["Database Collectors<br/>MySQL, PostgreSQL, MongoDB,<br/>MSSQL, ClickHouse, CockroachDB,<br/>Aurora, TimescaleDB, SQLite3"]
EBPF_MOD["eBPF Module<br/>Syscalls, Network, File I/O,<br/>Scheduler, Hubble"]
end
Replaced -.->|"Consolidated into"| Agent
NE_MOD -->|"k8s.* metrics"| PLATFORM["TFO Platform"]
K8S_MOD -->|"k8s.* metrics"| PLATFORM
CAD_MOD -->|"container.cadvisor.*"| PLATFORM
LOG_MOD -->|"OTLP Logs"| PLATFORM
DB_MOD -->|"OTLP Metrics"| PLATFORM
EBPF_MOD -->|"ebpf.* metrics"| PLATFORM
style Replaced fill:#ffebee,stroke:#c62828,color:#000
style Agent fill:#e8f5e9,stroke:#2e7d32,color:#000
flowchart LR
subgraph Sources["Telemetry Sources"]
APP["Applications<br/>OTLP SDK"]
AGENT["TFO Agent"]
EXT["External<br/>Services"]
end
subgraph Collector["TFO Collector v1.2.1 (OCB)"]
RCV["tfootlp Receiver<br/>gRPC :4317<br/>HTTP :4318"]
PROC["Processors<br/>k8sattributes, batch,<br/>transform, resource"]
EXP_TFO["tfo Exporter<br/>TFO Platform"]
EXP_PROM["prometheus Exporter<br/>:8889"]
CONN["Connectors<br/>spanmetrics, servicegraph"]
end
Sources --> RCV
RCV --> PROC
PROC --> EXP_TFO
PROC --> EXP_PROM
PROC --> CONN
style Sources fill:#e8eaf6,stroke:#283593,color:#000
style Collector fill:#e3f2fd,stroke:#1565c0,color:#000
Key Features:
- Dual Endpoints: Community v1 (/v1/*) + Platform v2 (/v2/*) on the same port
- 85+ OTel Components: Built-in receivers, processors, exporters
- TFO Custom Components: tfootlpreceiver, tfoexporter, tfoauthextension, tfoidentityextension
- Connectors: spanmetrics (exemplars support), servicegraph (service dependency maps)
- Security: Alpine runtime, non-root, CVE-patched, RBAC for K8s
Comprehensive K8s observability with 79+ graph definitions and 8 datatables:
| Category | Metrics | Graphs |
|---|---|---|
| Node Metrics | CPU, Memory, Disk, Network, Load | 15+ |
| Pod/Container | CPU, Memory, Restarts, Status | 20+ |
| Workloads | Deployments, StatefulSets, DaemonSets | 12+ |
| Storage | PV, PVC, Storage Classes | 8+ |
| Network | Services, Endpoints, Ingresses | 10+ |
| Cluster | API Server, CoreDNS, Events, HPA | 14+ |
Infrastructure monitoring for virtual machines and bare-metal servers with agent-based collection.
Synthetic checks and endpoint monitoring for external service availability tracking.
The eBPF collector provides 28 kernel-level metrics across 7 categories:
- Syscall: count, latency, errors (with pid, comm, syscall labels)
- Network: TCP connections, bytes, RTT, retransmits; UDP packets
- File I/O: operations, bytes, latency
- Scheduler: context switches, runq latency, oncpu, migrations
- Memory: page faults (major/minor)
- TCP State: state transitions tracking
- Hubble: flows, drops, policy verdicts, HTTP requests, DNS queries
| Category | Integrations | Count |
|---|---|---|
| Cloud Providers | GCP, Azure, Alibaba Cloud, AWS CloudWatch | 4 |
| Infrastructure | Proxmox, VMware vSphere, Nutanix, Azure Arc | 4 |
| Network & IoT | Cisco (DNA Center/Meraki), SNMP v1/v2c/v3, MQTT | 3 |
| Kernel/System | eBPF (syscalls, network, file I/O, scheduler), Cilium Hubble | 2 |
| APM Platforms | Dynatrace, IBM Instana, Datadog, New Relic | 4 |
| OSS Observability | SigNoz, Coroot, HyperDX, OpenObserve, Netdata | 5 |
| Observability | Prometheus, Splunk, Elasticsearch | 3 |
| Streaming & Logs | Kafka, Loki, InfluxDB | 3 |
| Tracing | Jaeger, Zipkin | 2 |
| Monitoring Tools | Telegraf, Grafana Alloy, Percona PMM, Blackbox, ManageEngine | 5 |
| Custom | Webhook | 1 |
Comprehensive database performance monitoring with native collectors for popular databases:
graph TB
subgraph Databases["Database Sources"]
MYSQL["MySQL / MariaDB<br/>Percona"]
PG["PostgreSQL<br/>RDS PostgreSQL"]
MONGO["MongoDB"]
MSSQL["MSSQL"]
CH["ClickHouse"]
CRDB["CockroachDB"]
AURORA["Amazon Aurora<br/>CloudWatch/PI/RDS"]
TSCALE["TimescaleDB"]
SQLITE["SQLite3"]
end
subgraph Agent["TFO Agent Collectors"]
COLL["Database Collectors<br/>Direct Connection / Cloud SDK"]
end
subgraph Platform["TFO Platform"]
DBMON["DB Monitoring Module<br/>Inventory, Health, Performance"]
QAN["Query Analytics (QAN)<br/>Top Queries, Slow Queries,<br/>Execution Statistics"]
end
Databases -->|"OTLP Metrics"| Agent
Agent -->|"OTLP"| Platform
DBMON --> QAN
style Databases fill:#e3f2fd,stroke:#1565c0,color:#000
style Agent fill:#e8f5e9,stroke:#2e7d32,color:#000
style Platform fill:#fff3e0,stroke:#e65100,color:#000
| Collector | Source | Metrics |
|---|---|---|
| Amazon Aurora | AWS SDK (CloudWatch, RDS, PI) | 60+ CloudWatch metrics across storage, replication, cache, latency, transactions |
| MySQL/MariaDB | Direct connection | Global status, InnoDB, replication, Galera, query analytics, Percona |
| PostgreSQL | Direct connection | pg_stat_activity, pg_stat_database, pg_stat_bgwriter, pg_stat_statements, replication |
| MSSQL | Direct connection | Wait stats, perf counters, index usage, tempdb, agent jobs, query store |
| MongoDB | Direct connection | Server status, replica set, sharding, query profiler, collection stats |
| ClickHouse | HTTP API | System tables, query metrics, merge stats, replication queue |
| CockroachDB | Direct connection | SQL stats, range stats, store metrics, replication |
| TimescaleDB | Direct connection | Hypertable stats, chunk stats, compression ratios, continuous aggregates |
| SQLite3 | File access | Page cache, WAL metrics, lock contention, integrity checks |
Hierarchical isolation model with automatic data segregation:
graph TD
REGION["Region<br/>Geographic Isolation<br/>us-east, eu-west, ap-south"]
REGION --> ORG1["Organization 1"]
REGION --> ORG2["Organization 2"]
ORG1 --> WS1["Workspace 1: Backend"]
ORG1 --> WS2["Workspace 2: Frontend"]
WS1 --> T1["Tenant: Production"]
WS1 --> T2["Tenant: Staging"]
WS1 --> T3["Tenant: Development"]
WS2 --> T4["Tenant: Production"]
WS2 --> T5["Tenant: Development"]
style REGION fill:#e8eaf6,stroke:#283593,color:#000
style ORG1 fill:#e3f2fd,stroke:#1565c0,color:#000
style ORG2 fill:#e3f2fd,stroke:#1565c0,color:#000
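One way to picture the hierarchy is as a four-part scope attached to every record, where a grant at a broader level covers everything beneath it. The `Scope` type and its `covers` method below are hypothetical names used for illustration only; the platform's actual isolation mechanism is not shown in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    """A position in the Region → Organization → Workspace → Tenant hierarchy."""
    region: str
    organization: Optional[str] = None
    workspace: Optional[str] = None
    tenant: Optional[str] = None

    def covers(self, other: "Scope") -> bool:
        """True if `other` is this scope or sits beneath it."""
        for level in ("region", "organization", "workspace", "tenant"):
            mine = getattr(self, level)
            if mine is None:
                return True   # unset levels act as wildcards below this point
            if mine != getattr(other, level):
                return False
        return True
```

For example, a workspace-level scope covers each tenant in that workspace, while a tenant-level scope does not cover its parent workspace.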
graph LR
SA["Super Administrator<br/>Full system access"]
ADM["Administrator<br/>Organization management"]
DEV["Developer<br/>Read/write telemetry"]
VWR["Viewer<br/>Read-only access"]
DEMO["Demo<br/>Sandbox access"]
SA --> ADM --> DEV --> VWR --> DEMO
style SA fill:#c62828,stroke:#b71c1c,color:#fff
style ADM fill:#e65100,stroke:#bf360c,color:#fff
style DEV fill:#1565c0,stroke:#0d47a1,color:#fff
style VWR fill:#2e7d32,stroke:#1b5e20,color:#fff
style DEMO fill:#616161,stroke:#424242,color:#fff
- Authentication: JWT, MFA, SSO (Google, GitHub, Azure AD, Okta)
- Authorization: Role-based access control with 5 tiers
- API Keys: Argon2id-hashed keys with scope and tenant binding
- Audit Logging: Immutable time-series audit trail in ClickHouse
- Data Masking: PII redaction policies for sensitive telemetry data
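The five-tier role chain can be modelled as an ordered enumeration. This sketch assumes the tiers are strictly ordered and that each tier inherits everything granted to the tiers below it, which the diagram suggests but this document does not state outright. `Role` and `is_allowed` are hypothetical names.

```python
from enum import IntEnum


class Role(IntEnum):
    """The five tiers, ordered so that a higher value means broader access."""
    DEMO = 1
    VIEWER = 2
    DEVELOPER = 3
    ADMINISTRATOR = 4
    SUPER_ADMINISTRATOR = 5


def is_allowed(actor: Role, required: Role) -> bool:
    """An actor passes a check if their tier is at or above the required tier."""
    return actor >= required
```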
- 33 production-ready alert rules with fatigue prevention
- Multi-channel notifications: Email, Slack, Webhook, PagerDuty
- Alert fatigue management: Deduplication, grouping, silencing
- Severity levels: Critical, Warning, Info
- Threshold types: Static, Anomaly-based
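Deduplication is commonly implemented by fingerprinting each alert and suppressing repeats inside a time window. The sketch below shows that general technique; the class name, the fingerprint format, and the 300-second default are assumptions made for illustration, not the platform's actual rules.

```python
import hashlib
import time
from typing import Optional


class AlertDeduplicator:
    """Suppress repeat notifications for the same alert within a time window."""

    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self._last_sent: dict[str, float] = {}

    @staticmethod
    def fingerprint(rule: str, labels: dict[str, str]) -> str:
        """Stable identity for an alert: rule name plus sorted labels."""
        canonical = rule + "|" + "|".join(f"{k}={v}" for k, v in sorted(labels.items()))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def should_notify(self, rule: str, labels: dict[str, str],
                      now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        key = self.fingerprint(rule, labels)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False          # duplicate inside the window
        self._last_sent[key] = now
        return True
```

Sorting the labels before hashing makes the fingerprint independent of the order in which labels arrive.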
- 6 pre-configured templates with 12+ widget types
- Custom dashboards with drag-and-drop layout
- Real-time updates via WebSocket
- Cross-signal correlation widgets
- Scheduled reports with PDF generation
- 9 API endpoints at /api/v2/reports/
- Template-based report generation
- Email delivery with customizable schedules
- Retention policies: Per-signal TTL management (7d–90d+)
- Subscription management: Plan-based feature gating
- Data lifecycle: Automatic rollup and archival
Model Context Protocol servers enable AI-powered observability:
flowchart LR
subgraph AI["AI Assistants"]
CLAUDE["Claude AI"]
end
subgraph MCPS["MCP Servers"]
GMCP["Go MCP Server<br/>telemetryflow-go-mcp"]
PMCP["Python MCP Server<br/>telemetryflow-python-mcp"]
end
subgraph Platform["TFO Platform"]
API["REST API<br/>/api/v2/"]
CH["ClickHouse<br/>Telemetry Data"]
PG["PostgreSQL<br/>Config & State"]
end
AI -->|"MCP Protocol"| MCPS
MCPS -->|"DDD/CQRS"| API
API --> CH
API --> PG
- Claude AI integration for natural language querying
- TFQL generation from natural language descriptions
- Anomaly explanation with contextual analysis
- Incident summarization across correlated signals
TelemetryFlow Query Language translates to multiple backends:
flowchart LR
USER["User Query<br/>(TFQL or NL)"]
TFQL["TFQL Engine"]
PROM["PromQL<br/>Metrics"]
CHSQL["ClickHouse SQL<br/>Logs/Traces"]
ES["Elasticsearch DSL<br/>Full-text"]
USER --> TFQL
TFQL --> PROM
TFQL --> CHSQL
TFQL --> ES
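TFQL's syntax is not shown in this document, so the sketch below illustrates only the routing step implied by the diagram: choosing a backend dialect from the kind of data being queried. The `Backend` enum, the `route` function, and the query-kind strings are hypothetical.

```python
from enum import Enum


class Backend(Enum):
    PROMQL = "PromQL"
    CLICKHOUSE_SQL = "ClickHouse SQL"
    ELASTICSEARCH_DSL = "Elasticsearch DSL"


# Mapping taken from the diagram: metrics, logs/traces, full-text.
_ROUTES = {
    "metrics": Backend.PROMQL,
    "logs": Backend.CLICKHOUSE_SQL,
    "traces": Backend.CLICKHOUSE_SQL,
    "full-text": Backend.ELASTICSEARCH_DSL,
}


def route(query_kind: str) -> Backend:
    """Pick the target dialect for a query kind; reject unknown kinds."""
    try:
        return _ROUTES[query_kind]
    except KeyError:
        raise ValueError(f"unsupported query kind: {query_kind!r}") from None
```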
graph TB
subgraph Frontend["Frontend"]
VUE["Vue 3.5+<br/>Composition API"]
TS["TypeScript 5.x"]
PINIA["Pinia<br/>State Management"]
NAIVE["Naive UI<br/>Component Library"]
ECHARTS["Apache ECharts 5.x<br/>Visualizations"]
VITE["Vite 6.x<br/>Build Tool"]
UNO["UnoCSS<br/>Utility Styles"]
end
subgraph Backend["Backend"]
NEST["NestJS 11.x<br/>Framework"]
TYPEORM["TypeORM<br/>PostgreSQL ORM"]
BULL["BullMQ<br/>Job Queues"]
NATS_CLIENT["NATS<br/>Event Bus"]
end
subgraph Databases["Databases"]
PG["PostgreSQL 16<br/>Relational State"]
CLICK["ClickHouse 23+<br/>Time-Series Analytics"]
REDIS["Redis 7+<br/>Cache & Queue"]
end
subgraph Agent["Agent & Collector"]
GOAGENT["Go 1.26<br/>TFO Agent v1.2.0"]
GOCOL["Go 1.26<br/>TFO Collector v1.2.1 (OCB)"]
OTEL_SDK["OpenTelemetry SDK<br/>SDK v1.43.0 / Core v1.58.0"]
end
subgraph Infra["Infrastructure"]
DOCKER["Docker / Docker Compose"]
K8S_DEPLOY["Kubernetes<br/>(Helm Charts)"]
PROM_SERVER["Prometheus<br/>(Remote Write)"]
end
style Frontend fill:#42b883,stroke:#2c3e50,color:#fff
style Backend fill:#e0234e,stroke:#fff,color:#fff
style Databases fill:#336791,stroke:#fff,color:#fff
style Agent fill:#00add8,stroke:#fff,color:#fff
style Infra fill:#2496ed,stroke:#fff,color:#fff
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Vue 3 + TypeScript + Vite | SPA dashboard with Pinia stores |
| UI Framework | Naive UI + UnoCSS | Enterprise component library + utility CSS |
| Visualization | Apache ECharts 5.x | Time-series, heatmaps, flame graphs, treemaps |
| Backend | NestJS 11.x | REST API with DDD/CQRS architecture |
| ORM | TypeORM | PostgreSQL entity management with migrations |
| Relational DB | PostgreSQL 16 | IAM, configuration, multi-tenant state |
| Time-Series DB | ClickHouse 23+ | Metrics, logs, traces with materialized views |
| Cache | Redis 7+ | Dual-layer cache (L1 in-memory, L2 Redis) + queues |
| Queue | BullMQ on Redis DB 1 | Async processing (ingestion, events, alerts, reports) |
| Messaging | NATS | Cross-module domain events |
| Agent | Go 1.26 | Infrastructure collection (replaces Prometheus stack) |
| Collector | Go 1.26 (OCB) | OTLP routing with TFO authentication |
| SDKs | Python 3.12+ / Go 1.24+ | Application instrumentation |
| Containerization | Docker + Docker Compose | Development and deployment |
| Orchestration | Kubernetes + Helm | Production deployment |
graph TB
subgraph Write["Write Path"]
CMD["Commands<br/>(CQRS Writes)"]
OTLP["OTLP Ingestion"]
end
subgraph Read["Read Path"]
QRY["Queries<br/>(CQRS Reads)"]
TFQL["TFQL Engine"]
end
subgraph PG_Layer["PostgreSQL Layer"]
IAM["IAM Data<br/>Users, Roles, Permissions"]
CONFIG["Configuration<br/>Dashboards, Alerts, Retention"]
STATE["App State<br/>Subscriptions, API Keys, Tenants"]
end
subgraph CH_Layer["ClickHouse Layer"]
METS["Metrics<br/>10 base tables, 24 MVs"]
LOGS_CH["Logs<br/>Structured + Full-text"]
TRACES["Traces<br/>Spans + Services"]
AUDIT["Audit Logs<br/>Immutable Trail"]
K8S_DATA["K8s Monitoring<br/>Node/Pod/Container Metrics"]
end
CMD --> PG_Layer
OTLP -->|"BullMQ Worker"| CH_Layer
QRY --> PG_Layer
QRY --> CH_Layer
TFQL --> CH_Layer
style Write fill:#e8f5e9,stroke:#2e7d32,color:#000
style Read fill:#e3f2fd,stroke:#1565c0,color:#000
style PG_Layer fill:#336791,stroke:#1a4a6e,color:#fff
style CH_Layer fill:#ffcc00,stroke:#b8860b,color:#000
graph LR
RAW["Raw Data<br/>Full fidelity<br/>TTL: 7-30d"]
ONE_M["1-Minute Agg<br/>Sum, Avg, Min, Max<br/>TTL: 30-90d"]
ONE_H["1-Hour Agg<br/>Pre-computed rollups<br/>TTL: 90-180d"]
ONE_D["1-Day Agg<br/>Long-term trends<br/>TTL: 365d+"]
RAW -->|"Materialized View"| ONE_M
ONE_M -->|"Materialized View"| ONE_H
ONE_H -->|"Materialized View"| ONE_D
style RAW fill:#ffebee,stroke:#c62828,color:#000
style ONE_M fill:#fff3e0,stroke:#e65100,color:#000
style ONE_H fill:#e3f2fd,stroke:#1565c0,color:#000
style ONE_D fill:#e8f5e9,stroke:#2e7d32,color:#000
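The cascade works because each level stores aggregates that can be merged again at the next level. The following sketch imitates that idea in plain Python; in the platform the work is done by ClickHouse materialized views, and the function names `rollup` and `merge` are hypothetical.

```python
from collections import defaultdict


def rollup(points: list[tuple[int, float]], bucket_seconds: int) -> dict[int, dict[str, float]]:
    """Group (unix_ts, value) points into fixed buckets and keep mergeable aggregates."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return {
        start: {"sum": sum(v), "count": len(v), "min": min(v), "max": max(v)}
        for start, v in sorted(buckets.items())
    }


def merge(rolled: dict[int, dict[str, float]], bucket_seconds: int) -> dict[int, dict[str, float]]:
    """Fold finer buckets into coarser ones (for example 1m into 1h)."""
    out: dict[int, dict[str, float]] = {}
    for start, agg in sorted(rolled.items()):
        key = start - start % bucket_seconds
        if key not in out:
            out[key] = dict(agg)
        else:
            cur = out[key]
            cur["sum"] += agg["sum"]
            cur["count"] += agg["count"]
            cur["min"] = min(cur["min"], agg["min"])
            cur["max"] = max(cur["max"], agg["max"])
    return out
```

Storing `sum` and `count` rather than a precomputed average lets a coarser level derive the correct average (`sum / count`) without re-reading raw data. Percentiles are not mergeable this way and need a different representation.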
| Queue | Concurrency | Purpose |
|---|---|---|
| otlp-ingestion | 10 | OTLP telemetry data processing |
| telemetry-processing | 10 | Post-ingestion transformations |
| domain-events | 5 | Cross-module event propagation |
| alerts | 5 | Alert evaluation and notification |
| notifications | 3 | Email, Slack, webhook delivery |
| reports | 3 | Scheduled report generation |
| Layer | TTL | Storage | Purpose |
|---|---|---|---|
| L1 — In-Memory | 60s | Process memory | Hot data, API responses |
| L2 — Redis | 1800s | Redis DB 0 | Distributed cache, cross-instance |
Key prefix: tf:cache: with event-driven invalidation.
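A read-through lookup across the two layers might look like the sketch below. It is an illustration under stated assumptions: a dictionary stands in for Redis DB 0, the class and method names are hypothetical, and a cached value of `None` is treated as a miss for brevity.

```python
import time
from typing import Any, Callable

KEY_PREFIX = "tf:cache:"
L1_TTL = 60       # seconds, in-process
L2_TTL = 1800     # seconds, shared


class TwoTierCache:
    """Read-through cache: check L1, then L2, then fall back to the loader."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._l1: dict[str, tuple[float, Any]] = {}
        self._l2: dict[str, tuple[float, Any]] = {}   # stand-in for Redis DB 0

    @staticmethod
    def _fresh(store: dict, key: str, now: float) -> Any:
        """Return the stored value if it has not expired, else None."""
        entry = store.get(key)
        return entry[1] if entry and entry[0] > now else None

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        key, now = KEY_PREFIX + name, self._clock()
        value = self._fresh(self._l1, key, now)
        if value is None:
            value = self._fresh(self._l2, key, now)
            if value is None:
                value = loader()
                self._l2[key] = (now + L2_TTL, value)
            self._l1[key] = (now + L1_TTL, value)   # repopulate the hot layer
        return value

    def invalidate(self, name: str) -> None:
        """Drop a key from both layers, e.g. in response to a domain event."""
        key = KEY_PREFIX + name
        self._l1.pop(key, None)
        self._l2.pop(key, None)
```

The short L1 TTL bounds how stale a single process can be if it misses an invalidation event, while the longer L2 TTL keeps load off the databases.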
The frontend uses a centralized registry for all UI components:
graph TB
subgraph Registries["Component Registries"]
GR["Graph Registry<br/>260+ definitions<br/>ID: XXX1####"]
SP["Stat Panel Registry<br/>158 definitions<br/>ID: XXX2####"]
DT["DataTable Registry<br/>41 definitions<br/>ID: XXX3####"]
end
subgraph Composables["Vue Composables"]
UGR["useGraphFromRegistry()"]
USP["useStatPanelsFromRegistry()"]
UDT["useDataTableFromRegistry()"]
end
subgraph Components["UI Components"]
RGP["RegistryGraphPanel<br/>3 variants: default/mini/panel<br/>13 chart types"]
SP_COMP["StatPanelCard"]
DT_COMP["DataTable"]
end
Registries --> Composables
Composables --> Components
style Registries fill:#e8eaf6,stroke:#283593,color:#000
style Composables fill:#e8f5e9,stroke:#2e7d32,color:#000
style Components fill:#fff3e0,stroke:#e65100,color:#000
24 Module Codes: HOM, DSH, MET, TRC, LOG, COR, EXP, ALR, RPT, UPT, STP, SVM, NWM, K8S, INF, AGT, RET, SUB, IAM, TEN, AUD, APK, NOT, LLM
Chart Types: Line, Area, Bar, Stacked Bar, Heatmap, Pie, Donut, Gauge, Treemap, Flame Graph, Table, Scatter, Text
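The ID scheme shown in the registry diagram (XXX1####, XXX2####, XXX3####) can be decoded mechanically. The sketch below assumes the layout is literally a three-character module code, one type digit, and a four-digit sequence number; the function name and the sample IDs are made up for illustration.

```python
import re
from typing import NamedTuple

# Type digit to registry, as given in the registry diagram.
REGISTRY_BY_DIGIT = {"1": "graph", "2": "stat_panel", "3": "datatable"}

# Assumed layout: 3-character module code, 1 type digit, 4-digit sequence.
_ID_PATTERN = re.compile(r"^([A-Z0-9]{3})([123])(\d{4})$")


class RegistryId(NamedTuple):
    module: str
    registry: str
    sequence: int


def parse_registry_id(raw: str) -> RegistryId:
    """Split a registry ID into module code, registry type, and sequence number."""
    match = _ID_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"not a registry ID: {raw!r}")
    module, digit, seq = match.groups()
    return RegistryId(module, REGISTRY_BY_DIGIT[digit], int(seq))
```

The module part allows digits because codes such as K8S contain one.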
# Core services (PostgreSQL, ClickHouse, Redis, NATS, Backend, Frontend)
docker-compose --profile core up -d
# Core + Monitoring (TFO Collector, TFO Agent, Jaeger)
docker-compose --profile core --profile monitoring up -d
# Everything
docker-compose --profile all up -d

graph LR
subgraph Core["Core Profile"]
PG_SVC["PostgreSQL 16<br/>:5432"]
CH_SVC["ClickHouse 23+<br/>:8123 / :9000"]
RD_SVC["Redis 7+<br/>:6379"]
NT_SVC["NATS<br/>:4222"]
BE_SVC["Backend (NestJS)<br/>:3000"]
FE_SVC["Frontend (Vue)<br/>:8080"]
end
subgraph Mon["Monitoring Profile"]
COL_SVC["TFO Collector v1.2.1<br/>:4317 / :4318"]
AGT_SVC["TFO Agent v1.2.0<br/>Daemon"]
JAEGER["Jaeger<br/>:16686"]
end
subgraph Tools["Tools Profile"]
PORTAINER["Portainer<br/>:9443"]
end
style Core fill:#e8f5e9,stroke:#2e7d32,color:#000
style Mon fill:#e3f2fd,stroke:#1565c0,color:#000
style Tools fill:#f5f5f5,stroke:#616161,color:#000
TFO Agent and Collector include Helm charts and Kubernetes manifests:
- Agent: DaemonSet deployment for node-level collection
- Collector: Deployment with Service for OTLP routing
- Platform: Full stack deployment with persistent volumes
- Node.js 20+ & pnpm 9+
- Docker & Docker Compose
- Go 1.24+ (for Agent/Collector development)
# 1. Clone the platform monolith
git clone https://github.com/telemetryflow/telemetryflow-platform.git
cd telemetryflow-platform
# 2. Start infrastructure
docker-compose --profile core up -d
# 3. Install dependencies
pnpm install
# 4. Run migrations & seed data
pnpm db:migrate
pnpm db:seed
# 5. Start development servers
pnpm dev

| Service | URL |
|---|---|
| Frontend Dashboard | http://localhost:8080 |
| Backend API | http://localhost:3000/api/v2 |
| API Documentation | http://localhost:3000/api/docs |
| Health Check | http://localhost:3000/health |
| ClickHouse | http://localhost:8123 |
Python:

pip install telemetryflow-python-sdk

from telemetryflow import TelemetryFlow
tfo = TelemetryFlow(
    endpoint="http://localhost:4318",
    api_key="your-api-key"
)
tfo.init()  # Auto-instruments Flask/FastAPI/Django

Go:

go get github.com/telemetryflow/telemetryflow-go-sdk

import tfo "github.com/telemetryflow/telemetryflow-go-sdk"
func main() {
	sdk, _ := tfo.NewBuilder().
		WithEndpoint("localhost:4318").
		WithAPIKey("your-api-key").
		Build()
	defer sdk.Shutdown()
	// Auto-instruments net/http, gin, echo, grpc
}

TelemetryFlow/
├── telemetryflow-platform/     # Core platform (NestJS + Vue 3)
│   ├── backend/                # NestJS API (DDD/CQRS)
│   │   └── src/modules/        # 25+ business modules
│   ├── frontend/               # Vue 3 dashboard
│   │   └── src/
│   │       ├── views/          # 16 feature views
│   │       ├── registry/       # Component registries (459 entries)
│   │       ├── composables/    # Vue composables
│   │       └── store/          # Pinia stores
│   └── docker-compose.yml      # Full-stack Docker setup
│
├── telemetryflow-agent/        # Infrastructure agent (Go)
│   ├── cmd/                    # Entry points
│   ├── internal/
│   │   ├── collector/          # Node, K8s, cAdvisor, DB, eBPF collectors
│   │   └── agent/              # Agent lifecycle
│   ├── deploy/helm/            # Helm charts
│   └── configs/                # One-for-all config
│
├── telemetryflow-collector/    # OTLP collector (Go, OCB)
│   ├── components/             # TFO custom OCB components
│   ├── cmd/                    # Collector entry point
│   └── configs/                # Pipeline configs
│
├── telemetryflow-python-sdk/   # Python SDK
├── telemetryflow-go-sdk/       # Go SDK
├── telemetryflow-viz/          # Standalone viz dashboard
├── telemetryflow-go-mcp/       # Go MCP server (Claude AI)
├── telemetryflow-python-mcp/   # Python MCP server (Claude AI)
├── telemetryflow-overview/     # Documentation hub
└── telemetryflow-product/      # Product summary (this repo)
We welcome contributions! Please see the individual repository CONTRIBUTING.md files for guidelines.
- License: Apache 2.0
- Built by: DevOpsCorner Indonesia
- Website: telemetryflow.id
TelemetryFlow — Unified Observability for Modern Infrastructure
100% OpenTelemetry • Enterprise-Grade • Open Source