Skip to content

elixir-vibe/quackdb

Repository files navigation

QuackDB

Hex.pm HexDocs

DuckDB for Elixir applications, over DuckDB's experimental Quack protocol.

QuackDB gives Elixir applications an OTP-supervised DuckDB process, a DBConnection client, an Ecto adapter and query DSL for analytical DuckDB workflows, native append APIs, Explorer dataframe writes, Geo/WKB spatial integration, Table.Reader results, telemetry, and a managed DuckDB binary installer.

Warning

QuackDB targets DuckDB's experimental Quack protocol and is not production-ready yet. Public APIs, result shapes, Ecto adapter behavior, and supported protocol coverage may still change as DuckDB and QuackDB evolve. Validate behavior against your DuckDB version before relying on it for critical workloads.

defmodule MyApp.Analytics do
  use QuackDB.Ecto

  alias QuackDB.Source

  def category_latency do
    source = Source.parquet("s3://bucket/events/*.parquet", hive_partitioning: true)

    from event in source,
      group_by: event.category,
      select: %{
        category: event.category,
        p95: quantile_cont(event.duration_ms, 0.95),
        median: median(event.duration_ms),
        events: count()
      }
  end
end

Why QuackDB?

DuckDB is already excellent at analytical SQL. QuackDB focuses on the Elixir side:

  • run DuckDB as a supervised process during development, tests, notebooks, examples, or local apps;
  • use DBConnection semantics for pooled sessions, transactions, streams, and query lifecycle;
  • compose DuckDB analytical queries with Ecto instead of assembling SQL strings;
  • use Elixir-native values such as Duration, %Geo.*{}, Date.Range, maps, lists, structs, and Explorer dataframes where possible;
  • append rows, columns, and Explorer dataframes through DuckDB's native append path;
  • expose results to Livebook and dataframe tooling through Table.Reader;
  • keep raw SQL available when DuckDB-specific syntax is clearer or not represented by Ecto.

Elixir integrations

Elixir layer QuackDB integration
OTP supervised local DuckDB server, managed binary, restartable child specs
DBConnection pooled Quack sessions, queries, streams, transactions
Ecto adapter, query DSL, analytical helpers, migrations, writes
Explorer dataframe append and dataframe-friendly results
Geo %Geo.*{} params and WKB/GeoJSON workflows
Table.Reader Livebook/dataframe-friendly result consumption
Telemetry query, append, and fetch spans
Mix quackdb.install task for managed DuckDB binaries

Installation

Add QuackDB to your dependencies:

def deps do
  [
    {:quackdb, "~> 0.3.0"}
  ]
end

Optional integrations are enabled when their packages are present:

def deps do
  [
    {:quackdb, "~> 0.3.0"},
    {:ecto_sql, "~> 3.13"},
    {:explorer, "~> 0.11"},
    {:geo, "~> 4.1"}
  ]
end

DuckDB's Quack protocol is experimental. For local development, use DuckDB 1.5.3 or newer with the quack extension.

Supervised DuckDB

For development, tests, examples, and notebooks, QuackDB can supervise DuckDB's CLI process and start a matching client pool. child_specs/1 generates one random token and injects it into both children.

children =
  QuackDB.Server.child_specs(
    server: [
      name: MyApp.DuckDB,
      duckdb: :managed,
      endpoint: "quack:localhost:9494"
    ],
    client: [
      name: MyApp.QuackDB,
      pool_size: System.schedulers_online()
    ]
  )

duckdb: :managed downloads DuckDB's official CLI binary on first use, verifies known checksums for QuackDB's pinned DuckDB version, and caches it. QuackDB never downloads DuckDB during dependency compilation. Use QUACKDB_BINARY_PATH, QUACKDB_BINARY_CACHE_DIR, duckdb: "/path/to/duckdb", or run the quackdb.install Mix task when you want explicit control. See the managed DuckDB guide.

You can also start DuckDB manually:

duckdb -interactive -init /dev/null \
  -cmd "LOAD quack; CALL quack_serve('quack:localhost', token='super_secret');"

quack:localhost often binds on IPv6 localhost, so examples use http://[::1]:9494.

DBConnection client

QuackDB can be used directly through its DBConnection-backed client.

{:ok, conn} =
  QuackDB.start_link(
    uri: "http://[::1]:9494",
    token: "super_secret"
  )

{:ok, result} = QuackDB.query(conn, "SELECT ? AS name, ? AS n", ["duck", 42])

result.columns
#=> ["name", "n"]

result.rows
#=> [["duck", 42]]

Use QuackDB.stream/4 for large result sets, or QuackDB.columns/4 when a column-oriented shape is more convenient for analytics tooling.

DuckDB workflows as Ecto queries

QuackDB exposes common DuckDB analytical workflows as Ecto-compatible helpers so they compose with normal queries.

Analytical aggregates

defmodule MyApp.Analytics do
  use QuackDB.Ecto

  def category_scores do
    from event in "events",
      group_by: event.category,
      select: %{
        category: event.category,
        p95: quantile_cont(event.duration_ms, 0.95),
        median: median(event.duration_ms),
        precise_sum: fsum(event.duration_ms),
        mode: mode(event.duration_ms),
        weighted_average: weighted_avg(event.duration_ms, event.weight),
        values: list(event.duration_ms, order_by: [desc_nulls_last: event.duration_ms]),
        slow_events: filter(count(event.id), event.duration_ms > 1_000),
        distinct_users: count(event.user_id, :distinct),
        average_duration: coalesce(avg(event.duration_ms), 0),
        events: count()
      }
  end
end

Date and timestamp series

use QuackDB.Ecto

from day in series(Date.range(~D[2024-01-01], ~D[2024-01-31])),
  left_join: event in "events",
  on: event.occurred_on == day.value,
  group_by: day.value,
  order_by: day.value,
  select: %{
    day: day.value,
    events: count(event.id)
  }

Timestamp series use Duration steps:

from bucket in series(
       ~N[2024-01-01 00:00:00],
       ~N[2024-01-02 00:00:00],
       step: Duration.new!(hour: 1)
     ),
  select: bucket.value

Source scans

DuckDB can query data where it already lives. QuackDB source helpers can be used directly as Ecto sources.

use QuackDB.Ecto

alias QuackDB.Source

source = Source.parquet("s3://bucket/events/*.parquet", hive_partitioning: true)

from event in source,
  group_by: event.category,
  select: %{
    category: event.category,
    events: count(),
    avg_score: avg(event.score)
  }

QuackDB does not upload local files for you. The DuckDB server must be able to see the path, URL, object store, or lakehouse catalog. See the sources guide.

CTAS and full-text search

External data can be materialized with CREATE TABLE AS, indexed with DuckDB FTS, and queried with BM25 from Ecto.

use QuackDB.Ecto

alias QuackDB.{DDL, FTS, Source}

query =
  from doc in Source.parquet("s3://bucket/docs/*.parquet"),
    select: %{
      id: doc.id,
      title: doc.title,
      body: doc.body
    }

MyApp.AnalyticsRepo.query!(DDL.create_table("docs", as: query, temporary: true))
MyApp.AnalyticsRepo.query!(FTS.create_index("docs", :id, [:title, :body], overwrite: true))

schema = FTS.schema_name("main.docs")
search = "duckdb analytics"

from doc in "docs",
  where: bm25(^schema, doc.id, ^search) > 0,
  order_by: [desc: bm25(^schema, doc.id, ^search)],
  limit: 10,
  select: %{
    id: doc.id,
    title: doc.title,
    score: bm25(^schema, doc.id, ^search)
  }

See the full-text search guide.

Text and regex predicates

DuckDB text and RE2 regular-expression helpers compose with Ecto filters and aggregate FILTER clauses. Shared contains/2 dispatches obvious string calls to DuckDB contains and spatial expressions to ST_Contains; contains_text/2 and st_contains/2 are available when you want to be explicit.

use QuackDB.Ecto

from event in "events",
  where: contains(event.name, "duck") and regexp_matches(event.name, ~r/^duck/i),
  select: %{
    slug: regexp_replace(event.name, ~r/\s+/, "-", "g"),
    parts: string_split(event.tags, ",")
  }

DuckDB regexes use RE2, so ~r literals are intended for the syntax subset shared with Elixir regexes.

Spatial queries

DuckDB Spatial works with Ecto queries and %Geo.*{} structs when the optional :geo package is installed.

use QuackDB.Ecto

import QuackDB.Ecto.Spatial

alias QuackDB.Spatial

MyApp.AnalyticsRepo.query!(Spatial.load())

point = %Geo.Point{coordinates: {13.405, 52.52}, srid: nil}

from place in "places",
  where: intersects(place.geom, ^point) and distance(place.geom, ^point) < 1_000,
  select: %{
    id: place.id,
    name: place.name,
    wkt: as_text(place.geom)
  }

GEOMETRY values decode as WKB-compatible bytes for tested DuckDB Spatial values. QuackDB.Geometry can convert to/from %Geo.*{} structs when the optional :geo package is installed. See the spatial guide and the Spatial WMS example.

Writes and dataframes

QuackDB can write through DuckDB's native append protocol instead of generating huge INSERT VALUES statements.

QuackDB.insert_rows!(conn, "events", [
  [id: 1, name: "duck", tags: ["bird", "wetland"]],
  [id: 2, name: "goose", tags: ["bird", "loud"]]
])

QuackDB.insert_columns!(conn, "measurements", [
  id: [1, 2, 3],
  temperature: [12.5, 13.0, 12.8]
])

When Explorer is installed, dataframes can be appended directly:

alias Explorer.DataFrame
alias QuackDB.Explorer, as: QuackExplorer

frame = DataFrame.new(id: [1, 2], name: ["duck", "goose"])
QuackExplorer.insert_dataframe!(conn, "events", frame)

Enumerable rows can be streamed into native append batches:

File.stream!("events.ndjson")
|> Stream.map(&Jason.decode!/1)
|> QuackDB.insert_stream!(conn, "events", chunk_every: 10_000)

Any Table.Reader-compatible data can be appended through the same column append path:

QuackDB.insert_table!(conn, "events", %{id: [1, 2], name: ["duck", "goose"]})

Append supports explicit types, batching, scalar DuckDB values, and nested LIST, STRUCT, ARRAY, and MAP values. See the Explorer guide.

Results, Livebook, and telemetry

QuackDB.Result and QuackDB.Columns implement Table.Reader, so they can be consumed by Livebook and other Table-aware tooling. When Explorer is installed, query results can be turned into dataframes:

result = QuackDB.query!(conn, "SELECT * FROM events")
Explorer.DataFrame.new(result)

QuackDB emits telemetry spans for query, append, and fetch operations:

  • [:quackdb, :query, :start | :stop]
  • [:quackdb, :append, :start | :stop]
  • [:quackdb, :fetch, :start | :stop]

Metadata includes connection/session information, command details, append batch counts, and client query IDs. Params are not included unless you opt in with telemetry_params: true. See the telemetry guide.

Ecto coverage

QuackDB includes an optional Ecto SQL adapter for applications that want Ecto query composition, schema-based reads/writes, migrations, and raw SQL through Repo.query/3.

defmodule MyApp.AnalyticsRepo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.QuackDB
end

The adapter currently covers:

  • raw SQL via Repo.query/3;
  • schema-backed full selects and Repo.get!/2;
  • analytical reads with joins, filters, grouping, windows, CTEs, combinations, locks, fragments, and QuackDB helper macros;
  • Repo.insert/2, Repo.insert_all/3, RETURNING, ON CONFLICT DO NOTHING, and common DO UPDATE upserts;
  • explicit native append fast path via insert_method: :append;
  • update_all, delete_all, schema update/delete, and transaction usage;
  • Ecto.Adapters.SQL.explain/4;
  • basic migration DDL through Ecto migrator: create/drop/alter tables, columns, references, indexes, primary keys, check constraints, and renames.

DuckDB-specific SQL that Ecto cannot model cleanly should still use Repo.query/3. See the Ecto coverage matrix.

Examples

The repository includes runnable scripts, a Livebook notebook, and a small WMS app:

Run scripts from outside the Mix project so Mix.install/2 can load the local package:

cd /tmp
elixir /path/to/quackdb/examples/dataframe_analytics.exs

Current boundaries

QuackDB is intentionally focused on DuckDB analytics over Quack:

  • the Quack wire protocol is experimental and may change;
  • unsupported vector/logical types raise explicit protocol errors;
  • Ecto coverage focuses on analytical workflows and common write/setup paths, not every adapter edge case;
  • QuackDB does not stage/upload local files to a remote server;
  • Arrow IPC / zero-copy columnar handoff is research for now;
  • managed DuckDB binary downloads currently cover Linux/macOS targets, not Windows.

Documentation

Development

mix deps.get
mix ci

See CONTRIBUTING.md for local checks, example smoke tests, package audit steps, and release dry-run notes.

Integration tests are skipped by default. To run them against a Quack server:

QUACKDB_TEST_URI='http://[::1]:9494' \
QUACKDB_TEST_TOKEN=super_secret \
mix test --include integration

About

Experimental remote DuckDB Quack protocol client for Elixir, built on DBConnection with early Ecto support

Topics

Resources

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages