Time-Series Databases

Specialized stores where every row has a timestamp and a few tags, writes arrive in roughly time order, and reads are almost always "this metric, this tag set, this window, downsampled." Tuned end-to-end for that one shape.

Metrics · IoT · Downsampling · Retention · Cardinality

At a Glance

Core Ideas

  • Series: a unique combination of metric name + tag set (e.g. cpu_usage{host=web-1,region=us-east-1}). The unit of identity.
  • Append-only: writes arrive in roughly time order, are rarely updated, and are deleted in bulk by retention policies — not by row.
  • Aggressive compression: delta-of-delta on timestamps, Gorilla-style XOR encoding on values. 10–50× over raw is normal (see the sketch after this list).
  • Downsampling & retention: keep raw data for hours, 1-minute averages for weeks, 1-hour averages forever. Storage costs stay flat.
  • Cardinality is the trap. Series count, not row count, is the limit. Adding user_id as a tag is how you destroy a TSDB.
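
To make the compression point concrete, here is a toy delta-of-delta encoder in Python. It is a sketch of the idea, not any engine's actual implementation; real TSDBs bit-pack these values instead of storing integers.

```python
def delta_of_delta(timestamps: list[int]) -> list[int]:
    """Keep the first timestamp, then the first delta, then
    deltas-of-deltas. A regular scrape interval collapses to a run
    of zeros, which bit-packs to roughly one bit per sample."""
    out = [timestamps[0], timestamps[1] - timestamps[0]]
    for prev2, prev1, curr in zip(timestamps, timestamps[1:], timestamps[2:]):
        out.append((curr - prev1) - (prev1 - prev2))
    return out

# A 15s scrape with one second of jitter on the fourth sample:
ts = [1700000000, 1700000015, 1700000030, 1700000046, 1700000061]
print(delta_of_delta(ts))  # [1700000000, 15, 0, 1, -1]
```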

Who Plays Here

Prometheus

The Kubernetes-era metrics standard. Pull-based scraping, PromQL, alerting built in. Single-node — pair with Thanos / Cortex / Mimir for long-term storage and HA.

InfluxDB

The original purpose-built TSDB. v2 brought Flux; v3 (IOx) is a Parquet + DataFusion rewrite. Strong UX for IoT and observability.

TimescaleDB

A Postgres extension. Hypertables partition by time automatically; you keep all of SQL, joins, and the Postgres ecosystem. The pragmatic pick when you also need relational data.
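
A minimal sketch of the hypertable setup, assuming a reachable Postgres with the timescaledb extension installed; the table, column, and DSN names are illustrative.

```python
import psycopg2

conn = psycopg2.connect("dbname=metrics_db")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS metrics (
            time        TIMESTAMPTZ NOT NULL,
            host        TEXT        NOT NULL,
            latency_ms  DOUBLE PRECISION
        );
    """)
    # Chunk the table by time; inserts, queries, and retention then
    # operate per-chunk instead of per-row.
    cur.execute("SELECT create_hypertable('metrics', 'time', if_not_exists => TRUE);")
```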

VictoriaMetrics / Mimir / Thanos

Horizontally scalable, Prometheus-compatible. The path most large infra teams take when single-node Prometheus stops fitting.

QuestDB / ClickHouse

Columnar engines that double as time-series stores. Fast ingest, SQL, and arbitrary analytics — popular in fintech tick data and product analytics.

AWS Timestream / GCP Bigtable / Azure Data Explorer

Managed cloud-native options. ADX (Kusto) is especially strong for log + metric analytics.

The Workloads

Infrastructure & Application Metrics

CPU, memory, request rate, error rate, latency percentiles. Scraped every 15s, queried by dashboards and alert rules. Prometheus + Grafana is the default stack; for scale, Mimir / Thanos / VictoriaMetrics behind it.

IoT & Sensor Data

Thousands of devices, each reporting one or more readings per second. Compression and downsampling earn their keep — raw data is huge, and the questions ("hourly average per device per day") only need summaries.
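
A downsampling sketch on synthetic data, with pandas standing in for the database's rollup engine:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=86_400, freq="s")  # one day at 1s resolution
raw = pd.DataFrame({
    "device": np.random.choice(["sensor-1", "sensor-2"], size=len(idx)),
    "temperature": 20 + np.random.randn(len(idx)),
}, index=idx)

# 86,400 raw rows collapse to 24 hourly rows per device.
hourly = raw.groupby("device").resample("1h")["temperature"].mean()
print(hourly.head())
```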

Financial Tick Data

Quotes, trades, order book updates — millions of events per second per symbol. QuestDB and ClickHouse dominate here because they marry TSDB ingest patterns with arbitrary SQL analytics.

Product / Behavioral Events

Clickstreams, feature usage, conversions. ClickHouse is increasingly the home of these — PostHog, Plausible, Mixpanel-style workloads. TimescaleDB also fits when the data already lives next to relational app data.

Bad Fits

  • OLTP workloads. Updating individual rows, transactional joins across entities — not what TSDBs are tuned for.
  • High-cardinality dimensions. Per-user or per-request series will blow up most TSDBs (Prometheus especially). Use logs or a columnar warehouse instead.
  • Long-running analytical joins. Cross-series joins are weak in pure TSDBs. Push to a warehouse for analytics, keep the TSDB for live observability.
  • Strict ACID needs. Append-only model and async retention are not the place for "money never lost" guarantees.

Designing the Schema

Tags vs Fields

Tags are indexed (used in WHERE, GROUP BY); fields are not. Put low-cardinality dimensions (host, region, service) in tags; put the actual measurement (latency_ms, temperature) in fields. Reverse this and queries crawl.
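
A sketch of the split on a single hypothetical data point, rendered InfluxDB-line-protocol-style; all names here (api_latency, host, latency_ms) are illustrative.

```python
point = {
    "measurement": "api_latency",
    "tags": {"host": "web-1", "region": "us-east-1"},  # indexed: low-cardinality dimensions
    "fields": {"latency_ms": 12.3},                    # not indexed: the actual measurement
    "time_ns": 1_700_000_000_000_000_000,
}

tag_str = ",".join(f"{k}={v}" for k, v in sorted(point["tags"].items()))
field_str = ",".join(f"{k}={v}" for k, v in point["fields"].items())
print(f'{point["measurement"]},{tag_str} {field_str} {point["time_ns"]}')
# api_latency,host=web-1,region=us-east-1 latency_ms=12.3 1700000000000000000
```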

Watch Cardinality Like a Hawk

The worst-case series count is the product of each tag's distinct-value count. {host} × {endpoint} × {status_code} with 1k hosts, 200 endpoints, 10 statuses = 2M series. Add user_id at 1M users and you've created 2 trillion potential series — the database will refuse, OOM, or both.
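
The arithmetic is worth automating: a quick sanity check in Python, using the tag counts from the example above.

```python
from math import prod

tag_cardinality = {"host": 1_000, "endpoint": 200, "status_code": 10}
print(f"{prod(tag_cardinality.values()):,}")  # 2,000,000 -- survivable

tag_cardinality["user_id"] = 1_000_000        # the fatal tag
print(f"{prod(tag_cardinality.values()):,}")  # 2,000,000,000,000
```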

Continuous Aggregates / Recording Rules

Pre-compute the rollups your dashboards need. Prometheus recording rules, Timescale continuous aggregates, Influx tasks. Dashboards then query small pre-computed series instead of crunching raw points every refresh.
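
As one concrete flavor, a sketch of a Timescale continuous aggregate, assuming the illustrative metrics hypertable from the earlier sketch; the view name and rollup interval are arbitrary.

```python
import psycopg2

conn = psycopg2.connect("dbname=metrics_db")  # hypothetical DSN
conn.autocommit = True  # continuous aggregates can't be created inside a transaction
with conn.cursor() as cur:
    cur.execute("""
        CREATE MATERIALIZED VIEW metrics_1m
        WITH (timescaledb.continuous) AS
        SELECT time_bucket('1 minute', time) AS bucket,
               host,
               avg(latency_ms) AS avg_latency_ms
        FROM metrics
        GROUP BY bucket, host;
    """)
```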

Retention Tiers

Raw at 15s for 24h, 1m averages for 30d, 1h averages for 2y. Configure per-bucket / per-hypertable. Forgetting this is the most common cause of "why is our metrics bill suddenly $40k."
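
Back-of-envelope on what the tiering buys, per series; the windows and intervals are the example above, not any engine's defaults.

```python
HOUR, DAY, YEAR = 3_600, 86_400, 365 * 86_400

tiers = [
    ("raw @15s for 24h", DAY // 15),          # 5,760 points
    ("1m avg for 30d",   30 * DAY // 60),     # 43,200 points
    ("1h avg for 2y",    2 * YEAR // HOUR),   # 17,520 points
]
tiered = sum(points for _, points in tiers)
raw_2y = 2 * YEAR // 15                       # keeping raw 15s data for 2y

print(f"tiered: {tiered:,} points/series vs raw-only: {raw_2y:,}")
print(f"~{raw_2y // tiered}x reduction")      # ~63x
```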

Common Mistakes

  • Cardinality explosion. Tagging by user, request ID, or trace ID. Use logs or a columnar store; metrics are for aggregate behavior, not per-event.
  • No downsampling. Querying months of raw 1-second data for a year-on-year chart. Pre-aggregate.
  • Backfilling old data. TSDBs assume monotonic-ish writes. Backfills trigger compaction storms — batch them off-peak and tune accordingly.
  • Single-node Prometheus as the source of truth. No HA, no long-term storage. Put a remote-write target (Mimir / Thanos / VictoriaMetrics) behind it before this becomes urgent.
  • Treating it as a log store. Logs are unstructured, high-cardinality strings. Use Loki, Elasticsearch, or ClickHouse — not a TSDB.