URL Shortener Tutorial · Module 07 of 11

Observability

By the end of this tutorial you'll have JSON logs with trace IDs, Prometheus metrics on a /metrics endpoint, an OpenTelemetry trace spanning HTTP → cache → DB → queue, a Grafana dashboard, and an SLO + burn-rate alert document.

~5–7 hrs · pino · prom-client · OpenTelemetry · Grafana
Definition of Done

What You'll Have

  • Structured JSON logs with trace_id on every entry.
  • /metrics endpoint exposing request rate, error rate, latency histogram, cache hit ratio, queue depth.
  • One end-to-end trace visible in a local Jaeger UI.
  • Prometheus + Grafana running locally; one dashboard with the four golden signals.
  • docs/slo.md with three SLIs/SLOs and a burn-rate alert policy.
The Steps

Build It

STEP 1

Add Prometheus, Grafana, and Jaeger to docker-compose

services:
  # …existing services…
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes: ["./ops/prometheus.yml:/etc/prometheus/prometheus.yml"]
  grafana:
    image: grafana/grafana:latest
    ports: ["3001:3000"]
    environment: { GF_AUTH_ANONYMOUS_ENABLED: "true", GF_AUTH_ANONYMOUS_ORG_ROLE: Admin }
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports: ["16686:16686","4318:4318"]

Create ops/prometheus.yml:

global: { scrape_interval: 5s }
scrape_configs:
  - job_name: app
    static_configs: [{ targets: ['host.docker.internal:3000'] }]
docker compose up -d
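On Linux, containers can't resolve host.docker.internal out of the box, so the Prometheus scrape above will fail silently. A hedged workaround (requires Docker Engine 20.10+) is to map it to the host gateway on the prometheus service:

```yaml
  prometheus:
    # …image, ports, volumes as above…
    extra_hosts:
      - "host.docker.internal:host-gateway"
```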
STEP 2

Install observability libraries

npm install pino pino-http prom-client \
            @opentelemetry/api \
            @opentelemetry/sdk-node \
            @opentelemetry/auto-instrumentations-node \
            @opentelemetry/exporter-trace-otlp-http
STEP 3

Bootstrap OpenTelemetry

Create src/otel.ts and import it first in src/server.ts:

import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

new NodeSDK({
  serviceName: 'url-shortener',
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }),
  instrumentations: [getNodeAutoInstrumentations()]
}).start();
// src/server.ts (very first line)
import './otel.js';
import 'dotenv/config';
// …
STEP 4

Add structured logging with trace IDs

Create src/logger.ts:

import pino from 'pino';
import pinoHttp from 'pino-http';
import { trace } from '@opentelemetry/api';

export const log = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  formatters: { level: (l) => ({ level: l }) }
});

export const httpLogger = pinoHttp({
  logger: log,
  customProps: () => {
    const span = trace.getActiveSpan()?.spanContext();
    return span ? { trace_id: span.traceId, span_id: span.spanId } : {};
  }
});

In src/app.ts:

import { httpLogger } from './logger.js';
app.use(httpLogger);
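With the middleware in place, every request should emit one JSON line that carries the trace context. Illustratively (ids, timestamp, and field order will differ on your machine):

```json
{"level":"info","time":1700000000000,"req":{"method":"GET","url":"/abc123"},"res":{"statusCode":302},"trace_id":"7c3b…","span_id":"1f2a…","msg":"request completed"}
```

The trace_id here is the same one you'll see in Jaeger, which is what makes log↔trace correlation work.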
STEP 5

Add Prometheus metrics

Create src/metrics.ts:

import client from 'prom-client';

client.collectDefaultMetrics();

export const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Request duration',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2]
});

export const cacheHits   = new client.Counter({ name: 'cache_hits_total',   help: 'Cache hits',   labelNames: ['family'] });
export const cacheMisses = new client.Counter({ name: 'cache_misses_total', help: 'Cache misses', labelNames: ['family'] });

export const registry = client.register;

In src/app.ts:

import { httpDuration, registry } from './metrics.js';

app.use((req, res, next) => {
  const end = httpDuration.startTimer({ method: req.method });
  res.on('finish', () => end({ route: req.route?.path ?? 'other', status: String(res.statusCode) }));
  next();
});

app.get('/metrics', async (_req, res) => {
  res.set('content-type', registry.contentType);
  res.send(await registry.metrics());
});

In src/cache.ts, increment the counters where you decide hit/miss.
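As a sketch of what that might look like (the recordCacheResult helper and the shape of your cache lookup are assumptions, not code from earlier modules):

```typescript
// Minimal structural type matching the slice of prom-client's Counter we use.
interface CounterLike { inc(labels: Record<string, string>): void }

// Route a cache lookup result to the right counter. `family` distinguishes
// key families (e.g. 'url' for code → target lookups).
export function recordCacheResult(
  hit: boolean,
  counters: { hits: CounterLike; misses: CounterLike },
  family = 'url'
): void {
  (hit ? counters.hits : counters.misses).inc({ family });
}
```

In src/cache.ts you'd then call recordCacheResult(cached !== null, { hits: cacheHits, misses: cacheMisses }) right after the Redis lookup.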

STEP 6

Verify the pipeline

npm run dev
curl http://localhost:3000/<some-code>
curl http://localhost:3000/metrics | grep http_request_duration
  • Open Prometheus at http://localhost:9090 → query http_request_duration_seconds_count.
  • Open Jaeger at http://localhost:16686, pick service url-shortener → see one trace per request.
  • Open Grafana at http://localhost:3001; add Prometheus as a data source pointing to http://prometheus:9090.
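To skip the click-through each time you recreate the stack, Grafana can auto-provision the data source. A minimal sketch (the filename ops/grafana-datasource.yml is an assumption; mount it into the grafana container at /etc/grafana/provisioning/datasources/datasource.yml):

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
```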
STEP 7

Build the dashboard

In Grafana, create a dashboard with four panels:

  • Traffic: sum(rate(http_request_duration_seconds_count[1m])) by (route)
  • Errors: sum(rate(http_request_duration_seconds_count{status=~"5.."}[1m]))
  • Latency p95: histogram_quantile(0.95, sum by (le, route) (rate(http_request_duration_seconds_bucket[5m])))
  • Cache hit ratio: rate(cache_hits_total[1m]) / (rate(cache_hits_total[1m]) + rate(cache_misses_total[1m]))

Export the dashboard JSON to ops/grafana-dashboard.json.
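If the p95 panel ever feels slow as data accumulates, a Prometheus recording rule can precompute it. A sketch (the file ops/prometheus-rules.yml and the rule name are assumptions; you'd also add it under rule_files in prometheus.yml):

```yaml
groups:
  - name: latency
    rules:
      - record: route:http_request_duration_seconds:p95
        expr: |
          histogram_quantile(0.95,
            sum by (le, route) (rate(http_request_duration_seconds_bucket[5m])))
```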

STEP 8

Write the SLO doc

Create docs/slo.md:

# SLOs & Error Budget — URL Shortener

## SLI 1 — Redirect availability
good = http_request_duration_seconds_count{route="/:code", status!~"5.."}
total = http_request_duration_seconds_count{route="/:code"}
SLO: 99.95% / 30 days  (~21 minutes of budget)

## SLI 2 — Redirect latency
good = histogram p95 of {route="/:code"} < 50ms
SLO: 99% / 30 days

## SLI 3 — Click freshness (analytics)
SLO: 99% of clicks visible in stats within 5 min / 30 days

## Error-budget policy
- > 50% remaining: ship freely
- < 50%: code freeze on risky changes; reliability is next sprint's priority
- exhausted: feature freeze until recovered

## Burn-rate alerts
- Fast burn (page): 14× burn over 1h AND 5min
- Slow burn (ticket): 1× burn over 6h AND 30min
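The numbers above can be sanity-checked with a little arithmetic; a sketch (plain helper functions for illustration, not part of the app):

```typescript
// Error budget for a given SLO over a rolling window, in minutes.
function errorBudgetMinutes(slo: number, windowDays = 30): number {
  return (1 - slo) * windowDays * 24 * 60;
}

// At a constant burn rate (x times the budgeted error rate), hours until
// the whole window's budget is consumed.
function hoursToExhaustion(burnRate: number, windowDays = 30): number {
  return (windowDays * 24) / burnRate;
}

console.log(errorBudgetMinutes(0.9995).toFixed(1)); // 21.6 — the "~21 minutes" above
console.log(hoursToExhaustion(14).toFixed(1));      // 51.4 — why a 14x burn pages, not tickets
```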
STEP 9

Commit

git checkout -b module-07
git add .
git commit -m "module 07: pino + prom-client + OTel + grafana + SLO doc"
git push -u origin module-07
Common Gotchas

If Something Goes Wrong

  • No traces in Jaeger — make sure ./otel.js is the very first import (before any HTTP libraries) and the OTLP URL points to :4318/v1/traces.
  • Prometheus shows no data — on Linux, host.docker.internal may not resolve; use extra_hosts mapping or your host IP.
  • Logs without trace IDs — confirm @opentelemetry/api is installed and that a span is active when the log line is written (inside a request it will be, since the HTTP auto-instrumentation starts the span before pino-http runs).
  • Histogram quantiles look spiky — try a wider window (5–10 min) and verify scrape interval is small enough.
What's Next

Move On