By the end of this tutorial you'll have JSON logs with trace IDs, Prometheus metrics on a /metrics endpoint, an OpenTelemetry trace spanning HTTP → cache → DB → queue, a Grafana dashboard, and an SLO + burn-rate alert document.
Concretely, you'll ship:

- `trace_id` on every log entry
- a `/metrics` endpoint exposing request rate, error rate, latency histogram, cache hit ratio, and queue depth
- `docs/slo.md` with three SLIs/SLOs and a burn-rate alert policy

Add the monitoring stack to `docker-compose.yml`:

```yaml
services:
  # …existing services…
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes: ["./ops/prometheus.yml:/etc/prometheus/prometheus.yml"]
  grafana:
    image: grafana/grafana:latest
    ports: ["3001:3000"]
    environment: { GF_AUTH_ANONYMOUS_ENABLED: "true", GF_AUTH_ANONYMOUS_ORG_ROLE: Admin }
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports: ["16686:16686", "4318:4318"]
```
Create `ops/prometheus.yml`:

```yaml
global: { scrape_interval: 5s }
scrape_configs:
  - job_name: app
    static_configs: [{ targets: ['host.docker.internal:3000'] }]
```
Start the stack and install the app-side dependencies:

```bash
docker compose up -d
npm install pino pino-http prom-client \
  @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http
```
Create `src/otel.ts` and import it first in `src/server.ts`:

```ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  serviceName: 'url-shortener',
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }),
  instrumentations: [getNodeAutoInstrumentations()]
});
sdk.start();
```
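Optionally, flush in-flight spans when the process exits. This is a small assumed addition to the same file, reusing the `sdk` reference above:

```ts
// Give the OTLP exporter a chance to flush buffered spans before exit.
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});
```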
```ts
// src/server.ts: this must be the very first import, before any HTTP libraries
import './otel.js';
import 'dotenv/config';
// …
```
Create `src/logger.ts`:

```ts
import pino from 'pino';
import pinoHttp from 'pino-http';
import { trace } from '@opentelemetry/api';

export const log = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  formatters: { level: (label) => ({ level: label }) }
});

export const httpLogger = pinoHttp({
  logger: log,
  // Stamp every request log with the active OTel span, so a log line
  // can be joined to its trace in Jaeger.
  customProps: () => {
    const span = trace.getActiveSpan()?.spanContext();
    return span ? { trace_id: span.traceId, span_id: span.spanId } : {};
  }
});
```
In `src/app.ts`:

```ts
import { httpLogger } from './logger.js';

app.use(httpLogger);
```
Create `src/metrics.ts`:

```ts
import client from 'prom-client';

client.collectDefaultMetrics();

export const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Request duration',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2]
});

export const cacheHits = new client.Counter({ name: 'cache_hits_total', help: 'Cache hits', labelNames: ['family'] });
export const cacheMisses = new client.Counter({ name: 'cache_misses_total', help: 'Cache misses', labelNames: ['family'] });

export const registry = client.register;
```
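The deliverables also name queue depth, which the file above doesn't cover. A minimal sketch using a prom-client `Gauge` with a scrape-time `collect()` callback; `./queue.js` and `getQueueDepth` are hypothetical stand-ins for however your queue module exposes its backlog:

```ts
import client from 'prom-client';
// Hypothetical helper: swap in whatever your queue module actually provides.
import { getQueueDepth } from './queue.js';

export const queueDepth = new client.Gauge({
  name: 'queue_depth',
  help: 'Jobs waiting in the analytics queue',
  // collect() runs on every /metrics scrape, so the value is read lazily.
  async collect() {
    this.set(await getQueueDepth());
  }
});
```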
In `src/app.ts`:

```ts
import { httpDuration, registry } from './metrics.js';

// Start the timer up front; route and status aren't known until the
// response finishes, so the remaining labels are bound in 'finish'.
app.use((req, res, next) => {
  const end = httpDuration.startTimer({ method: req.method });
  res.on('finish', () => end({ route: req.route?.path ?? 'other', status: String(res.statusCode) }));
  next();
});

app.get('/metrics', async (_req, res) => {
  res.set('content-type', registry.contentType);
  res.send(await registry.metrics());
});
```
In `src/cache.ts`, increment the counters at the point where you decide whether a lookup was a hit or a miss, as in the sketch below.
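A minimal sketch, assuming a node-redis client from an earlier module; the `url:` key prefix, the `getCachedUrl` name, and the `redirect` label are illustrative:

```ts
import { createClient } from 'redis';
import { cacheHits, cacheMisses } from './metrics.js';

// Assumes connect() is called once at startup.
const redis = createClient({ url: process.env.REDIS_URL });

// Resolve a short code to its long URL, counting the outcome either way.
export async function getCachedUrl(code: string): Promise<string | null> {
  const cached = await redis.get(`url:${code}`);
  if (cached !== null) {
    cacheHits.inc({ family: 'redirect' });
    return cached;
  }
  cacheMisses.inc({ family: 'redirect' });
  return null; // caller falls through to the DB and backfills the cache
}
```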
Run it and generate some traffic:

```bash
npm run dev
curl http://localhost:3000/<some-code>
curl http://localhost:3000/metrics | grep http_request_duration
```
Then check each backend:

- Prometheus at http://localhost:9090 → query `http_request_duration_seconds_count`.
- Jaeger at http://localhost:16686 → pick service `url-shortener`; you should see one trace per request.
- Grafana at http://localhost:3001 → add Prometheus as a data source pointing to `http://prometheus:9090`.

In Grafana, create a dashboard with four panels:
- Request rate: `sum(rate(http_request_duration_seconds_count[1m])) by (route)`
- Error rate: `sum(rate(http_request_duration_seconds_count{status=~"5.."}[1m]))`
- p95 latency: `histogram_quantile(0.95, sum by (le, route) (rate(http_request_duration_seconds_bucket[5m])))`
- Cache hit ratio: `rate(cache_hits_total[1m]) / (rate(cache_hits_total[1m]) + rate(cache_misses_total[1m]))`

Export the dashboard JSON to `ops/grafana-dashboard.json`.
Create `docs/slo.md`:

```md
# SLOs & Error Budget — URL Shortener
## SLI 1 — Redirect availability
good = http_request_duration_seconds_count{route="/:code", status!~"5.."}
total = http_request_duration_seconds_count{route="/:code"}
SLO: 99.95% / 30 days (~21 minutes of budget)
## SLI 2 — Redirect latency
good  = http_request_duration_seconds_bucket{route="/:code", le="0.05"}
total = http_request_duration_seconds_count{route="/:code"}
SLO: 99% of redirects complete in < 50 ms / 30 days
## SLI 3 — Click freshness (analytics)
SLO: 99% of clicks visible in stats within 5 min / 30 days
## Error-budget policy
- more than 50% of budget remaining: ship freely
- less than 50% remaining: freeze risky changes; reliability work is next sprint's priority
- exhausted: feature freeze until recovered
## Burn-rate alerts
Burn rate = (observed error rate over a window) ÷ (1 − SLO target); a burn rate of 1 spends exactly the 30-day budget.
- Fast burn (page): burn rate > 14 over both 1h AND 5min windows (budget exhausted in ~2 days)
- Slow burn (ticket): burn rate > 1 over both 6h AND 30min windows
```
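SLI 3 implies a lag measurement the app doesn't expose yet. One way to get it: a sketch assuming your analytics consumer knows when each click originally happened; the metric name and the `clickCreatedAt` parameter are illustrative:

```ts
import client from 'prom-client';

// Observed by the analytics consumer when it writes a click to the stats
// store: how long the click took to become visible.
export const clickLag = new client.Histogram({
  name: 'click_ingest_lag_seconds',
  help: 'Delay between a click happening and appearing in stats',
  buckets: [1, 5, 15, 60, 120, 300, 600]
});

export function recordClickProcessed(clickCreatedAt: Date): void {
  clickLag.observe((Date.now() - clickCreatedAt.getTime()) / 1000);
}
```

With that in place, SLI 3 can be computed as `good = click_ingest_lag_seconds_bucket{le="300"}` over `total = click_ingest_lag_seconds_count`.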
Commit your work:

```bash
git checkout -b module-07
git add .
git commit -m "module 07: pino + prom-client + OTel + grafana + SLO doc"
git push -u origin module-07
```
If traces, metrics, or trace IDs don't show up, check that:

- `./otel.js` is the very first import (before any HTTP libraries) and the OTLP URL points to `:4318/v1/traces`.
- `host.docker.internal` may not resolve (notably on Linux); use an `extra_hosts` mapping or your host IP in `ops/prometheus.yml`.
- `@opentelemetry/api` is installed and an active span exists at log time (it does, once pino-http runs inside the request).