Observability & Performance Deep Dive · 1 of 7

Log Aggregation — ELK, OpenSearch & Loki

Logs are the text trail every service leaves behind. Aggregating them into one searchable place is the floor of observability: without it, you end up SSHing into ten boxes during an incident. The Elastic Stack popularized the pattern; OpenSearch is the Apache 2.0 fork that AWS started in 2021; Loki took a different path: index the labels, scan the logs.

Elastic · OpenSearch · Loki · Fluent Bit · Vector · Structured logs
Anatomy

The Pipeline

  1. App emits log
  2. Agent (Fluent Bit / Filebeat / Vector)
  3. Buffer (Kafka / Redis)
  4. Indexer (Elastic / OpenSearch / Loki)
  5. Query UI (Kibana / OpenSearch Dashboards / Grafana)
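
The stages above can be modeled in a few lines to show where batching and backpressure live. This is a toy, in-process sketch, not how Fluent Bit, Kafka, or any indexer is actually implemented: `parse`, `agent_batches`, and `run_pipeline` are hypothetical names, a bounded `queue.Queue` stands in for the buffer, and a plain list stands in for the indexer.

```python
from __future__ import annotations

import json
import queue
import threading
from typing import Iterable, Iterator


def parse(line: str) -> dict:
    """Agent-side parsing: keep unparseable lines instead of dropping them."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        event = None
    if not isinstance(event, dict):
        event = {"level": "unknown", "msg": line}
    return event


def agent_batches(lines: Iterable[str], batch_size: int) -> Iterator[list[dict]]:
    """Agent stage: parse raw lines and group them into batches for shipping."""
    batch: list[dict] = []
    for line in lines:
        batch.append(parse(line))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


def run_pipeline(lines: Iterable[str], batch_size: int = 2, buffer_size: int = 4) -> list[dict]:
    """Push batches through a bounded buffer into a list acting as the indexer."""
    buffer: queue.Queue = queue.Queue(maxsize=buffer_size)  # stand-in for Kafka / Redis
    store: list[dict] = []                                  # stand-in for the indexer

    def indexer() -> None:
        while True:
            batch = buffer.get()
            if batch is None:  # sentinel: the agent has finished
                return
            store.extend(batch)

    worker = threading.Thread(target=indexer)
    worker.start()
    for batch in agent_batches(lines, batch_size):
        buffer.put(batch)  # blocks while the buffer is full: backpressure on the agent
    buffer.put(None)
    worker.join()
    return store
```

The query UI stage then reduces to a filter over the store, for example `[e for e in store if e["level"] == "error"]`. The bounded buffer is the design point worth noticing: when the indexer falls behind, the agent slows down instead of the pipeline silently losing lines.
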
Side-By-Side

The Three Stacks

Stack | Strength | Watch out for
------|----------|--------------
ELK (Elasticsearch + Logstash + Kibana) | Mature; full-text search at scale, rich aggregations, alerts, ML add-ons. | License pivot to ELv2/SSPL in 2021 (an AGPLv3 option was added in 2024); cluster ops are real work.
OpenSearch | Apache 2.0 OSS fork started by AWS; managed offering on AWS; mostly drop-in for Elasticsearch 7.10-era setups. | Versions are slowly diverging from Elastic; some plugins differ.
Grafana Loki | Indexes only labels (cheap); stores chunks in S3-class object storage. Tight Grafana integration. | No full-text index, so needle-in-haystack searches are slow without good labels.
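
The difference between a full-text index and Loki's label-only approach can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not the real data structures of Elasticsearch or Loki: the sample `LOGS` and the function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (labels, log line) pairs.
LOGS = [
    ({"app": "checkout", "env": "prod"}, "payment failed for order 1234"),
    ({"app": "checkout", "env": "prod"}, "payment ok for order 1235"),
    ({"app": "search", "env": "prod"}, "query timeout after 30s"),
]


def build_inverted_index(logs):
    """Full-text style: map every token to the ids of the lines containing it."""
    index = defaultdict(set)
    for doc_id, (_labels, line) in enumerate(logs):
        for token in line.lower().split():
            index[token].add(doc_id)
    return index


def build_label_index(logs):
    """Label style: map each distinct label set (a stream) to its line ids."""
    streams = defaultdict(list)
    for doc_id, (labels, _line) in enumerate(logs):
        streams[frozenset(labels.items())].append(doc_id)
    return streams


def label_search(logs, streams, selector, needle):
    """Select streams by label, then scan only their lines for a substring."""
    wanted = set(selector.items())
    hits = []
    for stream_labels, doc_ids in streams.items():
        if wanted <= stream_labels:  # stream carries every requested label
            hits.extend(i for i in doc_ids if needle in logs[i][1])
    return sorted(hits)
```

In this toy, the inverted index grows with the vocabulary of the log text, while the label index grows only with the number of distinct label sets. The price is paid at query time: a search with a narrow selector such as `{"app": "checkout"}` scans two lines, whereas an empty selector scans everything.
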
Hard-Won Habits

What Separates "Logs" from "Useful Logs"

  • Structured (JSON) logs. Lines like {"ts":..., "level":"error", "trace_id":...} beat free-text every time.
  • Correlation IDs / trace IDs on every log line — you'll thank yourself when you need to follow a request.
  • Sampling at high volume. Sample successful requests, keep all errors.
  • Sensitive-data filters at the agent: never log raw tokens, PII, or full payloads.
  • Retention tiers, for example 7 days hot, 30 days warm, 90 days cold/object-store. Costs balloon if everything is hot.
  • Don't log inside hot loops. A stray log.info in a 100k req/s path can dominate your bill.
Tradeoffs

What to Watch Out For

  • Logs are expensive. Indexing, storage, and (on many platforms) ingest are all metered.
  • Search-everything sounds great until the bill arrives or the cluster falls over.
  • Self-hosting Elasticsearch at scale is a full-time platform-team responsibility.
  • "Just log it" is not observability. Pair with metrics + traces; don't make logs do all three.
  • Audit logs are different. Compliance retention & tamper-evidence belong on a separate path.
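
To make the cost point concrete, here is a back-of-envelope sketch. The numbers are assumptions, not measurements: 200 bytes per line is a guess, the per-GB prices in the usage note are made-up placeholders rather than any vendor's rates, and the 7/30/90-day tiers are read as cumulative ages (hot until day 7, warm until day 30, cold until day 90).

```python
SECONDS_PER_DAY = 86_400


def daily_volume_gb(lines_per_sec: float, bytes_per_line: float) -> float:
    """Raw log volume per day in GB (10^9 bytes), before compression or indexing."""
    return lines_per_sec * bytes_per_line * SECONDS_PER_DAY / 1e9


def stored_gb_by_tier(daily_gb: float, tier_end_day: dict) -> dict:
    """Steady-state GB held in each tier.

    tier_end_day maps tier name -> age in days at which data leaves that tier,
    listed from hottest to coldest, e.g. {"hot": 7, "warm": 30, "cold": 90}.
    """
    stored = {}
    start = 0
    for tier, end in tier_end_day.items():
        stored[tier] = daily_gb * (end - start)  # days spent in this tier
        start = end
    return stored


def monthly_cost(stored: dict, price_per_gb_month: dict) -> float:
    """Storage-only monthly cost for the given tier sizes and prices."""
    return sum(gb * price_per_gb_month[tier] for tier, gb in stored.items())
```

Under these assumptions, one log line per request on a 100k req/s path gives `daily_volume_gb(100_000, 200)`, which is 1728 GB per day from a single statement. Passing `{"hot": 7, "warm": 30, "cold": 90}` versus `{"hot": 90}` to `stored_gb_by_tier`, with whatever prices apply to your platform, shows how much of the bill comes from keeping everything hot. Ingest and indexing charges are not modeled here.
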