SQL (Relational) · Database Deep Dive

Quick Facts

At a Glance

Basic Concepts

Relation: a table — rows of the same shape, columns with declared types.
Primary key: one column (or set) that uniquely identifies a row.
Foreign key: a column that points at another table's primary key — the database enforces the link.
Join: combine rows from two tables on a matching column. The reason relational won.
Normalization: split data so each fact lives in exactly one place. Updates stay simple; reads need joins.
ACID: Atomicity, Consistency, Isolation, Durability — the four guarantees that make money-handling possible.

The Big Engines

Pick Your Postgres

PostgreSQL

The modern default. Standards-compliant, extensible (PostGIS, pgvector), excellent JSONB. Use it unless you have a specific reason not to.

MySQL / MariaDB

Ubiquitous, fast, simple. Behind much of the LAMP-era web. Watch defaults — older versions had loose ones.

SQL Server

Microsoft's enterprise stack. Tight Azure / .NET integration, strong tooling, T-SQL dialect.

Oracle

Long-standing enterprise heavyweight. Powerful, expensive, and entrenched in finance & telco.

SQLite

An entire database in a single file linked into your app. Phones, browsers, edge — and increasingly, small servers.

CockroachDB / Spanner / Yugabyte

Distributed SQL — relational semantics across regions. Pay for it in latency.

When SQL Wins

What Relational is Good At

Invariants You Can't Negotiate

Money never disappears, an order has a customer, a seat isn't sold twice. Foreign keys, CHECK constraints, unique indexes, and serializable transactions enforce these in the database — not hopefully in the app. If two services touch the same data, that enforcement is the only one that holds.

Ad-hoc Queries You Didn't Anticipate

SQL lets you ask new questions of old data: "revenue by cohort by month, but only for users who churned." No re-modeling, no rebuild — write the query. NoSQL stores typically answer fast only the questions you designed for; SQL answers most questions slowly and the right ones quickly (with the right index).

Joins Across Entities

Orders → customers → addresses → countries. Done in one round-trip with a JOIN. The alternative — fetch, then fetch, then fetch — is the N+1 problem in disguise, and what you end up reinventing in document databases.

Reporting & BI

Every analyst, BI tool, and dashboard speaks SQL. Choosing relational means your data is queryable by people who don't write code — a multiplier most teams underestimate.

Design

Modeling Relational Data

Normal Forms — The Pragmatic Take

3NF is usually the goal. Each non-key column depends on the key, the whole key, and nothing but the key. Higher forms (BCNF, 4NF) matter rarely; lower forms cause update anomalies daily.

Denormalize on purpose — for read-heavy paths, or to avoid joins on enormous tables. Document the duplication so the next person doesn't trust it as authoritative.

Keys & IDs

Auto-increment integers: compact, fast, but leak volume and enable enumeration. Don't put them in URLs.
UUIDs: generated anywhere, no central coordination. Random UUIDs hurt B-tree locality — prefer UUIDv7 or ULID for time-ordered inserts.
Composite keys: natural, but painful to reference. A surrogate key with a unique constraint on the natural columns is usually cleaner.

JSON Columns — The Escape Hatch

Postgres JSONB, MySQL JSON, SQL Server JSON: store semi-structured data in a relational column when the shape is genuinely variable (settings blobs, third-party payloads). Index specific paths with expression indexes. Don't use it as an excuse to skip schema design — once you're querying inside the JSON, you're worse off than if you'd just made columns.

Polymorphism & Inheritance

Three options: single table with nullable columns (simple, sparse), table per type (clean, joins to read), shared table + type-specific tables (most flexible, most code). Pick by query patterns — most apps overengineer this.

Performance

Where SQL Slows Down

Indexes — the Right Ones

Index for the queries you run, not the columns you have.
Composite indexes are order-sensitive — (tenant_id, created_at) ≠ (created_at, tenant_id).
Covering indexes (include all columns the query needs) skip the table read entirely.
Every index slows writes and consumes space. Audit; remove the dead ones.
WHERE lower(email) = … defeats a normal index — use a functional index or store normalized.

Read EXPLAIN Until You Can

Sequential scan on a million-row table is the warning sign. Nested loop with millions of iterations is another. Most slow queries have one of three causes: missing index, bad statistics, or a query that fundamentally needs to read too much data.

Connection Pooling

Postgres wants fewer connections than developers expect — usually 2–4× CPU cores on the DB server. Put pgBouncer or RDS Proxy in front so thousands of app connections share a small real pool. App-side pool size should account for instance count: app_pool × instances ≤ db_max.

Read Replicas & Sharding

Read replicas scale reads cheaply; mind replication lag — don't read your own write from a replica. Sharding (split by key) is the last resort: it gives up cross-shard joins and transactions. Consider distributed SQL (Spanner, CockroachDB) before sharding manually.

Operations

Schema Evolution & Backups

Migrations are code. Check them in. Run them automatically. Make them backwards-compatible with the previous app version (expand → migrate → contract) so deploys can roll back.
DDL on hot tables can lock for a long time. Postgres: CREATE INDEX CONCURRENTLY; add columns without defaults, then backfill in batches.
Backups & PITR: nightly snapshots plus continuous WAL archiving. Test restores — an untested backup is a wish, not a guarantee.
Tools: Flyway, Liquibase (Java); Alembic (Python); Prisma Migrate, Drizzle Kit, Knex (Node); EF Core Migrations (.NET); Rails migrations (Ruby).

Pitfalls

Common Mistakes

String concatenation into SQL — the original sin. Always parameterize.
Long-running transactions hold locks and bloat the WAL. Keep them short; never await user input inside one.
Soft deletes everywhere — every query needs WHERE deleted_at IS NULL forever. Use them deliberately.
Naive timestamps. Store timestamptz in UTC; convert at the edge.
Premature sharding. A well-tuned single Postgres goes a long way before sharding pays for itself.
Treating the ORM as the schema. The database has its own truth — keep migrations the source of authority, not generated artifacts.

Continue

The Other Side

NoSQL Deep Dive → Persistence (Server) DBaaS ↑ Back to Database Side