Database Deep Dive

Schema Design — The Shape of Your Data

A schema is a contract: every row of this table looks like this, every relationship means that. Get it right and reads are fast, invariants hold, and changes stay cheap. Get it wrong and you'll fight the same bugs for years.

KeysNormalizationIndexesConstraintsMigrations
← Back to Database Side
Quick Facts

At a Glance

Basic Concepts

  • Entity: a thing your system tracks — a user, an order, a link.
  • Attribute: a property of an entity — email, status, created_at.
  • Relationship: how entities connect — one-to-one, one-to-many, many-to-many.
  • Key: a column (or set) that identifies a row uniquely (primary) or points to another (foreign).
  • Constraint: a rule the database enforces — NOT NULL, UNIQUE, CHECK, foreign key.
  • Index: an extra data structure that makes some queries fast at the cost of writes.
  • Migration: a versioned, reproducible change to the schema.
Step One

Model the Domain Before the Tables

Start with nouns and verbs. List the entities (User, Link, Click), the relationships (a User owns many Links; a Link has many Clicks), and the operations (create a Link, resolve a code, aggregate Clicks per day). The schema is a translation of that model — not the other way around.

If you can't describe what each table represents in one short sentence, the model isn't ready yet.

Keys

Picking a Primary Key

Key styleProsCons
Auto-increment integerTiny, fast, simple.Leaks volume; awkward across shards; guessable in URLs.
UUID v4 (random)Unique anywhere, generate client-side.Bigger; random inserts hurt B-tree locality.
UUID v7 / ULIDTime-ordered + globally unique. Best of both.Newer; not all drivers support natively.
Natural key (email, ISBN)Meaningful; avoids one extra join.Mutable in practice. Avoid as primary key.

Rule of thumb: synthetic keys for primary, natural keys for UNIQUE indexes. Never let a primary key carry meaning that might change.

Normalization

One Fact, One Place

The three forms in plain English
  • 1NF: no repeating groups, no arrays-in-a-cell. Every cell is atomic.
  • 2NF: every non-key column depends on the whole primary key (matters for composite keys).
  • 3NF: non-key columns depend only on the key — not on other non-key columns.

Practically: if you find yourself updating the same fact in two rows, it's not in 3NF.

When to denormalize

Denormalize when a join is on the read path of your hottest query and the data being duplicated changes rarely. Examples: storing order.customer_name as it was at purchase time, caching post.comment_count, materializing a daily rollup.

The cost is write complexity: every change must update every copy. Pay it deliberately, not by accident.

Constraints

Let the Database Defend the Truth

  • NOT NULL on every column that must have a value. NULL means "unknown" — only use it when that's actually what you mean.
  • UNIQUE on natural identifiers (email, slug, short code).
  • Foreign keys with the right ON DELETE behavior — CASCADE, SET NULL, or RESTRICT. Pick one consciously.
  • CHECK constraints for invariants — price > 0, status IN ('active','expired').
  • Generated columns for derived values you query on (e.g., a normalized lowercase email).

Application code is one of many writers. Constraints in the database are the only ones that hold no matter who is writing.

Indexes

Fast Reads Aren't Free

  • Index for the queries you actually run. Look at EXPLAIN plans, not your intuition.
  • Composite indexes are ordered — (user_id, created_at) serves WHERE user_id = ? and WHERE user_id = ? ORDER BY created_at, but not WHERE created_at > ? alone.
  • Every index slows writes. Indexes also take disk and RAM. More is not better.
  • Partial indexes for sparse predicates: CREATE INDEX … WHERE deleted_at IS NULL.
  • Covering indexes include all columns the query needs, so the table doesn't need to be touched at all.
Naming

Conventions That Save Future-You

  • Tables: plural, snake_case (users, link_clicks). Or singular — pick one, stay consistent.
  • Columns: snake_case, no abbreviations unless universal (url, id).
  • Foreign keys: <referenced_table_singular>_id (user_id, link_id).
  • Booleans: positive phrasing — is_active, not is_not_disabled.
  • Timestamps: created_at, updated_at, deleted_at. Always store UTC; convert at the edge.
  • Indexes: idx_<table>_<cols>. Constraints: chk_, uq_, fk_ prefixes.
Evolution

Schemas Are Living Things — Migrations

Production schemas change. The question isn't whether, but how safely.

  • Use a migration tool. Flyway, Liquibase, Alembic, Prisma Migrate, Knex, Rails, Django — pick one and keep migrations in version control.
  • Forward-only in production. "Down" migrations are nice for dev, dangerous for prod. Roll forward with a new migration.
  • Expand → migrate → contract. Add the new column nullable, dual-write, backfill, switch reads, drop the old. Never break running code.
  • Avoid long-locking changes. Adding a non-null column with a default rewrites the table on many engines. Use multi-step migrations on big tables.
  • Test migrations against a copy of prod-shaped data — not just an empty schema.
Common Pitfalls

Mistakes You Only Make Once

  • Using VARCHAR for everything. Reach for the right type — TIMESTAMPTZ, UUID, NUMERIC, JSONB, BOOLEAN, enums.
  • Money in FLOAT. Use NUMERIC(precision, scale) or store integer minor units.
  • Soft deletes everywhere. Useful sometimes, but every query now needs WHERE deleted_at IS NULL. Forget once and data leaks.
  • Storing JSON blobs to "stay flexible." Fine for genuinely unstructured payloads; a tar pit for fields you'll query, index, or migrate.
  • No timestamps. Add created_at and updated_at on every table. You'll thank yourself during your first incident.
  • Single tenants table. If you're multi-tenant, decide between row-level (tenant_id everywhere) or schema-per-tenant before you ship.
  • Cyclic foreign keys. Tempting, painful at insert time. Restructure or use deferrable constraints.
Worked Example

The URL Shortener Schema

A reasonable starting point — small, normalized, with the right constraints:

CREATE TABLE users (
  id           BIGSERIAL PRIMARY KEY,
  email        TEXT NOT NULL,
  password_hash TEXT NOT NULL,
  created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
  CONSTRAINT uq_users_email UNIQUE (email)
);

CREATE TABLE links (
  id          BIGSERIAL PRIMARY KEY,
  code        TEXT NOT NULL,
  target_url  TEXT NOT NULL,
  owner_id    BIGINT REFERENCES users(id) ON DELETE SET NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  expires_at  TIMESTAMPTZ,
  deleted_at  TIMESTAMPTZ,
  CONSTRAINT uq_links_code UNIQUE (code),
  CONSTRAINT chk_links_url CHECK (target_url ~* '^https?://')
);

CREATE INDEX idx_links_owner_active
  ON links (owner_id) WHERE deleted_at IS NULL;

CREATE TABLE clicks (
  id         BIGSERIAL PRIMARY KEY,
  link_id    BIGINT NOT NULL REFERENCES links(id) ON DELETE CASCADE,
  occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  referrer   TEXT,
  user_agent TEXT,
  country    TEXT
);

CREATE INDEX idx_clicks_link_time
  ON clicks (link_id, occurred_at DESC);

Notice: every table has timestamps; every UNIQUE business field has a constraint; the partial index on active links keeps it small; the clicks index is ordered for the most common query (recent clicks for a link).

Continue

Related Reading