CS Essentials Deep Dive · 3 of 4

Core Data Structures — Pick the Right Shape

Most programs are a thin layer of business logic on top of a handful of data structures. Choosing the wrong one isn't a "performance problem" — it's a design problem that shows up later as one. This page is the short version of when to use each.

The Mental Model

How to Choose

  • What's the access pattern? Random index, by key, by order, by relationships?
  • What changes most often? If reads dominate, optimize for them. If writes do, the answer changes.
  • How big does it get? Bubble-sorting an array is fine at 50 items, fatal at 5 million.
  • Is order meaningful? "Sorted by score" needs a tree or heap. "Most recent" needs a deque or log.
  • Single-threaded or shared? Concurrent access changes the answer entirely (concurrent maps, lock-free queues).
Linear

Arrays & Lists

Array (Static / Dynamic)

Contiguous block of same-type elements. O(1) index access, O(n) insert/delete in the middle. Dynamic arrays (Java ArrayList, Python list, Go slice, C++ vector, JS array) grow geometrically (typically 1.5–2×) on resize, giving amortized O(1) append.

The default. Cache-friendly, simple, and almost always faster than a linked list at any reasonable size. Reach for something else only when you have a specific reason.
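
A minimal sketch of why append stays cheap. GrowArray is a hypothetical toy class; real runtimes manage raw memory and pick their own growth factor:

```python
class GrowArray:
    """Toy dynamic array showing why append is amortized O(1)."""
    def __init__(self):
        self._cap = 4
        self._len = 0
        self._buf = [None] * self._cap

    def append(self, x):
        if self._len == self._cap:           # buffer full: reallocate
            self._cap *= 2                   # geometric growth is the key
            new_buf = [None] * self._cap
            new_buf[:self._len] = self._buf  # O(n) copy, but rare
            self._buf = new_buf
        self._buf[self._len] = x             # common case: O(1) write
        self._len += 1
```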

Linked List (Singly / Doubly)

Nodes pointing to next (and optionally previous). O(1) insert/delete given a node, O(n) to find one. Pays a pointer chase per element — terrible cache behavior.

Use when: you splice nodes in/out frequently and already hold the node, or you need a stable iterator under modification. Most "linked list" use-cases in app code should be a deque or array instead.
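
When you do hold the node, the splice really is O(1). A sketch with a hypothetical doubly-linked Node (head/tail bookkeeping omitted):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def unlink(node):
    """Remove a node you already hold: O(1), no search."""
    if node.prev:
        node.prev.next = node.next
    if node.next:
        node.next.prev = node.prev
    node.prev = node.next = None
```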

Stack (LIFO)

Push and pop from the same end. Function calls, undo histories, expression parsing, depth-first traversal. Implement with a dynamic array — O(1) at the top, no allocator pressure.
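
A sketch of the list-as-stack pattern, here checking balanced brackets; balanced is a hypothetical helper:

```python
PAIRS = {')': '(', ']': '[', '}': '{'}

def balanced(s):
    stack = []
    for ch in s:
        if ch in '([{':
            stack.append(ch)                     # push: O(1) amortized
        elif ch in PAIRS and (not stack or stack.pop() != PAIRS[ch]):
            return False                         # mismatch or empty stack
    return not stack                             # leftovers mean unbalanced

assert balanced("(a[b]{c})")
assert not balanced("(]")
```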

Queue / Deque (FIFO)

Add at one end, remove from the other. Job pipelines, BFS, producer/consumer buffers. Deque (double-ended queue) lets you do either end — used for sliding windows and work-stealing. Ring buffers are the fast bounded variant.
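
A sketch of both roles using Python's collections.deque:

```python
from collections import deque

# FIFO queue: append one end, pop the other -- both O(1).
jobs = deque()
jobs.append("job-1")
jobs.append("job-2")
assert jobs.popleft() == "job-1"

# Bounded "most recent k": maxlen evicts from the far end automatically.
recent = deque(maxlen=3)
for event in "abcd":
    recent.append(event)
assert list(recent) == ["b", "c", "d"]
```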

By Key

Hash Maps & Sets

Hash Map

Average O(1) lookup, insert, delete by key. Worst case O(n) if everything collides — modern implementations defend against adversarial inputs with randomized hash seeds (Python, Go, Rust) and, in Java 8+, by converting overloaded buckets to balanced trees.

No ordering guarantee. Iteration order is implementation-defined or insertion-ordered depending on the language. If you need ordered iteration, use a tree map.

Hash Set

A hash map without values — just keys. The right tool for "have I seen this?" and "is x in the set?" Don't fake it with a list and contains(); that's O(n) per check.
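
A sketch of the set version; first_duplicate is a hypothetical helper:

```python
def first_duplicate(items):
    seen = set()
    for x in items:
        if x in seen:        # O(1) average per membership check
            return x
        seen.add(x)
    return None

assert first_duplicate([3, 1, 4, 1, 5]) == 1
# The list version -- seen = [] with `x in seen` -- is O(n) per check,
# O(n^2) overall, and the classic hot-loop surprise.
```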

What Makes a Good Hash Function
  • Deterministic: same input → same hash.
  • Uniform: spreads inputs across buckets so collisions stay rare.
  • Equal objects hash equal: override hashCode/__hash__ whenever you override equality (sketch after this list). Mismatched contracts are a top-tier bug.
  • Don't hash mutable fields: if a key's hash changes after insert, you can never find it again.
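
In Python, a frozen dataclass keeps this contract for free: the generated __eq__ and __hash__ use the same fields, and frozen fields can't mutate under a live key. A minimal sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)    # generates matching __eq__ and __hash__;
class Point:               # frozen fields can't change after insert
    x: int
    y: int

p, q = Point(1, 2), Point(1, 2)
assert p == q and hash(p) == hash(q)    # safe as a dict/set key
```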
Ordered

Trees & Heaps

Binary Search Tree (Balanced)

Red-black trees and AVL trees (binary), plus B-trees (multi-way, same idea). O(log n) insert/lookup/delete, and iteration in sorted order. Use when you need both key-lookup and ordered traversal — range queries, "next-largest," sorted iteration.

Java TreeMap, C++ std::map, .NET SortedDictionary. Databases use B-trees for indexes — same idea, optimized for disk pages.
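
Python's standard library has no tree map; a sorted list plus bisect is the usual stand-in for ordered queries (inserts are O(n), so a real tree wins when writes dominate). A sketch:

```python
import bisect

scores = [10, 20, 20, 35, 50]                     # kept sorted

# "next-largest": first element strictly greater than 20
assert scores[bisect.bisect_right(scores, 20)] == 35

# range query: every value in [15, 40]
lo = bisect.bisect_left(scores, 15)
hi = bisect.bisect_right(scores, 40)
assert scores[lo:hi] == [20, 20, 35]
```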

Heap / Priority Queue

A tree where the parent is always ≤ (or ≥) its children. O(log n) push, O(log n) pop-min, O(1) peek. The right tool for: top-K, scheduling, Dijkstra's algorithm, event simulation.
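
A top-K sketch with Python's heapq, keeping a size-k min-heap so the pass is O(n log k); top_k is a hypothetical helper:

```python
import heapq

def top_k(items, k):
    heap = []
    for x in items:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:                  # heap[0] is the smallest survivor
            heapq.heapreplace(heap, x)     # pop-min + push in one O(log k) step
    return sorted(heap, reverse=True)

assert top_k([5, 1, 9, 3, 7, 8], 3) == [9, 8, 7]
```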

Trie (Prefix Tree)

Tree keyed by character, one path per word. Autocomplete, IP routing tables, dictionary lookup. Lookup is O(k) where k is key length — independent of how many keys are stored.
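
A minimal sketch with one dict per node. This Trie class is a toy; real tries add deletion and value storage:

```python
class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})   # one edge per character
        node["$"] = True                     # end-of-word sentinel

    def has_prefix(self, prefix):            # O(k), k = len(prefix)
        node = self.root
        for ch in prefix:
            if ch not in node:
                return False
            node = node[ch]
        return True

t = Trie()
t.insert("card")
assert t.has_prefix("car") and not t.has_prefix("cat")
```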

Segment Tree / Fenwick Tree

Specialized trees for range queries (sum/min/max over [i, j]) with point updates in O(log n). Show up in competitive programming and time-series rollups.
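
A Fenwick tree sketch, point update and prefix sum both O(log n); Fenwick is a hypothetical class name:

```python
class Fenwick:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)      # 1-indexed internally

    def add(self, i, delta):           # point update at 0-based index i
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i                # hop to the next responsible node

    def prefix_sum(self, i):           # sum of elements [0, i]
        i, s = i + 1, 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

f = Fenwick(5)
f.add(1, 10)
f.add(3, 5)
assert f.prefix_sum(3) == 15           # 10 + 5
```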

Connected

Graphs

Representations
  • Adjacency list: for each node, a list of neighbors. Compact for sparse graphs, the default for most code (sketch after this list).
  • Adjacency matrix: n×n bits. Fast edge-existence check, wastes memory on sparse graphs. Useful when n is small (<1000) and edges are dense.
  • Edge list: just a list of (u, v) tuples. Simplest; useful for input/output and Kruskal's MST.
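
Building the default representation from an edge list, as a sketch:

```python
from collections import defaultdict

edges = [("a", "b"), ("a", "c"), ("b", "c")]
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)          # omit this line for a directed graph

assert adj["a"] == ["b", "c"]
```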
Common Algorithms
  • BFS: shortest path in unweighted graphs, level-by-level. Queue-based (sketch after this list).
  • DFS: reachability, cycle detection, topological sort. Stack/recursion-based.
  • Dijkstra: shortest path with non-negative weights. Heap + adjacency list.
  • A*: Dijkstra + heuristic. Pathfinding in games and maps.
  • Topological sort: ordering of a DAG — task scheduling, build dependencies.
  • Union-Find (DSU): tracks connected components in near-O(1) amortized. MST, network connectivity.
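
The BFS sketch referenced above: shortest path by edge count; shortest_path_len is a hypothetical helper:

```python
from collections import deque

def shortest_path_len(adj, start, goal):
    dist = {start: 0}
    q = deque([start])
    while q:
        u = q.popleft()
        if u == goal:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:          # each node enters the queue once
                dist[v] = dist[u] + 1
                q.append(v)
    return None                        # goal unreachable

adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
assert shortest_path_len(adj, "a", "d") == 2
```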
When You Have a Graph and Don't Know It

Permission inheritance, organizational hierarchies, foreign-key relationships, dependency resolution, social connections, configuration overrides — all graphs. Recognizing the shape is half the work; the algorithm follows.

Specialized

Worth Knowing They Exist

  • Bloom filter: "definitely not / probably yes." Tiny memory, false positives only. Used by databases to skip disk reads.
  • HyperLogLog: approximate distinct-count in kilobytes for billions of items. Analytics workhorse.
  • Skip list: probabilistic alternative to balanced BSTs. Used by Redis sorted sets and LevelDB.
  • LSM tree: write-optimized storage layer. The reason RocksDB, Cassandra, and modern key-value stores ingest fast.
  • LRU / LFU cache: bounded map with eviction. Linked hash map (LRU) or counting+heap (LFU); LRU sketch after this list.
  • Rope: tree of strings for O(log n) splits/concatenations. Text editors with millions of lines.
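
The LRU sketch referenced above, built on Python's OrderedDict (a linked hash map); LRUCache is a toy, not production code:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

c = LRUCache(2)
c.put("a", 1); c.put("b", 2)
c.get("a")                                  # refresh "a"
c.put("c", 3)                               # evicts "b", not "a"
assert list(c.data) == ["a", "c"]
```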
Pitfalls

Common Mistakes

  • Linear search inside a hot loop. Build a hash set/map once, then look up.
  • Linked list reflex. "I might insert in the middle" rarely justifies the cache cost. Profile before reaching for one.
  • Mutating a key after insertion. Hash and tree maps both break — the structure looks for the old position.
  • Recursive traversal of unbounded trees. Stack overflow at 10k depth. Switch to iterative + explicit stack (sketch after this list).
  • Sorting then re-sorting. If you need ordered access throughout, use a tree or heap, not a list-and-sort.
  • Picking by familiarity, not fit. "I always use a list" is how O(n²) bugs ship.
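
The iterative traversal referenced above, assuming a hypothetical node type with value/left/right attributes:

```python
def preorder(root):
    out = []
    stack = [root] if root else []
    while stack:
        node = stack.pop()
        out.append(node.value)
        if node.right:                 # push right first so left pops first
            stack.append(node.right)
        if node.left:
            stack.append(node.left)
    return out
```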