CS Essentials Deep Dive · 3 of 4

Core Data Structures — Pick the Right Shape

Most programs are a thin layer of business logic on top of a handful of data structures. Choosing the wrong one isn't a "performance problem" — it's a design problem that shows up later as one. This page is the short version of when to use each.

The Mental Model

How to Choose

  • What's the access pattern? Random index, by key, by order, by relationships?
  • What changes most often? If reads dominate, optimize for them. If writes do, the answer changes.
  • How big does it get? Bubble-sorting an array is fine at 50 items, fatal at 5 million.
  • Is order meaningful? "Sorted by score" needs a tree or heap. "Most recent" needs a deque or log.
  • Single-threaded or shared? Concurrent access changes the answer entirely (concurrent maps, lock-free queues).
Linear

Arrays & Lists

Array (Static / Dynamic)

Contiguous block of same-type elements. O(1) index access, O(n) insert/delete in the middle. Dynamic arrays (Java ArrayList, Python list, Go slice, C++ vector, JS array) grow geometrically (typically 1.5–2×) on resize, giving amortized O(1) append.

The default. Cache-friendly, simple, and almost always faster than a linked list at any reasonable size. Reach for something else only when you have a specific reason.
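
A minimal sketch of why append stays cheap. GrowArray is a hypothetical toy class; real runtimes manage raw memory and pick their own growth factor:

```python
class GrowArray:
    """Toy dynamic array showing why append is amortized O(1)."""
    def __init__(self):
        self._cap = 4
        self._len = 0
        self._buf = [None] * self._cap

    def append(self, x):
        if self._len == self._cap:           # buffer full: reallocate
            self._cap *= 2                   # geometric growth is the key
            new_buf = [None] * self._cap
            new_buf[:self._len] = self._buf  # O(n) copy, but rare
            self._buf = new_buf
        self._buf[self._len] = x             # common case: O(1) write
        self._len += 1
```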

Linked List (Singly / Doubly)

Nodes pointing to next (and optionally previous). O(1) insert/delete given a node, O(n) to find one. Pays a pointer chase per element — terrible cache behavior.

Use when: you splice nodes in/out frequently and already hold the node, or you need a stable iterator under modification. Most "linked list" use-cases in app code should be a deque or array instead.
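
When you do hold the node, the splice really is O(1). A sketch with a hypothetical doubly-linked Node (head/tail bookkeeping omitted):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def unlink(node):
    """Remove a node you already hold: O(1), no search."""
    if node.prev:
        node.prev.next = node.next
    if node.next:
        node.next.prev = node.prev
    node.prev = node.next = None
```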

Stack (LIFO)

Push and pop from the same end. Function calls, undo histories, expression parsing, depth-first traversal. Implement with a dynamic array — O(1) at the top, no allocator pressure.
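
A sketch of the list-as-stack pattern, here checking balanced brackets; balanced is a hypothetical helper:

```python
PAIRS = {')': '(', ']': '[', '}': '{'}

def balanced(s):
    stack = []
    for ch in s:
        if ch in '([{':
            stack.append(ch)                     # push: O(1) amortized
        elif ch in PAIRS and (not stack or stack.pop() != PAIRS[ch]):
            return False                         # mismatch or empty stack
    return not stack                             # leftovers mean unbalanced

assert balanced("(a[b]{c})")
assert not balanced("(]")
```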

Queue / Deque (FIFO)

Add at one end, remove from the other. Job pipelines, BFS, producer/consumer buffers. Deque (double-ended queue) lets you do either end — used for sliding windows and work-stealing. Ring buffers are the fast bounded variant.
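
A sketch of both roles using Python's collections.deque:

```python
from collections import deque

# FIFO queue: append one end, pop the other -- both O(1).
jobs = deque()
jobs.append("job-1")
jobs.append("job-2")
assert jobs.popleft() == "job-1"

# Bounded "most recent k": maxlen evicts from the far end automatically.
recent = deque(maxlen=3)
for event in "abcd":
    recent.append(event)
assert list(recent) == ["b", "c", "d"]
```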

By Key

Hash Maps & Sets

Hash Map

Average O(1) lookup, insert, delete by key. Worst case O(n) if everything collides — modern implementations defend against adversarial inputs with randomized hash seeds (Python, Go, Rust) and, in Java 8+, by converting overloaded buckets to balanced trees.

No ordering guarantee. Iteration order is implementation-defined or insertion-ordered depending on the language. If you need ordered iteration, use a tree map.

Hash Set

A hash map without values — just keys. The right tool for "have I seen this?" and "is x in the set?" Don't fake it with a list and contains(); that's O(n) per check.
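
A sketch of the set version; first_duplicate is a hypothetical helper:

```python
def first_duplicate(items):
    seen = set()
    for x in items:
        if x in seen:        # O(1) average per membership check
            return x
        seen.add(x)
    return None

assert first_duplicate([3, 1, 4, 1, 5]) == 1
# The list version -- seen = [] with `x in seen` -- is O(n) per check,
# O(n^2) overall, and the classic hot-loop surprise.
```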

What Makes a Good Hash Function
  • Deterministic: same input → same hash.
  • Uniform: spreads inputs across buckets so collisions stay rare.
  • Equal objects hash equal: override hashCode/__hash__ whenever you override equality (sketch after this list). Mismatched contracts are a top-tier bug.
  • Don't hash mutable fields: if a key's hash changes after insert, you can never find it again.
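
In Python, a frozen dataclass keeps this contract for free: the generated __eq__ and __hash__ use the same fields, and frozen fields can't mutate under a live key. A minimal sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)    # generates matching __eq__ and __hash__;
class Point:               # frozen fields can't change after insert
    x: int
    y: int

p, q = Point(1, 2), Point(1, 2)
assert p == q and hash(p) == hash(q)    # safe as a dict/set key
```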
Ordered

Trees & Heaps

Binary Search Tree (Balanced)

Red-black trees and AVL trees (binary), plus B-trees (multi-way, same idea). O(log n) insert/lookup/delete, and iteration in sorted order. Use when you need both key-lookup and ordered traversal — range queries, "next-largest," sorted iteration.

Java TreeMap, C++ std::map, .NET SortedDictionary. Databases use B-trees for indexes — same idea, optimized for disk pages.
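
Python's standard library has no tree map; a sorted list plus bisect is the usual stand-in for ordered queries (inserts are O(n), so a real tree wins when writes dominate). A sketch:

```python
import bisect

scores = [10, 20, 20, 35, 50]                     # kept sorted

# "next-largest": first element strictly greater than 20
assert scores[bisect.bisect_right(scores, 20)] == 35

# range query: every value in [15, 40]
lo = bisect.bisect_left(scores, 15)
hi = bisect.bisect_right(scores, 40)
assert scores[lo:hi] == [20, 20, 35]
```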

Heap / Priority Queue

A tree where the parent is always ≤ (or ≥) its children. O(log n) push, O(log n) pop-min, O(1) peek. The right tool for: top-K, scheduling, Dijkstra's algorithm, event simulation.
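
A top-K sketch with Python's heapq, keeping a size-k min-heap so the pass is O(n log k); top_k is a hypothetical helper:

```python
import heapq

def top_k(items, k):
    heap = []
    for x in items:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:                  # heap[0] is the smallest survivor
            heapq.heapreplace(heap, x)     # pop-min + push in one O(log k) step
    return sorted(heap, reverse=True)

assert top_k([5, 1, 9, 3, 7, 8], 3) == [9, 8, 7]
```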

Trie (Prefix Tree)

Tree keyed by character, one path per word. Autocomplete, IP routing tables, dictionary lookup. Lookup is O(k) where k is key length — independent of how many keys are stored.
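
A minimal sketch with one dict per node. This Trie class is a toy; real tries add deletion and value storage:

```python
class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})   # one edge per character
        node["$"] = True                     # end-of-word sentinel

    def has_prefix(self, prefix):            # O(k), k = len(prefix)
        node = self.root
        for ch in prefix:
            if ch not in node:
                return False
            node = node[ch]
        return True

t = Trie()
t.insert("card")
assert t.has_prefix("car") and not t.has_prefix("cat")
```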

Segment Tree / Fenwick Tree

Specialized trees for range queries (sum/min/max over [i, j]) with point updates in O(log n). Show up in competitive programming and time-series rollups.
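
A Fenwick tree sketch, point update and prefix sum both O(log n); Fenwick is a hypothetical class name:

```python
class Fenwick:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)      # 1-indexed internally

    def add(self, i, delta):           # point update at 0-based index i
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i                # hop to the next responsible node

    def prefix_sum(self, i):           # sum of elements [0, i]
        i, s = i + 1, 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

f = Fenwick(5)
f.add(1, 10)
f.add(3, 5)
assert f.prefix_sum(3) == 15           # 10 + 5
```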

Connected

Graphs

Representations
  • Adjacency list: for each node, a list of neighbors. Compact for sparse graphs, the default for most code (sketch after this list).
  • Adjacency matrix: n×n bits. Fast edge-existence check, wastes memory on sparse graphs. Useful when n is small (<1000) and edges are dense.
  • Edge list: just a list of (u, v) tuples. Simplest; useful for input/output and Kruskal's MST.
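
Building the default representation from an edge list, as a sketch:

```python
from collections import defaultdict

edges = [("a", "b"), ("a", "c"), ("b", "c")]
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)          # omit this line for a directed graph

assert adj["a"] == ["b", "c"]
```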
Common Algorithms
  • BFS: shortest path in unweighted graphs, level-by-level. Queue-based (sketch after this list).
  • DFS: reachability, cycle detection, topological sort. Stack/recursion-based.
  • Dijkstra: shortest path with non-negative weights. Heap + adjacency list.
  • A*: Dijkstra + heuristic. Pathfinding in games and maps.
  • Topological sort: ordering of a DAG — task scheduling, build dependencies.
  • Union-Find (DSU): tracks connected components in near-O(1) amortized. MST, network connectivity.
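
The BFS sketch referenced above: shortest path by edge count; shortest_path_len is a hypothetical helper:

```python
from collections import deque

def shortest_path_len(adj, start, goal):
    dist = {start: 0}
    q = deque([start])
    while q:
        u = q.popleft()
        if u == goal:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:          # each node enters the queue once
                dist[v] = dist[u] + 1
                q.append(v)
    return None                        # goal unreachable

adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
assert shortest_path_len(adj, "a", "d") == 2
```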
When You Have a Graph and Don't Know It

Permission inheritance, organizational hierarchies, foreign-key relationships, dependency resolution, social connections, configuration overrides — all graphs. Recognizing the shape is half the work; the algorithm follows.

Specialized

Worth Knowing They Exist

  • Bloom filter: "definitely not / probably yes." Tiny memory, false positives only. Used by databases to skip disk reads.
  • HyperLogLog: approximate distinct-count in kilobytes for billions of items. Analytics workhorse.
  • Skip list: probabilistic alternative to balanced BSTs. Used by Redis sorted sets and LevelDB.
  • LSM tree: write-optimized storage layer. The reason RocksDB, Cassandra, and modern key-value stores ingest fast.
  • LRU / LFU cache: bounded map with eviction. Linked hash map (LRU) or counting+heap (LFU); LRU sketch after this list.
  • Rope: tree of strings for O(log n) splits/concatenations. Text editors with millions of lines.
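
The LRU sketch referenced above, built on Python's OrderedDict (a linked hash map); LRUCache is a toy, not production code:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

c = LRUCache(2)
c.put("a", 1); c.put("b", 2)
c.get("a")                                  # refresh "a"
c.put("c", 3)                               # evicts "b", not "a"
assert list(c.data) == ["a", "c"]
```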
Pitfalls

Common Mistakes

  • Linear search inside a hot loop. Build a hash set/map once, then look up.
  • Linked list reflex. "I might insert in the middle" rarely justifies the cache cost. Profile before reaching for one.
  • Mutating a key after insertion. Hash and tree maps both break — the structure looks for the old position.
  • Recursive traversal of unbounded trees. Stack overflow at 10k depth. Switch to iterative + explicit stack (sketch after this list).
  • Sorting then re-sorting. If you need ordered access throughout, use a tree or heap, not a list-and-sort.
  • Picking by familiarity, not fit. "I always use a list" is how O(n²) bugs ship.
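
The iterative traversal referenced above, assuming a hypothetical node type with value/left/right attributes:

```python
def preorder(root):
    out = []
    stack = [root] if root else []
    while stack:
        node = stack.pop()
        out.append(node.value)
        if node.right:                 # push right first so left pops first
            stack.append(node.right)
        if node.left:
            stack.append(node.left)
    return out
```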