Graph Databases · Database Family Deep Dive

Quick Facts

At a Glance

Core Ideas

Node: an entity (person, account, product), with labels and properties.
Edge / relationship: a connection between two nodes — typed, directed, with its own properties.
Index-free adjacency: in native graph engines, each node holds direct pointers to its edges. Following one is a pointer chase, not an index lookup.
Traversal: walking the graph from a starting node to find what you want — neighbors, paths, patterns.
Property graph vs RDF: the property-graph model (Neo4j, Neptune, TigerGraph) dominates apps; RDF / triple stores (Jena, GraphDB) dominate semantic web and knowledge bases.
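
A minimal in-memory sketch of the ideas above, not any engine's actual storage format. The names Node, Edge, connect, and neighbors are made up for illustration. Each node keeps direct references to its outgoing edges, so a one-hop traversal never consults a global index.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    label: str
    props: dict
    # Direct references to outgoing edges: the "index-free adjacency" idea.
    out_edges: list = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    rel_type: str
    source: Node
    target: Node
    props: dict = field(default_factory=dict)


def connect(source, rel_type, target, **props):
    """Create a typed, directed edge and attach it to its source node."""
    edge = Edge(rel_type, source, target, props)
    source.out_edges.append(edge)
    return edge


def neighbors(node, rel_type):
    """One-hop traversal: follow the node's own edge references."""
    return [e.target for e in node.out_edges if e.rel_type == rel_type]


alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})
acct = Node("Account", {"iban": "XX00"})
connect(alice, "FRIEND", bob, since=2021)
connect(alice, "OWNS", acct)

friend_names = [n.props["name"] for n in neighbors(alice, "FRIEND")]
```

Note that both the edge and the node carry their own properties, which is what distinguishes the property-graph model from plain adjacency lists.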

The Engines

Who Plays Here

Neo4j

The category leader. Property-graph model with the Cypher query language, which is openly specified as openCypher and heavily influenced GQL, the ISO graph query standard. Excellent visualization, mature operations.

Amazon Neptune

Managed graph on AWS. Speaks both Gremlin (property graph) and SPARQL (RDF). Pairs well with the rest of the AWS data stack.

ArangoDB

Multi-model — documents, graph, and key-value in one engine. Useful when you don't want to operate two databases.

TigerGraph / Memgraph

Performance-focused. Memgraph is in-memory; TigerGraph targets distributed, analytics-grade traversals. Typical uses: fraud, supply chain, real-time scoring.

JanusGraph / Dgraph

Distributed graph. JanusGraph runs on top of a storage backend such as Cassandra or HBase; Dgraph has its own storage engine and a native GraphQL interface.

Apache AGE

Postgres extension that adds Cypher to a relational DB. Hedge bet — graph queries without leaving Postgres.

When Graph Wins

The Use Cases That Justify It

Many-Hop Relationship Queries

"Friends of friends." "Customers who share a device with a flagged account." "Products bought by people who bought this product." In SQL, each hop is a self-join — three hops is already painful, six is impossible. In a graph DB, each hop is O(neighbors), and you can express the whole thing in one Cypher pattern.

Fraud & Anti-Money-Laundering

Detect rings: shared addresses, devices, payment instruments, IPs. The fraud pattern is a subgraph — graph DBs match it directly with pattern queries. A SQL system would precompute features; a graph can ask the question live.
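
One simple way to picture a ring is as a connected component of accounts linked through shared identifiers. The sketch below assumes a hypothetical list of (account, identifier) pairs and a made-up helper find_rings; real fraud patterns are usually richer than plain connectivity.

```python
from collections import defaultdict


def find_rings(uses, min_size=2):
    """Group accounts that are connected through shared identifiers."""
    adjacency = defaultdict(set)
    for account, identifier in uses:
        adjacency[("account", account)].add(("id", identifier))
        adjacency[("id", identifier)].add(("account", account))

    seen = set()
    rings = []
    for start in list(adjacency):
        if start in seen or start[0] != "account":
            continue
        # Depth-first walk over the account/identifier bipartite graph.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            if node[0] == "account":
                component.add(node[1])
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(component) >= min_size:
            rings.append(component)
    return rings


uses = [
    ("acct1", "device:A"),
    ("acct2", "device:A"),
    ("acct2", "card:9"),
    ("acct3", "card:9"),
    ("acct4", "device:B"),
]
rings = find_rings(uses)
```

Here acct1 and acct3 never share anything directly; they end up in the same ring only through acct2, which is the kind of indirect link a per-row feature tends to miss.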

Recommendations & Knowledge Graphs

"People like you also liked X." Walk from the user, through purchases, to other users, to their purchases — score by overlap. Knowledge graphs (entities, relations, types) power assistants, search, and increasingly, RAG context for LLMs.

Identity & Access (Permissions)

"Does user U have permission P on resource R?" through groups, roles, and inherited grants. Google's Zanzibar model is a graph at heart. SpiceDB, OpenFGA — graph-shaped under the hood.

Network & Supply-Chain Analysis

Telecom topology, logistics routes, dependency graphs, infrastructure blast radius. Shortest path, centrality, community detection — algorithms that run natively on the graph instead of being shoehorned through SQL.
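
Shortest path is the simplest of these algorithms to sketch. The version below is a plain breadth-first search over an unweighted adjacency dict, with made-up route names; weighted routes would need Dijkstra or similar instead.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns the fewest-hops path or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Rebuild the path by following parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


routes = {
    "factory": ["port_a", "rail_hub"],
    "port_a": ["port_b"],
    "rail_hub": ["warehouse"],
    "port_b": ["warehouse"],
    "warehouse": ["store"],
}
path = shortest_path(routes, "factory", "store")
```

The same parent map, run without a target, gives everything reachable from a node, which is one way to estimate blast radius in a dependency graph.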

Querying

The Languages

Cypher / GQL

ASCII-art pattern matching: (user)-[:FRIEND]->(friend)-[:FRIEND]->(fof). Reads close to how you'd draw it on a whiteboard. Cypher started at Neo4j; GQL is the ISO standard descendant.

Gremlin

A traversal language built from chained steps: g.V(user).out('friend').out('friend'). It reads like step-by-step instructions rather than a declarative pattern, which makes it more flexible but harder to read. The native language of the Apache TinkerPop ecosystem.
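
To show the chained-step style without a server, here is a toy traverser in plain Python. It only mimics the shape of a Gremlin traversal; the Traversal class and its methods are invented for this sketch and are not the gremlinpython API.

```python
class Traversal:
    """A toy, Gremlin-flavored traverser over an in-memory edge list."""

    def __init__(self, edges, current):
        self._edges = edges            # list of (source, label, target)
        self._current = list(current)  # vertices the traversal is "at"

    def out(self, label):
        """Step along outgoing edges that carry the given label."""
        reached = [
            target
            for vertex in self._current
            for (source, edge_label, target) in self._edges
            if source == vertex and edge_label == label
        ]
        return Traversal(self._edges, reached)

    def dedup(self):
        """Drop repeated vertices while keeping first-seen order."""
        return Traversal(self._edges, dict.fromkeys(self._current))

    def to_list(self):
        return list(self._current)


edges = [
    ("ann", "friend", "bob"),
    ("ann", "friend", "cat"),
    ("bob", "friend", "dan"),
    ("cat", "friend", "dan"),
    ("cat", "follows", "eve"),
]
fof = Traversal(edges, ["ann"]).out("friend").out("friend")
```

Each step returns a new traversal, so steps compose left to right. In this sketch dan is reached along two paths and appears twice until dedup() is applied.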

SPARQL

For RDF / triple stores. Subject-predicate-object queries against a knowledge graph. Wikidata, DBpedia, biomedical ontologies — that world.
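
The subject-predicate-object idea can be sketched with tuples, where None plays the role of a query variable. This is an illustration of triple-pattern matching only, not SPARQL syntax or any triple store's API; the match helper and the sample facts are assumptions.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


triples = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "is_a", "nsaid"),
    ("ibuprofen", "is_a", "nsaid"),
    ("ibuprofen", "treats", "fever"),
]

# Pattern 1: ?drug is_a nsaid
nsaids = [s for (s, _, _) in match(triples, predicate="is_a", obj="nsaid")]

# Pattern 2, joined on ?drug: ?drug treats ?condition
treated = sorted(
    {o for drug in nsaids for (_, _, o) in match(triples, subject=drug, predicate="treats")}
)
```

A real query engine would plan and join such patterns itself; the point here is only that a query is a set of triple patterns sharing variables.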

GraphQL ≠ Graph DB

Common confusion: GraphQL is an API query language; graph databases store and traverse graphs. They're orthogonal. You can serve GraphQL from a relational DB, and many do.

Pitfalls

Common Mistakes

Picking graph for tabular data. If your queries are mostly "all rows where X," a relational DB is faster, cheaper, and your team already knows it.
Supernodes. A celebrity user with 50M followers — a single node with millions of edges. Traversals through it explode. Mitigate with edge sampling, summary nodes, or partitioning by relationship type.
Treating it as the system of record for everything. Often the right shape: relational source of truth + graph projection for the relationship-heavy queries, kept in sync via CDC.
Skipping indexes on lookup properties. Index-free adjacency speeds traversals once you've found the start node — finding it is still an index lookup.
Underestimating the operational lift. Graph DB tooling for operations and backups, and the surrounding ecosystem, are thinner than what Postgres offers. Plan for it.
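
For the supernode item above, edge sampling can be sketched as a cap on how many neighbors any single node may contribute to a traversal. The functions, the max_fanout parameter, and the data are hypothetical, and the result is approximate by design.

```python
import random


def sampled_neighbors(adjacency, node, max_fanout, rng):
    """Return all neighbors, or a random sample if the node is too dense."""
    neighbors = adjacency.get(node, [])
    if len(neighbors) <= max_fanout:
        return list(neighbors)
    return rng.sample(neighbors, max_fanout)


def bounded_two_hop(adjacency, start, max_fanout, seed=0):
    """Two-hop expansion where no node contributes more than max_fanout edges."""
    rng = random.Random(seed)
    reached = set()
    for first in sampled_neighbors(adjacency, start, max_fanout, rng):
        reached.add(first)
        reached.update(sampled_neighbors(adjacency, first, max_fanout, rng))
    reached.discard(start)
    return reached


edges = {
    "fan": ["celebrity", "friend"],
    "friend": ["fan"],
    # The supernode: one vertex with a very large edge list.
    "celebrity": [f"follower{i}" for i in range(100_000)],
}
reached = bounded_two_hop(edges, "fan", max_fanout=50)
```

Without the cap, the two-hop result from "fan" would include all 100,000 followers; with it, the traversal stays bounded at the cost of completeness.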
