Week 12: Advanced Graphs

Pattern family: Graphs · Time budget: 20 to 22 hours · Prerequisites: Weeks 1 through 11 done, with graph traversal and the mandatory visited set (Week 11), heapq (Week 8), and float('inf') (Week 5) fresh. · Cold-revisit obligation: Word Search II, Minimum Window Substring.

Overview

Last week the edges were free: every step cost the same, so breadth-first search found the fewest-steps path and depth-first search found connectivity. This week the edges have weights, and “fewest edges” stops meaning “cheapest”. The whole week is about shortest paths and spanning structure when the cost of moving is not uniform, plus one specialty algorithm (Eulerian paths) that does not fit the shortest-path mold at all.

This is the heaviest single week in the course, and the reason is that there is no one algorithm. There is a small decision you make first, every time, and it routes you to the right tool. Are the weights non-negative? Dijkstra, with a heap, greedily settling the closest unsettled node. Can a weight be negative? Dijkstra’s greedy assumption breaks, and you fall back to Bellman-Ford, which relaxes every edge a bounded number of times. Is there an extra constraint baked into the problem, like “at most k stops”? Then the node alone is no longer the state, and you expand the state to carry the constraint. Most of this week is training the move that fires before any code: read the problem, find the weights, and route yourself to the algorithm the weights demand.

flowchart TD
    Q["Read the problem"] --> A{"Are the edges weighted,<br/>and is the question<br/>'cheapest path' or<br/>'minimum total cost'?"}
    A -->|no, unit edges| G["Plain BFS/DFS from Week 11.<br/>Fewest edges, connectivity,<br/>cycles. Keep reading the cues."]
    A -->|yes| B{"Can any weight<br/>be negative?"}
    B -->|"no, all non-negative"| C{"Is there an extra<br/>constraint, like<br/>'at most k stops'?"}
    B -->|"yes, negatives exist"| BF["Bellman-Ford: relax every<br/>edge a bounded number<br/>of rounds. Dijkstra is unsafe."]
    C -->|"no"| DJ["Dijkstra with a heap:<br/>settle the closest<br/>unsettled node, O(E log V)."]
    C -->|"yes"| SE["State expansion: the state<br/>is (node, constraint), not<br/>node alone. Then Dijkstra<br/>or Bellman-Ford over it."]

Notice: the decision happens before you code, and it is a routing decision on the weights. Non-negative weights -> Dijkstra. A possible negative weight -> Bellman-Ford. A side constraint like a stop limit -> expand the state. Naming which case you are in out loud (principle 1) is what makes this diagram fire under pressure.

Warm-Up: Retrieve Before You Begin

Answer from memory, no peeking. This pulls forward the prior-week tools that advanced graphs lean on hardest.

From Week 8 and the refresher: Python’s heapq is a min-heap only. If you push tuples (cost, node) onto it, in what order do they come off, and what is the cost of one heappush or heappop?
From Week 5 and the refresher: what does float('inf') give you, and why is it the natural starting value for a “shortest distance so far” table?
From Week 11: how do you build an adjacency list from an edge list with collections.defaultdict, and why is a visited set mandatory in graph traversal?

Check your recall

1. A min-heap always hands back the smallest element first, and tuples compare lexicographically, so a heap of `(cost, node)` pops the entry with the smallest `cost` (ties broken by `node`). Each `heappush` and `heappop` is O(log n) in the heap size. This is exactly the machinery Dijkstra rides on: "give me the closest unsettled node next" is one `heappop`. One caveat returns this week: if two costs tie and the second tuple element is not comparable, the comparison raises, so you keep the second element a plain integer node id (or add a counter tiebreaker). 2. `float('inf')` is positive infinity, a float larger than any finite number you will compare against. For a shortest-distance table you initialize every node to `float('inf')` ("not yet reached") and the source to `0`; then `dist[v] = min(dist[v], dist[u] + w)` correctly replaces infinity the first time any real path arrives. A node still at `float('inf')` at the end is unreachable. 3. You write `g = defaultdict(list)` and, for each edge, `g[u].append(v)` (and `g[v].append(u)` if undirected); a missing key auto-creates an empty list, so there is no `KeyError` guard. For weighted graphs this week you append a pair, `g[u].append((v, w))`. The `visited` set is mandatory because graphs have cycles: without marking nodes done, a traversal revisits them forever and never terminates.

Learning objectives

By the end of this week you can:

Recognize weighted-shortest-path and minimum-spanning cues and route yourself to the right algorithm without being told which one, including the all-pairs-shortest-path cue that names Floyd-Warshall (recognize it; you are not asked to construct it from memory).
Distinguish when Dijkstra is safe (non-negative weights) from when it is not (any negative weight), and state precisely why the greedy step breaks.
Construct, from memory, the three required-core algorithms of the week: a heap-based Dijkstra over an adjacency list, a Bellman-Ford relaxation over an edge list, and a Union-Find structure (find with path compression, union by rank or size). Prim’s MST is reached only in the optional stretch problems: recognize its cue, and construct it only if you take that stretch. Hierholzer’s stays recognize-only (like Floyd-Warshall): know the use-every-edge cue and what it does, but you are not asked to construct it from memory.
Define the state of a constrained shortest-path problem as (node, constraint) and explain why a plain visited set on nodes alone gives a wrong answer under a stop limit.
Analyze the time and space complexity of each algorithm from its structure (heap operations, relaxation rounds, vertices and edges) and say what dominates at scale.
Defend every line of each solution you commit, from memory, with no AI open, including why your chosen algorithm is correct for the weights in the problem.

Recognition cue

“Weighted edges, and the question is the cheapest path or the minimum total cost” almost always means an advanced-graph algorithm rather than the plain BFS/DFS of Week 11. The surface phrases are “shortest path” where steps have different costs, “minimum time for a signal to reach every node”, “minimum cost to connect everything”, “cheapest route”, and constraint-carrying phrases like “with at most k stops” or “within k moves”. The deeper tell is that fewest-edges and cheapest-cost have come apart: a path with more edges can be cheaper, so counting hops (what BFS does) no longer answers the question, and you need an algorithm that compares accumulated cost.

Core concepts to internalize this week

Dijkstra with a heap. Keep a min-heap of (distance, node). Repeatedly pop the closest unsettled node, finalize its distance, and relax its outgoing edges (push improved distances for its neighbors). The first time you pop a node, its distance is final. This is a greedy algorithm, and its correctness rests entirely on weights being non-negative.
Why non-negative weights matter (the proof in one line). When you pop a node as the closest unsettled one, you are betting that no later, cheaper path to it exists. That bet is safe only if every remaining edge can only add cost. A single negative edge could make a longer detour arrive cheaper after you have already finalized the node, so the bet fails and Dijkstra returns a wrong answer.
Bellman-Ford. When negative weights are possible, drop the greedy step. Instead, relax every edge, and repeat. After i rounds of relaxing all edges, every shortest path that uses at most i edges is correct. With V nodes a shortest path uses at most V - 1 edges, so V - 1 rounds settle everything (a V-th round that still improves something signals a negative cycle). Bellman-Ford is slower than Dijkstra but it tolerates negatives, and its round count is itself a useful dial, which is the hook for this week’s canonical problem.
Floyd-Warshall (named, for completeness). When you need the shortest path between every pair of nodes, Floyd-Warshall does it in O(V^3) with a triple loop over an intermediate node k, an origin i, and a destination j: dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]). You will not need it for this week’s set, but recognize the all-pairs cue.
Minimum spanning tree (Prim’s). A different question from shortest path: connect every node into one tree at minimum total edge cost, with no concern for path length between specific nodes. Prim’s grows the tree one node at a time, always adding the cheapest edge that reaches a node not yet in the tree (a heap, or on a dense graph a simple scan, finds it). On a dense graph where every pair has an edge (points on a plane), the O(V^2) scan version is clean and competitive.
Union-Find (disjoint-set union). A tiny structure that answers one question fast: are these two nodes already in the same group, and merge two groups into one. You keep a parent array where parent[x] points one step toward the representative (“root”) of x’s group; a node is its own parent exactly when it is a root. find(x) walks parents up to the root, and path compression repoints the nodes it passed straight at the root so the next find is shorter. union(a, b) finds both roots and links one under the other; union by rank (or by size) always hangs the shorter tree under the taller one, keeping the trees flat. With both optimizations together, find and union cost near-O(1) amortized: the formal bound is the inverse-Ackermann function, which is at most about 4 for any input you will ever see. The cue is connectivity and grouping over a stream of edges, “are these two already connected”, “how many components remain”, or “which edge closes the first cycle”.
Kruskal’s (the DSU-based MST). Union-Find gives you a second way to build a minimum spanning tree, an alternative to Prim’s: sort all edges by weight and walk them cheapest-first, and union the endpoints of each edge whose endpoints are not already connected (skip an edge that would close a cycle, which find detects). Stop after you have added V - 1 edges. Prim’s grows one tree outward from a node; Kruskal’s grows a forest of fragments and merges them. On the sparse, edge-list graphs interviews usually hand you, Kruskal’s sort-then-union is often the cleaner of the two.
State expansion. When a problem adds a constraint that a path must satisfy (a budget of stops, a parity, a set of keys collected), the node is no longer enough to describe where you are. You expand the state to a tuple that carries the constraint, for example (node, stops_used), and run a standard algorithm over that larger state graph. This is the load-bearing idea of the week and the home of the canonical problem.
Hierholzer’s algorithm (Eulerian path). A specialty tool, not a shortest-path method. When a problem asks you to use every edge exactly once (walk all the tickets, trace the whole route), you are looking for an Eulerian path, and Hierholzer’s builds it: walk edges, deleting each as you use it, until you get stuck, then splice in detours. Reconstruct Itinerary is this, with a lexical-order twist.

Common misconception. “Dijkstra finds the shortest path on any weighted graph.” Reality. Dijkstra is correct only when every edge weight is non-negative. Its core step finalizes the closest unsettled node and never reconsiders it; a negative edge can make a later path to that node cheaper, after it is already finalized, which silently corrupts the answer. The moment a problem allows negative weights, Dijkstra is the wrong tool and Bellman-Ford (or another negative-aware method) is the right one. State this trade out loud when you justify the choice; interviewers probe exactly here.

Common misconception. “A visited set always prevents wasted work, so a constrained shortest-path problem is just Dijkstra with a visited set on the nodes.” Reality. Under a constraint like “at most k stops”, a plain node visited set is not just wasteful, it is wrong. The first time you reach a node it may be by the cheapest route but one that has already burned too many stops, and marking the node done blocks a costlier-but-fewer-stops arrival that the constraint actually needs later. The fix is not to drop the visited check but to expand what “visited” keys on: the state is (node, stops_used), and only the exact same (node, stops_used) pair (reached at equal or higher cost) is truly redundant. Getting this wrong is the single most common way constrained-path solutions return a plausible but incorrect number.

Common misconception. “Path compression makes Union-Find near-constant time, so that one optimization is the part that matters.” Reality. The near-O(1) inverse-Ackermann bound needs both optimizations working together: path compression on find and union by rank or size on union. With only one of the two, the bound is O(log V) per operation, not near-constant; the trees can grow tall enough to cost a real logarithm. Naive union without ranking can even build a single long chain that degrades find toward O(V). State the pair together when you justify the cost: “path compression and union by rank, so amortized near-constant”; claiming it for path compression alone is the slip an interviewer will catch.

Heavy concept ahead. State expansion is the idea that makes this week click, so internalize it deliberately. The trap is to assume “the node I am standing on” fully describes my situation. It does not, the moment the problem attaches a constraint to the path. Walk it in this order, every time:

Find the constraint. What does the path have to satisfy beyond reaching the destination? A stop budget, a parity, keys collected, fuel left.

Fold it into the state. The state becomes a tuple: (node, stops_used), (node, parity), (node, keys_frozenset). Two arrivals at the same node with different constraint values are different states and must be tracked separately.

Run a standard algorithm over the expanded state graph. Dijkstra or Bellman-Ford does not change; it just operates on tuples instead of bare nodes. Your dist table and any visited set key on the full tuple.

Read the answer off the right slice. The answer is usually the best cost over all states (destination, *) whose constraint is within budget. If you skip step one and key everything on the node alone, you get a fast, clean, wrong answer. The canonical deep-dive walks this on Cheapest Flights, where the state is (node, stops_used); the debrief checks that you can say why the node alone is not the state.

Python idioms this week

You are not yet expected to be fluent, so here is the toolbox this week uses. If any is unfamiliar, open the Python refresher before Monday.

heapq.heappush(pq, (cost, node)) and cost, node = heapq.heappop(pq): a min-heap of tuples is your Dijkstra priority queue, ordered by cost (see the refresher’s heapq section, including the tie-breaker pitfall).
Tuples as composite heap and state entries: (cost, node) for a plain heap, (cost, node, stops_used) once you expand the state. Tuples are hashable, so an expanded state can also be a dict key or set member.
float('inf') to initialize a distance table or a “best cost to this state” table; dist[v] = min(dist[v], new_cost) to relax.
Weighted adjacency with collections.defaultdict(list): g[u].append((v, w)). (Adjacency-list construction is from Week 11.)
A visited set, but keyed on the full state, not just the node, whenever the state is expanded.
collections.deque for any BFS layer you still need, and list/tuple unpacking for reading edges for u, v, w in edges.

These are tools you are allowed to know cold in an interview. The pattern (which algorithm the weights demand, and what the state actually is) is the part you build yourself.

Pattern Card requirement

Before you solve the canonical problem, write your Pattern Card at .tutor/pattern-cards/advanced-graphs.md. It must contain:

The recognition cue, in your own words (the routing decision on the weights).
The algorithmic skeleton in pseudocode for Dijkstra, for Bellman-Ford, and for Union-Find (find with path compression and union by rank or size), plus a one-line note on when each applies.
Time and space complexity, general form, for Dijkstra (O(E log V)), Bellman-Ford (O(V * E)), Prim’s MST, and Union-Find (near-O(1) amortized per find/union with path compression and union by rank).
Three edge cases (an unreachable destination, a single node, and a constraint of zero stops are good starts).
One war story (a bug you actually hit this week, the visited-set-under-a-constraint bug is a strong candidate); fill this in at the end of the week.

The first four sections must be present before you start the canonical deep-dive. The tutor will probe the card; it will not write any of it for you.

Canonical deep-dive

Problem: Cheapest Flights Within K Stops (https://leetcode.com/problems/cheapest-flights-within-k-stops/)

Cheapest Flights is this course’s home for state expansion: the lesson that “where am I” is not always just “which node”, and that a constraint on the path has to live inside the state. The full guided walk-through, with the state definition, the Bellman-Ford round structure (and why the round index is the stop dimension), the exact complexity, and the contrast with Dijkstra-over-the-expanded-state, is in canonical/cheapest-flights-within-k-stops.md. Work it there. You will implement it yourself, in your own work repo, and write a short debrief. The walk-through is deliberately not an answer key: it shows you how to think, and stops short of code you could paste.

Solve it the Bellman-Ford-by-rounds way as a required deliverable, with its own five-question debrief: this is your one from-memory rep of Bellman-Ford this week, and together with the two core practice problems it is the required core of the week (Dijkstra, Bellman-Ford, Union-Find). The Dijkstra-over-expanded-state version of the same problem is optional stretch: do it only if you have time after the required core, and if you do, give it its own separate debrief rather than folding it into the Bellman-Ford one.

Practice set

Five problems, split into a required core and optional stretch. The core is the gate; the stretch problems deepen the week and are the best warm-up for the Week 16 marathon, but they are not required to close the week. Do the core problems first, then the stretch ones in the listed order. Each has its own folder under practice/ with the problem link, a complexity target for you to fill in, and the debrief template. Do not skip ahead.

Together with the required canonical (Cheapest Flights solved the Bellman-Ford-by-rounds way), the two core practice problems give you your one from-memory rep of each of the three required algorithms this week: Dijkstra, Bellman-Ford, and Union-Find. These are the must-know weighted-graph algorithms for screens, so they are the part you own.

Core (required)

Network Delay Time (practice/network-delay-time/). Single-source shortest path with non-negative weights: the cleanest Dijkstra in the set, where the answer is the time the last node receives the signal. This is your Dijkstra rep.
Redundant Connection (practice/redundant-connection/). A tree plus one extra edge: find the edge that closes a cycle. The home of Union-Find in this set, the cleanest “process edges and detect the first cycle” problem. This is your Union-Find rep.

(Your Bellman-Ford rep is the required canonical, Cheapest Flights, done above; the three required from-memory algorithms are Dijkstra, Bellman-Ford, and Union-Find.)

Optional stretch (do if you have time; recommended before the Week 16 marathon)

Min Cost to Connect All Points (practice/min-cost-to-connect-all-points/). A minimum-spanning-tree question on a dense graph (every pair of points has an edge), the natural home for Prim’s (and a problem Kruskal’s can also solve).
Swim in Rising Water (practice/swim-in-rising-water/). A grid where the cost of a path is the maximum cell on it, not the sum; a small twist on the Dijkstra relaxation (and a binary-search-on-the-answer alternative worth seeing).
Reconstruct Itinerary (practice/reconstruct-itinerary/). Use every ticket exactly once in lexical order: an Eulerian path, solved with Hierholzer’s algorithm. This is the rare-but-worth-recognizing problem of the set: learn to recognize the use-every-edge cue and what Hierholzer’s does, but do not over-drill it, it appears far less often in screens than Dijkstra, Bellman-Ford, or Union-Find.

Each problem follows the five-beat rhythm.

The five-beat rhythm (every problem)

Pattern Card is written (you did this Monday).
Name the pattern aloud and write your approach as a plain-English comment before any code.
Struggle floor: 25 minutes unaided.
Hint ladder if needed: six rungs, one per ask.
Debrief: the five questions, in your commit message, before the next problem.

Reflect

Ten minutes in your notebook, in writing:

Explain it back: describe to a peer who has only done Week 11 how you decide, from a problem statement, between Dijkstra, Bellman-Ford, and state expansion, in three sentences.
Connect: Cheapest Flights can be solved with Bellman-Ford or with Dijkstra over an expanded state. What is the state in each, and why does keying a visited set on the node alone break both?
Monitor: which of the five (Dijkstra, Bellman-Ford, Prim’s MST, Union-Find/Kruskal, Hierholzer) is still fuzzy about when to reach for it? Write the one sentence that would clear it up.

Knowledge Check

Answer from memory first. Items marked ⟲ are spaced callbacks to earlier material.

State the routing decision at the top of the week: given a weighted graph and a “cheapest path” question, what fact about the edges sends you to Dijkstra versus Bellman-Ford?
In Dijkstra, why is a node’s distance final the first time it is popped from the heap, and exactly which assumption about edge weights makes that true?
For a constrained shortest-path problem like “at most k stops”, what is the state, and why does a visited set keyed on the node alone produce a wrong answer?
⟲ From Week 8: you put (cost, node) tuples on a heapq. In what order do they pop, and why can a tie on cost raise an error if the second tuple element is not a plain comparable value?
⟲ From Week 11: how do you build a weighted adjacency list from an edge list with defaultdict, and why is the visited set mandatory in any graph traversal with cycles?

Answer key

1. If every edge weight is non-negative, use Dijkstra (the greedy "settle the closest unsettled node" step is safe). If any edge weight can be negative, Dijkstra is unsafe and you use Bellman-Ford (relax all edges a bounded number of rounds), which tolerates negatives. The deciding fact is the sign of the weights. 2. When Dijkstra pops a node, it is the closest remaining unsettled node, so no other unsettled node has a smaller tentative distance. Because every edge weight is non-negative, any future path to the popped node would have to pass through some other unsettled node first and could only add cost, so it cannot beat the distance already found. That "edges only add cost" guarantee is exactly the non-negative-weight assumption; remove it and the finalize-on-pop step is invalid. 3. The state is `(node, stops_used)` (more generally `(node, constraint)`). A `visited` set on the node alone marks the node done at its first, cheapest arrival, which may have spent too many stops; that blocks a later arrival that costs more but uses fewer stops and is the one the budget actually permits. You must let the constraint dimension distinguish states, so redundancy is judged on the full `(node, stops_used)` pair, not the node. 4. They pop smallest-`cost` first, because tuples compare lexicographically and `heapq` is a min-heap. If two entries tie on `cost`, Python compares the next element to break the tie; if that element is something non-comparable (two custom objects with no ordering), the comparison raises `TypeError`. Keeping the second element a plain integer node id (or inserting a counter as a tiebreaker) avoids it. 5. Write `g = defaultdict(list)` and, for each weighted edge, `g[u].append((v, w))` (also `g[v].append((u, w))` if undirected); the default factory creates an empty list for a new key, so no guard is needed. The `visited` set is mandatory because a graph can contain cycles, and without recording which nodes are already settled or queued, the traversal revisits them endlessly and never terminates.

Cold revisit

This Friday, before any new work, re-solve two prior problems cold: no notes, no prior code open, no AI, the standard 25-minute floor on each.

Word Search II (Week 10, Tries and Intervals). A blind re-solve. Do not look at which pattern it is until you have named it yourself; recognizing it cold is the point.
Minimum Window Substring (Week 3, Sliding Window). A blind re-solve, same rules.

Neither is an advanced-graph problem, and that is deliberate: the cold revisit trains recognition across the whole course, not just this week’s pattern. Debrief each at the end and log the outcome in .tutor/revisit-log.md.

How to know you are done with this week

Pattern Card complete at .tutor/pattern-cards/advanced-graphs.md, including the war-story field.
Canonical deep-dive done: Cheapest Flights implemented yourself the Bellman-Ford-by-rounds way (required) with its own five-question debrief in your work repo. The Dijkstra-over-expanded-state version is optional stretch and not required to close the week; if you did it, it has its own separate debrief.
The two core practice problems solved (Network Delay Time, Redundant Connection), each with a five-question debrief in its commit message. Together with the required Bellman-Ford canonical, this is your from-memory rep of all three required algorithms (Dijkstra, Bellman-Ford, Union-Find). The three optional stretch problems (Min Cost to Connect All Points, Swim in Rising Water, Reconstruct Itinerary) are recommended before the Week 16 marathon but are not required to close the week.
.tutor/revisit-log.md updated by the tutor with this week’s problems and the two cold revisits, with dates.
.tutor/progress.md updated to “Week 12 complete; ready for Week 13.”

If any of the above is missing, the week is not done. The tutor will not advance you to Week 13 until all five are present.

Resources

Free, no account required:

[Article] Dijkstra’s algorithm: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
[Article] Bellman-Ford algorithm: https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm
[Article] Prim’s algorithm (minimum spanning tree): https://en.wikipedia.org/wiki/Prim%27s_algorithm
[Article] Eulerian path and Hierholzer’s algorithm: https://en.wikipedia.org/wiki/Eulerian_path
[Docs] Python heapq: https://docs.python.org/3/library/heapq.html