Week 8: Heap and Priority Queue

Pattern family: Heap and Priority Queue · Time budget: 18 hours (the midpoint week carries a consolidation block; budget for it) · Prerequisites: Weeks 1 through 7 done, with trees (Week 7) and the linked-list merge (Week 6) fresh. · Cold-revisit obligation: the midpoint consolidation set, about six cold re-solves (Reorder List, fixed, plus five of your choice from Weeks 1 to 6).

Overview

A heap answers one question fast: “what is the best item right now?” It keeps a collection partially ordered so that the smallest element (Python’s heap is a min-heap) is always at the front, and both pushing a new item and popping the best one cost O(log n). It does not sort the whole collection, and that is the point. When a problem only ever needs the next-best item, paying O(n log n) to sort everything is waste; a heap gives you the front element for O(log n) per operation and never orders the rest.

This week is built around three shapes that cover almost every heap problem you will meet. First, top-k: keep a heap of size k and you can answer “the k largest” or “the kth largest” in O(n log k) without sorting the whole input. Second, merging sorted streams: a heap holding one front element from each of several sorted sequences lets you pull the global minimum repeatedly, which is how you merge k sorted lists in one sweep. Third, two heaps for a running median: split the data into a lower half and an upper half, each a heap, and the median is always sitting at one of the two tops. Most of the work this week is training the reflex that fires before any code: when you catch the words “top k”, “kth largest”, “merge k sorted”, “median of a stream”, or “always take the best next thing”, reach for a heap.

flowchart TD
    Q["Read the problem"] --> A{"Do I only ever need<br/>the best item next,<br/>not the whole order?"}
    A -->|no| G["Probably sorting, a hash map,<br/>or a different pattern.<br/>Keep reading the cues."]
    A -->|yes| B{"Which heap shape<br/>does the cue match?"}
    B -->|"'top k' or 'kth largest'"| K["Size-k heap:<br/>O(n log k), O(k) space"]
    B -->|"'merge k sorted'"| M["Heap of one front per stream:<br/>pop the global min, push its successor"]
    B -->|"'median of a stream'"| H["Two heaps:<br/>max-heap (low half) +<br/>min-heap (high half), kept balanced"]

Notice: the decision happens before you code, and it has two beats. First, do you only need the next-best item (heap at all)? Second, which of the three shapes does the cue match? Naming the pattern out loud (principle 1) is what makes this diagram fire under pressure.

Warm-Up: Retrieve Before You Begin

Answer from memory, no peeking. This pulls forward the Week 0 Python refresher that this week leans on hardest.

  1. From the refresher: Python’s heapq gives you which kind of heap, and which element sits at index 0 after you push a few numbers? What do heappush and heappop each cost?
  2. From the refresher: Python has no built-in max-heap. What is the trick to get max-heap behaviour out of heapq, on both the way in and the way out?
  3. From the refresher: if you push tuples like (priority, item) onto a heap and two priorities tie, what does Python compare next, and how does that go wrong? What is the standard fix?
Check your recall 1. `heapq` gives a binary **min-heap**, so the smallest element is always at index 0. `heappush` and `heappop` are each O(log n); `heapify` (build a heap from a list in place) is O(n), and reading the front (`h[0]`) is O(1). 2. Push the **negation** of each value (`heapq.heappush(h, -x)`) so the most-negative (largest original) sits at the front, and negate again on the way out (`-heapq.heappop(h)`). The heap is still a min-heap; you have just flipped the sign so its "smallest" is your "largest". 3. Tuples compare **lexicographically**: if the first elements tie, Python compares the second. If the second element is something that does not support comparison (a `ListNode`, a custom object), you get a `TypeError` the moment two priorities collide. The standard fix is to insert a unique, always-comparable tiebreaker in the middle, like an incrementing counter or an index: `(priority, count, item)`.

Learning objectives

By the end of this week you can:

  • Recognize the heap cues (“top k”, “kth largest”, “merge k sorted”, “median of a stream”, “best next thing”) and reach for a heap without being told.
  • Distinguish the three heap shapes (size-k heap, merge of sorted streams, two heaps for a median) and name which a problem calls for.
  • Construct a size-k heap solution and explain why it is O(n log k) rather than O(n log n).
  • Explain the two-heap median invariant: which half each heap holds, how they stay balanced, and why that makes the median a top (or the average of two tops).
  • Analyze the time and space complexity of each solution and say what dominates at scale, including the heap-versus-quickselect trade for kth-largest.
  • Defend every line of each solution you commit, from memory, with no AI open, including why your tuple heap entries never trip the tie-breaker.

Recognition cue

“I only ever need the best item next, not the whole sorted order” is the root cue for a heap. In a problem statement, the surface phrases are “top k”, “kth largest” or “kth smallest”, “find the median of a stream”, “merge k sorted …”, and “always process the best/closest/cheapest next”. The deeper tell is that you would otherwise be tempted to sort the entire input just to read off one end of it repeatedly; a heap gives you that one end for O(log n) per access and leaves the rest unsorted, which is cheaper whenever k is much smaller than n or the data arrives over time.

Core concepts to internalize this week

  • A heap is partial order, not full order. It guarantees only that the front element is the smallest (min-heap). The rest of the array is not sorted, and you must not assume it is. This is exactly why a heap is cheaper than a sort when you only need one end.
  • The three heap shapes. (1) Top-k / kth largest: maintain a heap of size k. For the k largest, use a min-heap of size k, so the smallest of your current top-k sits at the front and is the one to evict when something bigger arrives. (2) Merge sorted streams: put one front element from each sorted source in the heap; repeatedly pop the global minimum and push the successor from the same source. (3) Two heaps for a median: a max-heap for the lower half and a min-heap for the upper half, sizes kept within one of each other.
  • The size-k heap, and why it is O(n log k). You walk all n items once; for each you do a push and possibly a pop on a heap that never exceeds size k, and each of those is O(log k). That is O(n log k) time and O(k) space, strictly better than sorting all n items at O(n log n) when k is small.
  • The max-heap-of-size-k counterintuition. To keep the k largest items, you hold them in a min-heap so the weakest survivor is cheap to find and drop. To keep the k smallest (k closest to origin), you hold them in a max-heap (negate) so the worst survivor (farthest) is at the front to evict. Pin this down on the Pattern Card; it is the single most common heap mistake.
  • Two heaps for the median, the invariant. Keep low (a max-heap, holding the smaller half) and high (a min-heap, holding the larger half) such that every element of low is less than or equal to every element of high, and their sizes differ by at most one. Then the median is the top of the larger heap, or the average of the two tops when the sizes are equal. Adding a number is O(log n) (a push and a rebalancing pop or two); reading the median is O(1) (peek one or two tops).
  • Heap versus quickselect for kth-largest. A size-k heap is O(n log k) time, O(k) space, and stable to reason about. Quickselect (partition around a pivot, recurse into the side that contains the kth position) is O(n) average time and O(1) extra space, but O(n^2) worst case on bad pivots. Know both and know the trade: the heap is the safe default and the one that generalises to streams; quickselect is the answer when an interviewer pushes for better-than-O(n log k) and you can defend the average case.
  • heappushpop and heapreplace as one-step combos. Pushing then popping in two calls touches the heap twice; heapq.heappushpop(h, x) pushes x and pops the smallest in one O(log n) operation, which is the natural primitive for the size-k pattern (push the new item, pop the front, keep size k).

Common misconception. “Python has a max-heap; I just call some max version of the functions.” Reality. Python’s heapq is a min-heap only. There is no max-heap in the standard library. For max behaviour you negate values on the way in and on the way out (for numbers), or you store (-key, ...) tuples (when the payload is not a bare number). Forgetting the second negation is a classic bug: your code runs, the values are just silently wrong. State the negation trick out loud whenever you reach for a max-heap.

Common misconception. “A heap of tuples just works; I do not have to think about ties.” Reality. Tuples compare lexicographically, so a heap of (distance, point) or (value, node) compares the second element whenever the first ties. If that second element is a plain list, a ListNode, or any object without a defined ordering, the comparison raises TypeError exactly when two priorities collide, which may not show up on small test inputs. The fix is a unique, always-comparable middle field: (priority, counter, item) or (priority, index, item). Make the tiebreaker do the comparing so the payload never has to.

Python idioms this week

You are not yet expected to be fluent, so here is the toolbox this week uses. If any is unfamiliar, open the Python refresher before Monday (sections on heapq, tuples as composite keys, the negation trick).

  • heapq.heappush(h, x) and heapq.heappop(h), each O(log n); h[0] to peek the front in O(1).
  • heapq.heapify(lst) to turn a list into a heap in place in O(n), when you already have all the items.
  • The negation trick for a max-heap: push -x, pop and negate, or store (-key, payload) tuples.
  • Tuples as heap entries, ordered lexicographically: (priority, item), and (priority, tiebreaker, item) when the payload is not comparable.
  • heapq.heappushpop(h, x) (push then pop the smallest, one O(log n) call) and heapq.heapreplace(h, x) (pop then push); both are the natural tools for keeping a heap at a fixed size.
  • collections.Counter(iterable) to tally frequencies before heaping them (Task Scheduler builds counts first).

These are tools you are allowed to know cold in an interview. The pattern (which of the three heap shapes the problem is, and why) is the part you build yourself.

Pattern Card requirement

Before you solve the canonical problem, write your Pattern Card at .tutor/pattern-cards/heap-priority-queue.md. It must contain:

  • The recognition cue, in your own words (the root cue plus the three surface shapes).
  • The algorithmic skeleton in pseudocode for each of the three shapes (size-k heap, merge of sorted streams, two-heap median).
  • Time and space complexity, general form (O(n log k) for size-k, O(N log k) for merging k streams of total length N, O(log n) add and O(1) query for the two-heap median).
  • Three edge cases (empty input, k equal to the length of the input, and all-equal values are good starts; for the median, an even versus odd count).
  • One war story (a bug you actually hit this week, very likely the missing second negation or a tuple tie-breaker crash); fill this in at the end of the week.

The first four sections must be present before you start the canonical deep-dive. The tutor will probe the card; it will not write any of it for you.

Canonical deep-dive

Problem: Find Median from Data Stream (https://leetcode.com/problems/find-median-from-data-stream/)

Find Median from Data Stream is this course’s home for the two-heap balancing trick, the most elegant of the three heap shapes. The full guided walk-through, with the invariant, the balancing rule, and the exact complexity, is in canonical/find-median-from-data-stream.md. Work it there. Because this is a design problem you build incrementally, the folder ships a provided-example test (canonical/tests/test_provided.py) that drives your MedianFinder through an operation sequence so you can check yourself locally; the walk-through itself stops short of a paste-runnable class on purpose. You write the actual Python yourself, in your own work repo, and write a short debrief.

Practice set

Four problems, in this order. Each has its own folder under practice/ with the problem link, a complexity target for you to fill in, and the debrief template. Do not skip ahead.

  1. Kth Largest Element in an Array (practice/kth-largest-element-in-an-array/). The size-k heap in its purest form. Decide whether you keep a min-heap or a max-heap of size k, and be ready to discuss quickselect as the alternative.
  2. K Closest Points to Origin (practice/k-closest-points-to-origin/). The size-k heap again, but now the “key” is a computed distance and the payload is a point, so this is where tuples-as-entries earns its keep.
  3. Task Scheduler (practice/task-scheduler/). “Always do the most frequent remaining task next” is a greedy-with-a-heap shape; count first, then repeatedly serve the highest-count task. There is also a clean formula once you see the structure.
  4. Merge K Sorted Lists (practice/merge-k-sorted-lists/). You solved this in Week 6 by repeated pairwise merging; this week you redo it with a heap of one front node per list. Same problem, the heap angle. This is the bridge that ties the linked-list and heap patterns together.

Each problem follows the five-beat rhythm.

The five-beat rhythm (every problem)

  1. Pattern Card is written (you did this Monday).
  2. Name the pattern aloud and write your approach as a plain-English comment before any code.
  3. Struggle floor: 25 minutes unaided.
  4. Hint ladder if needed: six rungs, one per ask.
  5. Debrief: the five questions, in your commit message, before the next problem.

Consolidation block (Friday and weekend)

Week 8 is the midpoint of the course. Half the patterns are now behind you (Arrays and Hashing, Two Pointers, Sliding Window, Stacks, Binary Search, Linked List, Trees, and this week’s Heap), and the second half (Backtracking, Tries and Intervals, Graphs, Advanced Graphs, both Dynamic Programming weeks, the Greedy/Bit/Math grab-bag, and the Mock Marathon) leans on all of them. Before you push forward, you stop and consolidate. This is a first-class part of the week, not optional review; it has its own line in “How to know you are done.”

There are two deliverables.

1. Re-solve about six problems cold from Weeks 1 to 6. This set is the week’s cold revisit (the revisit.md file points back here): re-solve Reorder List (Week 6), which is fixed because it fuses three primitives, plus five more of your choice drawn from anywhere in Weeks 1 through 6. Re-solve each cold: blank file, no notes, no prior code open, no AI, the standard 25-minute struggle floor on each. Choose deliberately, not for comfort. Good selection heuristics:

  • Pick at least one problem from a pattern that felt shaky the first time.
  • Pick across patterns, not five from one week, so you are exercising recognition and not just repeating one motion.
  • Favour the problems whose pattern you could not name instantly if you saw them cold today.

Name the pattern aloud before coding each one, submit each to LeetCode’s judge (the judge is the oracle, not the tutor), and log all six in .tutor/revisit-log.md with the date. This is one set, not two: at the midpoint it folds the weekly cold revisit and the consolidation together, so spread the six across Friday and the weekend rather than treating them as nine separate re-solves.

2. Write a one-page midpoint reflection. One page, in writing, in your work repo (for example midpoint-reflection.md). It must answer, concretely:

  • Which patterns are not sticking? Name them. For each, write the one sentence that would unstick it (the cue you keep missing, or the step you keep fumbling). “Sliding window is fuzzy” is not enough; “I never know when to shrink the window versus when to grow it” is.
  • Where is your recognition slow? Of the cold re-solves above, which problems took longest to recognise (as opposed to longest to code)? Recognition speed is the thing the whole course is training; name where it is lagging.
  • What changes for Weeks 9 to 16? Given the above, what will you do differently in the second half? Concrete commitments only: which patterns get an extra stretch session, whether you adjust the struggle floor, which earlier weeks you will fold into future Friday revisits.

The tutor will read the reflection and may probe it, but it will not write any of it for you. The point is an honest, specific audit of where you are at the halfway mark, so the second half is aimed rather than blind.

Reflect

Ten minutes in your notebook, in writing:

  • Explain it back: describe the two-heap median trick to a peer who has only done arrays and trees so far, in four sentences (what each heap holds, the invariant between them, how you read the median, what adding a number costs).
  • Connect: the size-k heap for “kth largest” and the bucket-sort approach you used for Top K Frequent in Week 1 both avoid a full sort. What is each one’s complexity, and when would you reach for one over the other?
  • Monitor: which of the three heap shapes is still the one you would not reach for unprompted? Write the one sentence (the cue) that would make it fire.

Knowledge Check

Answer from memory first. Items marked ⟲ are spaced callbacks to earlier material.

  1. Name the three heap shapes from this week and the surface cue that should trigger each.
  2. To keep the k largest elements, do you maintain a min-heap or a max-heap of size k, and why is that the cheap choice?
  3. State the two-heap median invariant: what does each heap hold, how do their sizes relate, and where is the median?
  4. ⟲ From Week 0: Python has only a min-heap. How do you get max-heap behaviour, and what is the bug if you forget the second half of the trick?
  5. ⟲ From Week 0: you push (value, node) tuples onto a heap and two values tie. What happens, and what is the standard fix?
Answer key 1. (a) **Top-k / kth largest** -> "top k", "kth largest/smallest": a size-k heap. (b) **Merge sorted streams** -> "merge k sorted ...": a heap holding one front element per source. (c) **Two-heap median** -> "median of a stream / running median": a max-heap for the low half and a min-heap for the high half. 2. A **min-heap** of size k. The front is then the smallest of your current top-k, which is exactly the element to evict when a larger one arrives, so the keep-or-drop decision is O(log k) per item and the total is O(n log k). 3. `low` is a max-heap holding the smaller half; `high` is a min-heap holding the larger half; every element of `low` is less than or equal to every element of `high`; their sizes differ by at most one. The median is the top of the larger heap, or the average of the two tops when the sizes are equal. 4. Push negated values (`-x`) and negate again on the way out, or store `(-key, payload)` tuples. If you forget the second negation (or forget to negate the key you read back), the code still runs but returns sign-flipped, silently wrong values. 5. Python compares the tuples lexicographically and falls through to comparing the second element, `node`; if `node` has no defined ordering you get a `TypeError`, and only when two values happen to tie. The fix is a unique, always-comparable middle field, such as an incrementing counter or a list index: `(value, counter, node)`.

Cold revisit

At the midpoint, the weekly cold revisit and the consolidation block are the same set: the roughly six cold re-solves described in the consolidation block above (Reorder List, which is fixed, plus five of your choice from Weeks 1 to 6), also listed in revisit.md. Spread them across Friday and the weekend, and log everything in .tutor/revisit-log.md.

How to know you are done with this week

  • Pattern Card complete at .tutor/pattern-cards/heap-priority-queue.md, including the war-story field.
  • Canonical deep-dive done: MedianFinder implemented yourself (two heaps, balanced), passing the provided operation-sequence test, with your short debrief (in your work repo).
  • Four practice problems solved, each with a five-question debrief in its commit message.
  • Midpoint consolidation done: about six cold re-solves (Reorder List, fixed, plus five of your choice from Weeks 1 to 6) logged with dates, and the one-page midpoint reflection written in your work repo.
  • .tutor/revisit-log.md updated by the tutor with this week’s four practice problems and the six consolidation re-solves, with dates.
  • .tutor/progress.md updated to “Week 8 complete; ready for Week 9.”

If any of the above is missing, the week is not done. The tutor will not advance you to Week 9 until all of them are present. The consolidation block in particular is the gate: skipping the midpoint audit is how the second half of the course goes blind.

Resources

Free, no account required:

  • [Article] Binary heap, the idea: https://en.wikipedia.org/wiki/Binary_heap
  • [Docs] Python heapq: https://docs.python.org/3/library/heapq.html
  • [Article] Quickselect (the kth-largest alternative): https://en.wikipedia.org/wiki/Quickselect
  • [Docs] Python collections.Counter: https://docs.python.org/3/library/collections.html#collections.Counter

Table of contents