Week 1: Arrays and Hashing

Pattern family: Foundational · Time budget: 12 hours · Prerequisites: Week 0 done (Python refresher and self-check passed). · Cold-revisit obligation: None (first week).

Overview

This is the first pattern, and it is the one you will reach for more than any other. The whole idea is a trade: spend memory to buy time. A hash map (Python’s dict) and a hash set (set) give you average O(1) membership checks and lookups, which means a question that looks like it needs a nested loop (“for each element, is its partner somewhere in the array?”) often collapses to a single pass.

Most of the work this week is training the reflex that fires before you write any code: when you catch yourself about to scan the array again to answer “have I seen this?”, stop and reach for a hash structure instead.

flowchart TD
    Q["Read the problem"] --> A{"Am I checking membership,<br/>counting frequency, or asking<br/>'have I seen this before'?"}
    A -->|yes| H["Reach for a dict or set:<br/>O(1) average lookup, O(n) space"]
    A -->|no| B{"Is a nested loop doing<br/>membership-style work?"}
    B -->|yes| H
    B -->|no| C["Probably a different pattern.<br/>Keep reading the cues."]

Notice: the decision happens before you code. Naming the pattern out loud (principle 1) is what makes this diagram fire under pressure.

Warm-Up: Retrieve Before You Begin

Answer from memory, no peeking. This pulls forward the Week 0 Python refresher that this week leans on.

What is the difference between a dict and a set in Python, and what is the average time cost of checking membership in each?
What does collections.Counter("aabbbc") give you, and how would you read off the count of "b"?
How does d.get("x", 0) differ from d["x"] when "x" is not a key?

Check your recall

1. A `dict` maps keys to values; a `set` stores only keys (unique elements). Both give average O(1) membership testing (`x in d`, `x in s`), because both are hash-based. The difference is whether you need an associated value. 2. `Counter("aabbbc")` returns `Counter({'b': 3, 'a': 2, 'c': 1})`, a `dict` subclass mapping each item to its count. Read a count with `c["b"]` (which gives `3`, and gives `0` rather than raising for a missing key). 3. `d.get("x", 0)` returns `0` (the default) when `"x"` is absent; `d["x"]` raises `KeyError`. The `get`-with-default idiom is how you count and accumulate without a guard clause.

Learning objectives

By the end of this week you can:

Recognize the membership / frequency / “seen it before” cue and reach for a hash structure without being told.
Distinguish when a dict, a set, and a Counter each fit, and justify the choice.
Explain the space-for-time trade in concrete terms (what you pay, what you save).
Construct a one-pass hash-map solution where a naive approach would be two nested loops.
Analyze the time and space complexity of your solution and name what dominates at scale.
Defend every line of each solution you commit, from memory, with no AI open.

Recognition cue

“I am checking membership, counting frequency, or asking ‘have I seen this before’” almost always means a hash map or hash set is the right reach. The cue is also any time a nested loop is doing membership-style work. If you find yourself writing for x in arr: for y in arr: and the inner loop is just looking for something, stop and ask whether a hash map could replace one of those loops.

Core concepts to internalize this week

Hash map vs hash set vs Counter. Use a set when you only care whether something is present; a dict when you need an associated value (like an index); a Counter when you are tallying frequencies.
“Trade space for time,” made concrete. A hash map costs O(n) extra memory to turn an O(n^2) scan into an O(n) pass. Name that trade out loud every time you make it; interviewers ask you to.
One-pass vs two-pass. Two Sum can be done by first building a full map, then scanning (two passes), or by checking-then-inserting as you go (one pass). Knowing why the one-pass version is correct (you only need partners that came before you) is the lesson.
Keys for grouping. To group anagrams, the key is something all anagrams share: the sorted string, or a 26-length count tuple. Choosing the key is the whole problem.
Bucket sort by frequency. For top-k, you do not need to sort all counts; you can bucket items by frequency into an array indexed by count, which is O(n).
The prefix-sum / running-accumulation family. A precomputed prefix array (and, when “everything except me” is the question, a matching suffix array) answers range and product queries in O(n) instead of recomputing per index, as in Product of Array Except Self. The hash-powered version is the one to remember: a running prefix sum paired with a hash map of prefix sums seen so far answers “how many subarrays sum to k” in one pass, and it stays correct even when the array contains negatives, where a sliding window would not.

Common misconception. “A dict lookup is free, so hashing makes everything O(1).” Reality. A single average lookup is O(1), but building a dict over n items is O(n) time and O(n) space, and worst-case lookups degrade with bad hash distributions. The win is turning repeated O(n) scans into O(1) checks; the cost is the memory and the one-time build. State both halves when you justify the approach.

Common misconception. “Sorting first is always slower than hashing.” Reality. Sorting is O(n log n) and uses O(1) extra space (or O(n) to preserve indices); hashing is O(n) time and O(n) space. When memory is tight or the input is already sorted, the “slower” sort can be the better engineering choice. Interviewers reward knowing the trade, not reciting “hashing wins.”

Python idioms this week

You are not yet expected to be fluent, so here is the toolbox this week uses. If any is unfamiliar, open the Python refresher before Monday.

set() and x in s for O(1) membership.
dict, d[key] = value, and d.get(key, default) for counting without a guard.
collections.Counter(iterable) for frequencies; collections.defaultdict(list) for grouping.
A tuple as a dict key (tuples are hashable; lists are not).
List and dict comprehensions for compact transforms.
sorted(iterable, key=...) when you need an ordering.

These are tools you are allowed to know cold in an interview. The pattern (which structure solves the problem) is the part you build yourself.

Pattern Card requirement

Before you solve the canonical problem, write your Pattern Card at .tutor/pattern-cards/arrays-and-hashing.md. It must contain:

The recognition cue, in your own words.
The algorithmic skeleton in pseudocode.
Time and space complexity, general form.
Three edge cases (empty input, single element, all-duplicates are good starts).
One war story (a bug you actually hit this week); fill this in at the end of the week.

The first four sections must be present before you start the canonical deep-dive. The tutor will probe the card; it will not write any of it for you.

Canonical deep-dive

Problem: Two Sum (https://leetcode.com/problems/two-sum/)

Two Sum is the course’s first lesson in “every solution has a price.” The full guided walk-through, with the three approaches and their costs, is in canonical/two-sum.md. Work it there. You will implement all three approaches yourself, in your own work repo, time them on a large input, and write a short reflection. Two Sum is unusual in that all three solutions are worth writing; later weeks will not always replicate this depth.

Practice set

Six problems, in this order. Each has its own folder under practice/ with the problem link, a complexity target for you to fill in, and the debrief template. Do not skip ahead.

Valid Anagram (practice/valid-anagram/). Compare two strings by character-frequency maps instead of sorting.
Contains Duplicate (practice/contains-duplicate/). One-pass set membership.
Group Anagrams (practice/group-anagrams/). Bucket strings by a shared key.
Top K Frequent Elements (practice/top-k-frequent-elements/). Bucket sort by frequency, or a size-k heap (which Week 8 will make routine).
Product of Array Except Self (practice/product-of-array-except-self/). A prefix pass and a suffix pass, no division; your first taste of the prefix-accumulation family.
Longest Consecutive Sequence (practice/longest-consecutive-sequence/). A hash set turns “is this value’s neighbor here?” into O(1), beating the obvious sort.

Each problem follows the five-beat rhythm.

The five-beat rhythm (every problem)

Pattern Card is written (you did this Monday).
Name the pattern aloud and write your approach as a plain-English comment before any code.
Struggle floor: 25 minutes unaided.
Hint ladder if needed: six rungs, one per ask.
Debrief: the five questions, in your commit message, before the next problem.

Say it out loud

The debrief above is written and after the fact; interviews grade you on talking in real time. So each week, narrate the canonical problem aloud (to a rubber duck, a study partner, or just yourself): restate the problem in your own words, state the brute force and its cost, name the pattern and your target complexity, then walk a small example by hand. Start light now; Week 16 formalizes this into timed mock-interview practice, and you want the delivery to feel rehearsed by then, not new.

Reflect

Ten minutes in your notebook, in writing:

Explain it back: describe the space-for-time trade to a peer who just finished Week 0, in three sentences.
Connect: in Two Sum’s one-pass version, why is it correct to check the map before inserting the current number?
Monitor: which of dict / set / Counter is still fuzzy about when to use it? Write the one sentence that would clear it up.

Knowledge Check

Answer from memory first. Items marked ⟲ are spaced callbacks to earlier material.

What input shape or phrase in a problem statement should make you reach for a hash structure?
In Two Sum’s one-pass solution, what is the key and what is the value in the map?
Why does grouping anagrams need a shared key, and what are two keys that work?
⟲ From Week 0: why can a tuple be a dict key but a list cannot?
⟲ From Week 0: what does d.get(k, 0) return when k is absent, and why does that make counting cleaner?

Answer key

1. "Membership," "have I seen this," "count how many," or a nested loop whose inner job is just to find something. 2. Key: the number you have seen; value: its index. You look up `target - current` to find a partner's index in O(1). 3. Anagrams differ in order but share their multiset of characters; the key must collapse order. Two that work: the sorted string (`"".join(sorted(s))`) and a 26-length count tuple. 4. A `dict` key must be hashable, which requires immutability; tuples are immutable and hashable, lists are mutable and unhashable. 5. It returns `0`, the default, so you can write `counts[c] = counts.get(c, 0) + 1` without first checking whether `c` is present.

Cold revisit

None this week. Friday is spent perfecting your workflow and your Pattern Card, and making sure your commit-message debriefs are complete; the cold-revisit machinery starts in Week 2.

How to know you are done with this week

Pattern Card complete at .tutor/pattern-cards/arrays-and-hashing.md.
Canonical deep-dive done: three Two Sum implementations written and timed, plus your short reflection (in your work repo).
Six practice problems solved, each with a five-question debrief in its commit message.
.tutor/revisit-log.md updated by the tutor with all six problems and dates.
.tutor/progress.md updated to “Week 1 complete; ready for Week 2.”

If any of the above is missing, the week is not done. The tutor will not advance you to Week 2 until all five are present.

Resources

Free, no account required:

[Article] Hash tables, the idea: https://en.wikipedia.org/wiki/Hash_table
[Docs] Python collections (Counter, defaultdict): https://docs.python.org/3/library/collections.html
[Docs] Python dict and set: https://docs.python.org/3/tutorial/datastructures.html