Canonical Deep-Dive: Minimum Window Substring
Problem: Minimum Window Substring (https://leetcode.com/problems/minimum-window-substring/) The lesson: the have-vs-need Counter machinery, expand to satisfy the need, then shrink to the tightest valid window.
This is a guided walk-through, not an answer key. It explains how to think about the window state, the expand-then-shrink rhythm, and the exact complexity in words. It stops short of an assembled, paste-runnable function on purpose. You write the actual Python yourself, in your own work repo. The tutor will not confirm your code; LeetCode’s judge will.
Restated: you are given two strings s and t. Return the shortest substring of s that contains every character of t, counting multiplicity (if t has two 'a's, the window must hold at least two 'a's). If no such window exists, return the empty string "". The answer is guaranteed unique when it exists.
Why brute force is too slow
The direct idea: try every substring of s, and for each one check whether it contains all of t. There are about n^2 substrings (n choices of start times n choices of end), and checking each against t is more work on top, so this is at best O(n^2) and easily worse. The substrings overlap almost completely as the endpoints inch along, so you are re-counting the same characters over and over. That repetition, heavily overlapping contiguous runs, is the recognition cue for the whole week: stop restarting the run and slide it instead.
The window state: have vs need
The whole problem turns on one piece of state. You want to know, at every moment, whether the current window s[left..right] already contains all of t. Re-scanning the window each step would put you back at O(n^2), so instead you maintain that knowledge incrementally.
Hold two things:
- A
Counterof what youneed: initialized toCounter(t), soneed['A']is how many'A's the window still must acquire. This count can go negative, and a negative value is meaningful: it means the window currently holds more copies of that character thantrequires (surplus you are free to shrink away). - A single integer, call it
missing, the number of character slots still unfilled. Initialize it tolen(t). Whenmissinghits0, the window is valid: it contains all oft.
The reason missing is one integer and not “scan the whole Counter” is the performance trick. You never ask “is every requirement met?” by looping over the Counter; you keep a running count of unmet slots and check it in O(1).
The expand-then-shrink rhythm
Walk right across s one character at a time. At each right:
Expand (take the new character in). Look at c = s[right]. Before you decrement, if need[c] > 0 then this character was genuinely needed (not surplus), so one fewer slot is unmet: do missing -= 1. Then always do need[c] -= 1 to record that the window now holds one more c. The order matters: you test need[c] > 0 before the decrement, because that is what distinguishes “this c filled a real requirement” from “this c is surplus beyond what t needs”.
Shrink (tighten from the left while still valid). while missing == 0, the window is valid, so two things happen. First, record it if it is the shortest valid window seen so far (compare right - left + 1 against your best length, and remember left and the length when it improves). Second, try to drop the leftmost character to make the window smaller: let d = s[left], do need[d] += 1 (the window will no longer hold this d), and if that pushes need[d] above 0, then dropping it has re-created an unmet slot, so missing += 1 (which ends the while). Advance left either way. You keep shrinking as long as the window stays valid, which finds the tightest window ending at this right.
In skeleton form (you write the real Python, this is the shape only):
need = Counter(t)
missing = len(t)
left = 0
best_len = infinity # length of the best window so far
best_left = 0 # where that best window starts
for right, c in enumerate(s):
if need[c] > 0:
missing -= 1
need[c] -= 1
while missing == 0: # window [left..right] is valid
if (right - left + 1) is smaller than best_len:
record best_len and best_left from this window
d = s[left]
need[d] += 1
if need[d] > 0: # dropping d broke validity
missing += 1
left += 1
return "" if best_len is infinity else the slice of s of length best_len at best_left
This is the have-vs-need template. Notice there is no inner re-scan anywhere: every step is O(1) Counter and integer work.
Trace it on the provided example
Take s = "ADOBECODEBANC", t = "ABC". So need = {A:1, B:1, C:1} and missing = 3.
rightwalks forward throughA,D,O,B, … . Each needed character (A, thenB) dropsmissing; the others (D,O) just go negative or to-1inneedand leavemissingalone.- The first time the window becomes valid is at
s[0..5] = "ADOBEC", whenCarrives andmissingreaches0. That is a valid window of length 6. Now thewhileshrinks:s[0]isA, which is needed, so dropping it would break validity; you record length 6, then advanceleftpastA,missingbecomes1, and thewhilestops. rightcontinues. The window re-validates later (around"CODEBA"and then"CODEBANC"shapes), and each time thewhileshrinks it as far left as it can. The shortest valid window the sweep ever records iss[9..12] = "BANC", length 4, which containsA,B, andC.- No shorter valid window exists, so the answer is
"BANC".
Hand-simulate two or three more right steps yourself, tracking need and missing as you go; doing it by hand once is what makes the two-integer trick click.
The key insight, in one sentence
Maintain “how many required character slots are still unmet” as a single running integer; expand the right end until that integer reaches zero, then shrink the left end as far as it can go while still zero, recording the tightest window each time, so every character of s enters the window exactly once and leaves at most once.
Edge cases to handle yourself
tlonger thans, orsshorter than any valid window: no window is ever valid, return"". The “best length is still infinity” check covers this.- Repeated characters in
t, for examplet = "AABC": the multiplicity matters, which is whyneedis a count, not a set. The window must hold two'A's before that requirement is met. tis one character, ors == t: the window machinery still works; confirm it on a tiny case.
Exact complexity
-
Time: O( s + t ). Building needfromtis O(t ). Then rightadvances acrosssonce (O(s )), and leftadvances acrosssat most once in total over the whole run (each character leaves the window at most once), so the two-index sweep is O(s ), not O( s ^2). There is no inner re-scan and no hidden sort. -
Space: O( s + t ). The needCounter holds at most one entry per distinct character it ever touches, which is bounded by the characters oftplus the characters ofsit decrements. If you prefer to think of it as bounded by the alphabet size, that is a constant; thes + t bound is the safe, general statement to give an interviewer.
Your work this week
- Implement Minimum Window Substring yourself, in your own work repo (for example
minimum-window-substring.py). Build theneedCounter, trackmissing, expandright, shrinkleftwhile valid, and extract the best window with the empty-string fallback. Do not copy a solution; build it from the have-vs-need machinery above. - Run it against the provided examples in
practice/-style local tests, then submit to LeetCode’s judge. Test the edge cases yourself:tlonger thans(answer""),twith a repeated character (the multiplicity must be respected), ands == t. -
Commit with a five-question debrief in the message. In answer 3, state the exact complexity above, O( s + t ) time and O( s + t ) space, and explain why leftadvancing inside awhiledoes not make the sweep quadratic.
When you can explain, from memory and with no AI open, what missing counts, why you test need[c] > 0 before decrementing, and why the total work is O( |
s | + | t | ) despite the inner while, you own this problem. That explanation is exactly what an interviewer will ask for, and the have-vs-need template is one you will reuse on any “window must contain these characters” question. |