Canonical Deep-Dive: Minimum Window Substring

Problem: Minimum Window Substring (https://leetcode.com/problems/minimum-window-substring/) The lesson: the have-vs-need Counter machinery, expand to satisfy the need, then shrink to the tightest valid window.

This is a guided walk-through, not an answer key. It explains how to think about the window state, the expand-then-shrink rhythm, and the exact complexity in words. It stops short of an assembled, paste-runnable function on purpose. You write the actual Python yourself, in your own work repo. The tutor will not confirm your code; LeetCode’s judge will.

Restated: you are given two strings s and t. Return the shortest substring of s that contains every character of t, counting multiplicity (if t has two 'a's, the window must hold at least two 'a's). If no such window exists, return the empty string "". The answer is guaranteed unique when it exists.

Why brute force is too slow

The direct idea: try every substring of s, and for each one check whether it contains all of t. There are about n^2 substrings (n choices of start times n choices of end), and checking each against t is more work on top, so this is at best O(n^2) and easily worse. The substrings overlap almost completely as the endpoints inch along, so you are re-counting the same characters over and over. That repetition, heavily overlapping contiguous runs, is the recognition cue for the whole week: stop restarting the run and slide it instead.

The window state: have vs need

The whole problem turns on one piece of state. You want to know, at every moment, whether the current window s[left..right] already contains all of t. Re-scanning the window each step would put you back at O(n^2), so instead you maintain that knowledge incrementally.

Hold two things:

  • A Counter of what you need: initialized to Counter(t), so need['A'] is how many 'A's the window still must acquire. This count can go negative, and a negative value is meaningful: it means the window currently holds more copies of that character than t requires (surplus you are free to shrink away).
  • A single integer, call it missing, the number of character slots still unfilled. Initialize it to len(t). When missing hits 0, the window is valid: it contains all of t.

The reason missing is one integer and not “scan the whole Counter” is the performance trick. You never ask “is every requirement met?” by looping over the Counter; you keep a running count of unmet slots and check it in O(1).

The expand-then-shrink rhythm

Walk right across s one character at a time. At each right:

Expand (take the new character in). Look at c = s[right]. Before you decrement, if need[c] > 0 then this character was genuinely needed (not surplus), so one fewer slot is unmet: do missing -= 1. Then always do need[c] -= 1 to record that the window now holds one more c. The order matters: you test need[c] > 0 before the decrement, because that is what distinguishes “this c filled a real requirement” from “this c is surplus beyond what t needs”.

Shrink (tighten from the left while still valid). while missing == 0, the window is valid, so two things happen. First, record it if it is the shortest valid window seen so far (compare right - left + 1 against your best length, and remember left and the length when it improves). Second, try to drop the leftmost character to make the window smaller: let d = s[left], do need[d] += 1 (the window will no longer hold this d), and if that pushes need[d] above 0, then dropping it has re-created an unmet slot, so missing += 1 (which ends the while). Advance left either way. You keep shrinking as long as the window stays valid, which finds the tightest window ending at this right.

In skeleton form (you write the real Python, this is the shape only):

need = Counter(t)
missing = len(t)
left = 0
best_len = infinity            # length of the best window so far
best_left = 0                  # where that best window starts

for right, c in enumerate(s):
    if need[c] > 0:
        missing -= 1
    need[c] -= 1

    while missing == 0:        # window [left..right] is valid
        if (right - left + 1) is smaller than best_len:
            record best_len and best_left from this window
        d = s[left]
        need[d] += 1
        if need[d] > 0:        # dropping d broke validity
            missing += 1
        left += 1

return "" if best_len is infinity else the slice of s of length best_len at best_left

This is the have-vs-need template. Notice there is no inner re-scan anywhere: every step is O(1) Counter and integer work.

Trace it on the provided example

Take s = "ADOBECODEBANC", t = "ABC". So need = {A:1, B:1, C:1} and missing = 3.

  • right walks forward through A, D, O, B, … . Each needed character (A, then B) drops missing; the others (D, O) just go negative or to -1 in need and leave missing alone.
  • The first time the window becomes valid is at s[0..5] = "ADOBEC", when C arrives and missing reaches 0. That is a valid window of length 6. Now the while shrinks: s[0] is A, which is needed, so dropping it would break validity; you record length 6, then advance left past A, missing becomes 1, and the while stops.
  • right continues. The window re-validates later (around "CODEBA" and then "CODEBANC" shapes), and each time the while shrinks it as far left as it can. The shortest valid window the sweep ever records is s[9..12] = "BANC", length 4, which contains A, B, and C.
  • No shorter valid window exists, so the answer is "BANC".

Hand-simulate two or three more right steps yourself, tracking need and missing as you go; doing it by hand once is what makes the two-integer trick click.

The key insight, in one sentence

Maintain “how many required character slots are still unmet” as a single running integer; expand the right end until that integer reaches zero, then shrink the left end as far as it can go while still zero, recording the tightest window each time, so every character of s enters the window exactly once and leaves at most once.

Edge cases to handle yourself

  • t longer than s, or s shorter than any valid window: no window is ever valid, return "". The “best length is still infinity” check covers this.
  • Repeated characters in t, for example t = "AABC": the multiplicity matters, which is why need is a count, not a set. The window must hold two 'A's before that requirement is met.
  • t is one character, or s == t: the window machinery still works; confirm it on a tiny case.

Exact complexity

  • Time: O( s + t ). Building need from t is O( t ). Then right advances across s once (O( s )), and left advances across s at most once in total over the whole run (each character leaves the window at most once), so the two-index sweep is O( s ), not O( s ^2). There is no inner re-scan and no hidden sort.
  • Space: O( s + t ). The need Counter holds at most one entry per distinct character it ever touches, which is bounded by the characters of t plus the characters of s it decrements. If you prefer to think of it as bounded by the alphabet size, that is a constant; the s + t bound is the safe, general statement to give an interviewer.

Your work this week

  1. Implement Minimum Window Substring yourself, in your own work repo (for example minimum-window-substring.py). Build the need Counter, track missing, expand right, shrink left while valid, and extract the best window with the empty-string fallback. Do not copy a solution; build it from the have-vs-need machinery above.
  2. Run it against the provided examples in practice/-style local tests, then submit to LeetCode’s judge. Test the edge cases yourself: t longer than s (answer ""), t with a repeated character (the multiplicity must be respected), and s == t.
  3. Commit with a five-question debrief in the message. In answer 3, state the exact complexity above, O( s + t ) time and O( s + t ) space, and explain why left advancing inside a while does not make the sweep quadratic.
When you can explain, from memory and with no AI open, what missing counts, why you test need[c] > 0 before decrementing, and why the total work is O( s + t ) despite the inner while, you own this problem. That explanation is exactly what an interviewer will ask for, and the have-vs-need template is one you will reuse on any “window must contain these characters” question.