Regex greedy vs lazy: avoiding the classic traps
The only difference between .* and .*? is the trailing question mark, but the match results differ dramatically. Anyone using regex in real code runs into this — extracting HTML tags, parsing log fields, capturing string literals.
Quantifiers: * + ? {n,m}
A quantifier says how many times the preceding pattern can repeat:
| Quantifier | Meaning |
|---|---|
| * | 0 or more |
| + | 1 or more |
| ? | 0 or 1 |
| {n} | exactly n |
| {n,} | n or more |
| {n,m} | between n and m |
Each of these has both a greedy and a lazy variant.
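As a quick check of the table, here is a minimal sketch using Python's standard re module. The sample strings are made up for illustration. It exercises each quantifier's repetition range, then shows one greedy/lazy pair.

```python
import re

# Each entry: (pattern, text, whether the whole text should match).
cases = [
    (r"ab*", "a", True),          # * allows zero repetitions
    (r"ab+", "a", False),         # + needs at least one
    (r"ab?", "abb", False),       # ? allows at most one
    (r"a{3}", "aaa", True),       # exactly 3
    (r"a{2,}", "a", False),       # 2 or more
    (r"a{2,4}", "aaaaa", False),  # at most 4
]
for pattern, text, expected in cases:
    assert (re.fullmatch(pattern, text) is not None) == expected

# The same quantifier in its greedy and lazy form.
greedy = re.match(r"a{2,4}", "aaaaa").group()
lazy = re.match(r"a{2,4}?", "aaaaa").group()
print(greedy, lazy)  # aaaa aa
```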
Default is greedy
Quantifiers without a trailing ? are greedy — they consume as much as possible while still allowing the overall pattern to match.
Applying <.*> to <a><b>:
input: <a><b>
pattern: <.*>
match: <a><b> ← whole thing (longest)
.* matches “any character, zero or more times” and greedily takes everything up to the last >, so the .* portion of the match is a><b. The first time you write something like this, the result feels wrong: you wanted <a>.
Adding ? makes it lazy
A trailing ? flips the quantifier to lazy, which consumes as little as possible.
input: <a><b>
pattern: <.*?>
match: <a> ← minimum (stops at the next >)
<b> ← second match if you keep going
.*? halts at the first >. For tag extraction, lazy quantifiers are typically what you want.
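The two results can be reproduced with a short Python sketch. The re module is used here only as one convenient engine; other backtracking engines should behave the same way on this input.

```python
import re

text = "<a><b>"

greedy = re.findall(r"<.*>", text)   # one match spanning both tags
lazy = re.findall(r"<.*?>", text)    # one match per tag

print(greedy)  # ['<a><b>']
print(lazy)    # ['<a>', '<b>']
```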
A backtracking lens
In backtracking engines (Perl, PCRE, Python, Java, JavaScript, .NET), greedy and lazy quantifiers both work through backtracking, but they search in opposite directions.
Greedy
- Try the longest possible match first.
- If the rest of the pattern doesn’t match, shrink by one character and retry.
- Keep retreating until something works.
Lazy
- Try the shortest possible match first.
- If the rest of the pattern doesn’t match, extend by one character and retry.
- Keep advancing until something works.
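Both backtrack; only the direction differs, so the cost depends on where the match ends. Greedy does less work when the terminator sits near the end of the input, lazy when it sits near the start.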
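The two search orders can be made concrete with a toy model. This is not how any real engine is implemented: the hypothetical helper attempts() below only lists which lengths would be tried for .* or .*? when it must be followed by a fixed closing string, and it ignores details such as . not matching newlines.

```python
def attempts(text, start, closer, greedy):
    """Toy model: list the lengths tried for `.*` (greedy) or `.*?` (lazy)
    starting at index `start`, when `closer` must follow immediately.

    Returns (tried_lengths, matched_length or None).
    """
    max_len = len(text) - start
    # Greedy starts long and shrinks; lazy starts short and grows.
    order = range(max_len, -1, -1) if greedy else range(0, max_len + 1)
    tried = []
    for n in order:
        tried.append(n)
        if text.startswith(closer, start + n):
            return tried, n
    return tried, None


text = "<a><b>"
# The leading "<" is assumed to have matched at index 0, so the
# quantified part begins at index 1.
greedy_tried, greedy_len = attempts(text, 1, ">", greedy=True)
lazy_tried, lazy_len = attempts(text, 1, ">", greedy=False)

print(greedy_tried, text[1:1 + greedy_len])  # [5, 4] a><b
print(lazy_tried, text[1:1 + lazy_len])      # [0, 1] a
```

On this short input both directions succeed after two attempts. On a long line the two lists grow very differently depending on where the closing > sits.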
Negated character classes are often a better tool
Where you’d reach for .*?, a negated character class can express the same intent more directly.
For HTML tag extraction:
negated class: <[^>]*> ← "anything that is not >, greedily"
lazy: <.*?> ← "anything, lazily, until the next >"
Both produce <a> here, but the negated class is:
- More explicit — the constraint is right there.
- Faster — no backtracking needed; the match is unambiguous.
- More robust — survives weirder inputs.
Because [^>] can never match the delimiter, there is only one place the match can end, which leaves the engine very little to retry. The performance gap over .*? tends to grow on long inputs.
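A small Python sketch of the comparison follows. The multi-line tag is one example of a "weirder input", chosen here for illustration: . does not match a newline by default, while a negated class does.

```python
import re

text = "<a><b>"
# On simple input the two patterns agree.
assert re.findall(r"<[^>]*>", text) == re.findall(r"<.*?>", text) == ["<a>", "<b>"]

# A tag that spans two lines: `.` does not match "\n" by default,
# while [^>] matches any character except ">", newlines included.
multiline = '<img\n src="x.png">'
lazy_hits = re.findall(r"<.*?>", multiline)
negated_hits = re.findall(r"<[^>]*>", multiline)

print(lazy_hits)     # []
print(negated_hits)  # ['<img\n src="x.png">']
```

The lazy pattern can be made to match here with the re.DOTALL flag, but the negated class needs no flag at all.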
Avoiding catastrophic backtracking
Nesting greedy quantifiers can blow up backtracking exponentially.
pattern: (a+)+b
input: aaaaaaaaaaaaaaaaaaaaaaaaaa (no b)
The engine tries every way of splitting the run of a's among repetitions of the inner (a+), retrying on each failure. The work approaches 2^N for input length N. This is the textbook ReDoS (regular expression denial of service) scenario, where a few dozen characters can hang the engine for seconds or minutes.
Mitigations:
- Avoid nesting quantifiers (a+, not (a+)+).
- Possessive quantifiers like a++ (not in JavaScript; some other engines).
- Atomic groups like (?>a+) (Java, PCRE).
- Use negated classes or specific characters to remove ambiguity.
JavaScript supports neither possessive quantifiers nor atomic groups, so prevention has to live in the pattern design.
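To get a feel for the growth without hanging anything, the following Python sketch times a failed match for the nested pattern and for the flattened a+b on deliberately short inputs. The helper name seconds_to_fail and the chosen lengths are illustrative, and the timings depend on the machine and engine, so no expected numbers are shown. On a backtracking engine such as Python's re, the nested timings should climb steeply as the input grows.

```python
import re
import time

nested = re.compile(r"(a+)+b")  # ambiguous: many ways to split a run of a's
flat = re.compile(r"a+b")       # accepts the same strings, no nesting


def seconds_to_fail(pattern, n):
    """Time a failed match against n a's with no trailing b."""
    text = "a" * n
    begin = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - begin
    assert result is None  # neither pattern can match without a b
    return elapsed


# Lengths are kept small on purpose; each extra character can
# roughly double the nested pattern's running time.
for n in (10, 14, 18):
    print(n,
          f"nested={seconds_to_fail(nested, n):.4f}s",
          f"flat={seconds_to_fail(flat, n):.6f}s")

# Removing the nesting does not change which strings are accepted.
for sample in ("ab", "aaab", "b", "aa"):
    assert bool(nested.fullmatch(sample)) == bool(flat.fullmatch(sample))
```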
The (.*) capture trap
Pulling JSON out of log lines:
input: request_id="abc-123" body="[1,2,3]" status="200"
pattern: body="(.*)"
Greedy (.*) runs all the way to the last " in the line:
captured: [1,2,3]" status="200 (wrong)
The lazy version stops at the next ":
pattern: body="(.*?)"
captured: [1,2,3] (right)
But (.*?) still breaks if the value itself contains escaped quotes, e.g. body="abc \"escaped\" def", where it stops at the first \" and captures only abc \. A more correct pattern:
pattern: body="((?:[^"\\]|\\.)*)"
This reads as “any character that isn’t a " or \, or a \ followed by anything”. Once a pattern reaches this level of complexity, it is usually better to give up on regex and feed the value to a real JSON parser.
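Here is a minimal Python sketch of the escape-aware pattern, written with both backslashes intact as body="((?:[^"\\]|\\.)*)". The log line is hypothetical, and the decoding step assumes the logger escapes " and \ the same way JSON strings do; a different escaping scheme would need a different decoder.

```python
import json
import re

# Hypothetical log line; assumes JSON-style escaping inside the quoted field.
line = r'request_id="abc-123" body="{\"foo\":1}" status="200"'

lazy = re.search(r'body="(.*?)"', line).group(1)
escaped = re.search(r'body="((?:[^"\\]|\\.)*)"', line).group(1)

print(lazy)     # prints {\  (stops at the first escaped quote)
print(escaped)  # prints {\"foo\":1}

# Undo the string escaping by decoding the field as a JSON string,
# then hand the resulting text to the JSON parser.
payload = json.loads(json.loads('"' + escaped + '"'))
print(payload)  # {'foo': 1}
```

The regex only finds where the field ends. Interpreting what is inside it is left to json.loads.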
Rules of thumb
- Default to greedy. Add ? only when you actually want lazy.
- For HTML/tag extraction, prefer negated character classes over lazy quantifiers, both for performance and correctness.
- Watch nested quantifiers — they are the home of ReDoS.
- Know when to stop. Carving values out of structured text (JSON, HTML) past a certain complexity is a job for a parser, not a regex.
When you want to see how a pattern behaves, the regex tester on this site shows match results live. Trying the same pattern with and without ? makes the greedy/lazy distinction immediately visible.