Regex for Beginners: Essential Patterns Explained
Regex looks scary until it suddenly clicks. Then you feel like you have a superpower — and occasionally shoot yourself in the foot with it.
This is the regex tutorial I wish I'd had starting out: focused on the patterns that actually come up in day-to-day development, with real examples instead of abstract syntax charts. If you want to follow along, open the Regex Tester in another tab.
The basic building blocks
Before patterns make sense, you need to know what the individual characters mean.
Literals are the simplest case. The pattern cat matches the string "cat" anywhere in your text. No special behavior — just a character-for-character match.
Metacharacters are the special ones. These do something:
- . matches any single character except a newline
- * means zero or more of the preceding character or group
- + means one or more of the preceding character or group
- ? means zero or one (makes something optional)
- ^ marks the start of the string (or line, depending on flags)
- $ marks the end of the string (or line)
- \ escapes the next character, or starts a shorthand
The escaping matters a lot. If you want to match a literal dot (like in a file extension), you write \. not .. The . without a backslash matches any character, which is almost never what you want when matching file extensions.
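A quick way to see the difference is to run both versions over a few file names. This is a minimal Python sketch; the file names are made up for illustration.

```python
import re

# Hypothetical file names, chosen only to show the difference.
names = ["report.pdf", "report_pdf", "reportxpdf"]

# Unescaped dot: any character is accepted before "pdf".
loose = [n for n in names if re.search(r".pdf$", n)]

# Escaped dot: a literal "." is required before "pdf".
strict = [n for n in names if re.search(r"\.pdf$", n)]

print(loose)   # all three names
print(strict)  # ['report.pdf']
```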
Character classes
Square brackets let you define a set of allowed characters:
- [aeiou] matches any single vowel
- [a-z] matches any lowercase letter
- [A-Za-z0-9] matches any letter or digit
- [^aeiou] matches any character that is NOT a vowel (the ^ inside brackets means "not")
Shorthands cover the common cases:
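- \d matches any digit (same as [0-9] in JavaScript; in Python 3 it also matches other Unicode digits unless you pass re.ASCII)
- \w matches any word character (letters, digits, underscore)
- \s matches any whitespace (space, tab, newline)
- \D, \W, \S are the uppercase versions and mean "not" that class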
So \d{4} means "exactly four digits" and \w+ means "one or more word characters."
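To make the classes concrete, here is a small Python sketch that runs a few of them through re.findall. The sample strings are invented for the example.

```python
import re

# Made-up sample text for illustration.
text = "Order 66 shipped to Unit_7 on 2026-03-04"

vowels = re.findall(r"[aeiou]", "regex")       # ['e', 'e']
non_vowels = re.findall(r"[^aeiou]", "regex")  # ['r', 'g', 'x']
years = re.findall(r"\d{4}", text)             # ['2026']
words = re.findall(r"\w+", "Unit_7 is ready")  # ['Unit_7', 'is', 'ready']
```

Note that the underscore in Unit_7 stays inside the match, because \w counts it as a word character.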
Quantifiers
Quantifiers control how many times something repeats:
- {3} means exactly 3 times
- {2,5} means between 2 and 5 times
- {3,} means 3 or more times
A phone number like 555-123-4567 could be matched by:
\d{3}-\d{3}-\d{4}
That's three digits, a hyphen, three more digits, a hyphen, four digits. Try pasting that into the Regex Tester and testing it against a few phone numbers.
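If you'd rather check it in code, a rough Python equivalent might look like this. The sample strings are assumptions picked to show one hit inside longer text and two near misses.

```python
import re

PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

# Invented samples: a clean number, a misplaced hyphen,
# a number embedded in text, and digits with no hyphens.
samples = ["555-123-4567", "555-1234-567", "call 555-123-4567 now", "5551234567"]

results = {s: PHONE.search(s) is not None for s in samples}
for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

Only the first and third samples match, because the hyphens have to sit exactly after the third and sixth digits.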
Groups and capturing
Parentheses group things together and also capture what they matched:
(\d{4})-(\d{2})-(\d{2})
Applied to "2026-03-04", this captures:
- Group 1: 2026
- Group 2: 03
- Group 3: 04
In JavaScript, you'd access these as match[1], match[2], match[3]. Very useful when you need to extract parts of a structured string.
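The Python version is similar, with match.group(n) in place of match[n]. A minimal sketch follows; the helper name parse_date is hypothetical.

```python
import re

DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def parse_date(text):
    """Return (year, month, day) as strings, or None if no date-shaped text is found."""
    m = DATE.search(text)
    if m is None:
        return None
    # group(0) is the whole match; groups 1 to 3 are the parenthesized parts.
    return m.group(1), m.group(2), m.group(3)

print(parse_date("2026-03-04"))    # ('2026', '03', '04')
print(parse_date("no date here"))  # None
```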
If you want grouping without capturing (for performance or clarity), use (?:...):
(?:https?|ftp)://
This matches http://, https://, or ftp:// as a group but doesn't capture it separately.
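One way to see the effect is to compare what each version captures. In this Python sketch the extra (\w+) group and the example URL are assumptions added so there is something left to capture.

```python
import re

url = "https://example.com"

# Capturing group: the scheme becomes group 1 and the host fragment group 2.
with_capture = re.match(r"(https?|ftp)://(\w+)", url).groups()

# Non-capturing group: the scheme is still matched, but only the host fragment is captured.
without_capture = re.match(r"(?:https?|ftp)://(\w+)", url).groups()

print(with_capture)     # ('https', 'example')
print(without_capture)  # ('example',)
```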
Alternation
The pipe | means "or":
cat|dog
Matches "cat" or "dog". You can chain as many as you want. Combined with groups:
(jpg|jpeg|png|gif|webp)$
This matches common image file extensions at the end of a string.
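Here is a short Python sketch of both patterns. It also shows one thing to watch for: as written, the extension pattern only checks the final letters, so a name like "notajpg" passes. The dotted variant below is a suggested tweak, not part of the original pattern, and the file names are invented.

```python
import re

animals = re.findall(r"cat|dog", "cat, dog, catalog")  # ['cat', 'dog', 'cat']

names = ["photo.png", "scan.jpeg", "notes.txt", "notajpg"]

# The pattern from the text: checks only how the string ends.
EXT = re.compile(r"(jpg|jpeg|png|gif|webp)$")
matched = [n for n in names if EXT.search(n)]

# Assumed variant: require a literal dot before the extension.
DOTTED_EXT = re.compile(r"\.(jpg|jpeg|png|gif|webp)$")
matched_dotted = [n for n in names if DOTTED_EXT.search(n)]

print(matched)         # ['photo.png', 'scan.jpeg', 'notajpg']
print(matched_dotted)  # ['photo.png', 'scan.jpeg']
```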
Anchors: start and end
^ and $ are anchors that pin your pattern to a position:
- ^\d+$ requires the entire string to be digits (nothing else)
- ^https?:// requires the string to start with http:// or https://
- \.pdf$ requires the string to end with .pdf
Without anchors, the pattern can match anywhere in the string. That's sometimes what you want (searching inside text) and sometimes what you don't (validating that a whole field contains only a phone number).
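The difference between searching and validating can be sketched in Python like this. It also shows a Python-specific quirk worth knowing: $ matches just before a trailing newline, which is why re.fullmatch is often the safer choice for validation.

```python
import re

# Unanchored: finds digits anywhere in the string.
found_inside = re.search(r"\d+", "abc123") is not None       # True

# Anchored: the whole string must be digits.
all_digits = re.search(r"^\d+$", "abc123") is not None       # False

# Python quirk: $ also matches just before a trailing newline.
trailing_newline = re.search(r"^\d+$", "123\n") is not None  # True

# re.fullmatch requires the pattern to span the entire string.
strict = re.fullmatch(r"\d+", "123\n") is not None           # False
```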
The five patterns you'll actually use
1. Email validation
There's no perfect email regex: the full spec is genuinely complex. But this one catches the common formatting mistakes while accepting virtually any address people actually use:
^[^\s@]+@[^\s@]+\.[^\s@]+$
It says: one or more characters that aren't whitespace or @, then @, then the same again, then a dot, then the same again. It catches obvious failures like missing @ or missing domain, without getting into the weeds of RFC 5321.
If you need airtight email validation, send a confirmation email. Regex can only catch formatting errors.
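As a rough Python sketch, the check could be wrapped like this. The function name looks_like_email and the sample addresses are assumptions for illustration; the name is deliberately modest, since a pass only means the shape is plausible.

```python
import re

EMAIL = re.compile(r"^[^\s@]+@[^\s@]+\.[^\s@]+$")

def looks_like_email(value):
    """Format check only: says nothing about whether the mailbox exists."""
    # fullmatch avoids Python's "$ matches before a trailing newline" quirk.
    return EMAIL.fullmatch(value) is not None

print(looks_like_email("user@example.com"))   # True
print(looks_like_email("user@example"))       # False: no dot in the domain
print(looks_like_email("user@@example.com"))  # False: two @ signs
print(looks_like_email("userexample.com"))    # False: no @ at all
```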
2. URL detection
https?:\/\/[^\s]+
Matches http:// or https:// followed by anything that isn't whitespace. Good for finding URLs in a block of text. If you need to parse URL components (protocol, host, path, query), use your language's URL parsing library instead — that's what it's for. Keep in mind that URLs often contain percent-encoded characters (%20, %2F, etc.) — if you're processing the matched URL further, you'll want to understand URL encoding.
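A sketch of that split in Python: regex to find the URLs, then the standard library's urllib.parse to take one apart. The sample text is invented. Two details to notice: Python doesn't need the slashes escaped, and the pattern happily swallows a sentence-ending period, so the sketch trims trailing punctuation as an assumed cleanup step.

```python
import re
from urllib.parse import urlsplit, unquote

text = "Docs at https://example.com/a%20b?q=1 and http://example.org."

urls = re.findall(r"https?://[^\s]+", text)

# Assumed cleanup: drop punctuation that probably belongs to the sentence.
cleaned = [u.rstrip(".,;:!?") for u in urls]

parts = urlsplit(cleaned[0])
decoded_path = unquote(parts.path)  # percent-decoding: %20 becomes a space

print(urls)     # ['https://example.com/a%20b?q=1', 'http://example.org.']
print(cleaned)  # ['https://example.com/a%20b?q=1', 'http://example.org']
print(parts.scheme, parts.netloc, parts.path, parts.query)
print(decoded_path)  # '/a b'
```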
3. IP address (IPv4)
\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b
Matches four groups of 1-3 digits separated by dots. This will match 999.999.999.999, which isn't a valid IP — regex isn't the right tool for checking if numbers are in range. Use it to find IP-shaped strings, then validate the values in code.
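The "find with regex, validate in code" approach might look like this in Python. The function name find_ipv4 is hypothetical and the range check is a simple sketch; Python's standard ipaddress module is another option for the validation step.

```python
import re

IPV4_SHAPE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")

def find_ipv4(text):
    """Return IP-shaped substrings whose four parts are all in 0..255."""
    found = []
    for m in IPV4_SHAPE.finditer(text):
        # The regex only guarantees the shape; the range check happens here.
        if all(0 <= int(part) <= 255 for part in m.groups()):
            found.append(m.group(0))
    return found

print(find_ipv4("ok 192.168.1.10, bad 999.999.999.999"))  # ['192.168.1.10']
```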
4. Extracting numbers from text
-?\d+(\.\d+)?
Matches integers and decimals, including negative numbers. Useful for scraping numeric data from strings like "The file is 12.5 MB" or "Temperature: -3°C".
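One Python-specific wrinkle: when a pattern contains a capturing group, re.findall returns the group's text rather than the whole match. The sketch below therefore switches the group to non-capturing, which is an adaptation of the pattern above rather than a correction to it. The helper name extract_numbers is hypothetical.

```python
import re

# With the capturing group, findall returns only what the group matched.
surprising = re.findall(r"-?\d+(\.\d+)?", "12.5 and 7")  # ['.5', '']

# Non-capturing group: findall returns the full matches.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def extract_numbers(text):
    return [float(s) for s in NUMBER.findall(text)]

print(extract_numbers("The file is 12.5 MB"))  # [12.5]
print(extract_numbers("Temperature: -3°C"))    # [-3.0]
```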
5. Stripping HTML tags
<[^>]+>
Matches <anything> where "anything" doesn't include >. Combined with a replace, it strips tags from HTML. This is a quick-and-dirty approach that works for simple cases. Don't use regex to parse HTML in production code — use a proper parser. But for cleaning up known-clean content in a script? Fine.
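A minimal Python sketch of the replace, along with one input that shows why this is only safe on content you control. The function name strip_tags and both inputs are made up for the example.

```python
import re

def strip_tags(html):
    """Quick-and-dirty tag removal; not a substitute for an HTML parser."""
    return re.sub(r"<[^>]+>", "", html)

simple = strip_tags("<p>Hello <b>world</b></p>")

# A ">" inside an attribute value ends the match early and leaves debris behind.
broken = strip_tags('<a title="a > b">link</a>')

print(simple)  # Hello world
print(broken)  #  b">link
```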
Flags that change behavior
Most regex engines support flags that modify how matching works:
- g (global) finds all matches, not just the first
- i (case-insensitive) treats upper and lowercase as equal
- m (multiline) makes ^ and $ match line boundaries, not just string boundaries
- s (dotall) makes . match newlines too
In JavaScript: /pattern/gi. In Python: re.compile(r"pattern", re.IGNORECASE | re.DOTALL). Python has no g flag; use re.findall or re.finditer when you want every match.
The m flag trips people up. If you're matching multi-line text and ^/$ aren't working right, it's often because you need m.
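Here is a small Python sketch of the m, i, and s behaviors, using an invented three-line log as input.

```python
import re

log = "error: disk full\nwarning: low memory\nError: timeout"

# Without flags, ^ only matches at the very start of the string.
default = re.findall(r"^error", log)                             # ['error']

# MULTILINE lets ^ match at each line start; IGNORECASE accepts "Error".
multi = re.findall(r"^error", log, re.MULTILINE | re.IGNORECASE)  # ['error', 'Error']

# By default . does not match a newline; DOTALL changes that.
dot_default = re.search(r"full.warning", log) is not None         # False
dot_all = re.search(r"full.warning", log, re.DOTALL) is not None   # True
```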
Common mistakes
Forgetting to escape dots. 3.14 as a regex matches "3x14", "3.14", "3a14" — the dot matches anything. Write 3\.14 if you mean a literal dot.
Greedy vs. lazy matching. By default, quantifiers are greedy — they match as much as possible. <.+> applied to <b>bold</b> matches the whole thing, not just <b>. Add ? to make it lazy: <.+?> matches <b> and then </b> separately.
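The same comparison in Python, as a quick sketch:

```python
import re

html = "<b>bold</b>"

greedy = re.findall(r"<.+>", html)  # ['<b>bold</b>']
lazy = re.findall(r"<.+?>", html)   # ['<b>', '</b>']
```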
Catastrophic backtracking. Some patterns can cause exponential slowdown on certain inputs. If you're running regex on user-supplied text at scale, test it against adversarial inputs. Patterns like (a+)+ are classic examples of what not to do.
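Often the nested quantifier adds nothing. As a sketch, the Python snippet below checks that an anchored (a+)+ and a plain a+ agree on a handful of short inputs. The inputs are kept deliberately short so the nested version stays fast, and the check is an illustration on these samples, not a proof.

```python
import re

NESTED = re.compile(r"^(a+)+$")  # the risky shape: a quantifier inside a quantifier
FLAT = re.compile(r"^a+$")       # accepts the same strings without the nesting

# Short, made-up inputs only; long near-misses are what trigger the slowdown.
samples = ["a", "aaaa", "aaab", "", "b"]

agree = all(
    (NESTED.search(s) is not None) == (FLAT.search(s) is not None)
    for s in samples
)
print(agree)  # True
```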
Using regex when a library would be cleaner. For dates, use a date library. For JSON, use JSON.parse(). For CSV, use a CSV parser. Regex solves "find/match/extract from text" problems. It doesn't solve "understand the structure of" problems.
Lookaheads and lookbehinds (a brief intro)
Once you're comfortable with the basics, lookarounds let you match based on context without including that context in the match:
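- \d+(?= dollars) matches digits only if followed by " dollars"
- (?<=\$)\d+ matches digits only if preceded by $
- \d+(?! dollars) matches digits NOT followed by " dollars" (careful: on "100 dollars" it still matches "10", because the engine backs off one digit until the lookahead passes)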
These are useful for data extraction where the surrounding context identifies what you want but you don't want the context in your result.
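A Python sketch of all three, on an invented sentence. The negative lookahead has a subtlety: \d+ can give back a digit so that the lookahead passes, which produces a partial match on "40 dollars". Adding word boundaries, an assumed fix shown last, prevents that.

```python
import re

text = "Paid 40 dollars, tip $5, 12 items"

followed = re.findall(r"\d+(?= dollars)", text)  # ['40']
preceded = re.findall(r"(?<=\$)\d+", text)       # ['5']

# Without boundaries, "40" shrinks to "4" so the lookahead can succeed.
naive_negative = re.findall(r"\d+(?! dollars)", text)     # ['4', '5', '12']

# With \b on both sides, only whole numbers are considered.
whole_negative = re.findall(r"\b\d+\b(?! dollars)", text)  # ['5', '12']
```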
Testing your patterns
Always test regex against a variety of inputs before trusting it:
- A normal input that should match
- An edge case (empty string, minimum valid length, maximum)
- An input that's almost right but should fail
- Completely wrong input
The Regex Tester lets you paste in a pattern and test multiple strings at once, with highlighting to see exactly what matched. Much faster than running code every time you tweak a pattern.
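Once a pattern settles down, the same checklist is easy to keep in code. This is a minimal table-driven sketch in Python that reuses the phone pattern from earlier; the cases are made-up examples of each category, and re.fullmatch is used on the assumption that you are validating a whole field.

```python
import re

PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

# (input, should_match): normal, edge case, almost right, completely wrong.
cases = [
    ("555-123-4567", True),
    ("", False),
    ("555-123-456", False),
    ("hello", False),
]

failures = [
    (text, expected)
    for text, expected in cases
    if (PHONE.fullmatch(text) is not None) != expected
]
print(failures)  # [] when every case behaves as expected
```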
Wrapping up
Regex is a tool with a specific purpose: finding, matching, and extracting text patterns. Learn the basics — character classes, quantifiers, groups, anchors — and you can handle most of what comes up in real development. For everything else, know when to reach for a proper parser instead.
Start with the five patterns above, test them in the Regex Tester, and you'll be faster with regex within a week. The mental model clicks faster than most people expect.
Frequently asked questions
What's the difference between .* and .+?
.* matches zero or more characters — it can match an empty string. .+ matches one or more, so it requires at least one character. In practice, .+ is what you usually want when you expect actual content.
Why doesn't my regex work in Python if it works in JavaScript?
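Regex syntax is mostly compatible across languages, but there are differences. For example, JavaScript writes named groups as (?<name>...) while Python's re module uses (?P<name>...), and flags are passed differently (/pattern/gi versus re.IGNORECASE). The biggest common gotcha: in an ordinary Python string the backslash is an escape character, so "\b" is a backspace rather than a word boundary. In a raw string (r"pattern") backslashes are left alone, so you can write \d and \b directly instead of \\d and \\b. Always use raw strings for regex in Python.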
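The raw-string point is easy to verify. In this Python sketch, the same-looking pattern behaves differently depending on the r prefix, because "\b" in an ordinary string is a single backspace character.

```python
import re

plain_len = len("\b")  # 1: a single backspace character
raw_len = len(r"\b")   # 2: a backslash followed by "b"

# Ordinary string: the regex engine is asked to find backspace characters.
plain_hit = re.search("\bcat\b", "a cat") is not None  # False

# Raw string: the regex engine sees \b and treats it as a word boundary.
raw_hit = re.search(r"\bcat\b", "a cat") is not None   # True
```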
Is regex case-sensitive by default?
Yes. cat won't match "Cat". Add the i flag (/cat/i in JavaScript, re.IGNORECASE in Python) for case-insensitive matching.
Can I use regex to parse HTML?
For simple cases, yes — stripping tags, finding specific patterns. For anything structural (navigating nested elements, extracting content from arbitrary HTML), use a proper parser like BeautifulSoup (Python) or DOMParser (browser JS). Regex doesn't understand nesting.
What does \b mean?
It's a word boundary: the position between a word character (\w) and a non-word character, or between a word character and the start or end of the string. \bcat\b matches the "cat" in "cat" and "cat." but not in "concatenate". Useful when you want whole-word matches.
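A short Python sketch of whole-word matching, using an invented sentence:

```python
import re

text = "cat, cat. concatenate Cat"

whole_words = re.findall(r"\bcat\b", text)  # ['cat', 'cat']
anywhere = re.findall(r"cat", text)         # ['cat', 'cat', 'cat']
```

The third hit in the unanchored version comes from inside "concatenate"; "Cat" is skipped in both because matching is case-sensitive by default.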