Regex Tester — proving a pattern matches what you think it matches

A walkthrough of PortJar's Regex Tester — how to iterate on a JavaScript-flavored pattern, read capture groups, and avoid the three common mistakes that ship broken regexes to production.

Half of the regexes that go to production were written by someone who tested them against three lines and called it a day. The other half were written against thirty lines and failed on the thirty-first: the URL with a port number, the email with a + tag, the IPv4 address with an extra octet. A regex tester is the fastest practical way to iterate against real input without redeploying. PortJar’s Regex Tester is for the moment you stop trusting that the pattern in the open ticket actually does what its author thought it did.

What the tool does

Regex Tester is a live tester for JavaScript-flavored (ECMAScript) regular expressions. You enter a pattern, optionally a set of flags (g, i, m, s, u, y), and a body of test text. The tool returns every match in the text, the index where each match starts, and the contents of each capture group. Example chips load common starting patterns — email, IPv4, URL, UUID, hex color, ISO date — so you do not have to rewrite the same baseline regex from scratch every time.
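Under the hood, a tester like this is a thin layer over String.prototype.matchAll. The sketch below is an illustration under assumptions, not PortJar’s implementation. The helper name listMatches and the ISO-date pattern are made up for the example.

```javascript
// Hypothetical helper, not PortJar's code: list every match with its
// start index, numbered capture groups, and named capture groups.
function listMatches(pattern, flags, text) {
  // matchAll throws a TypeError on a non-global regex, so add g if missing.
  const re = new RegExp(pattern, flags.includes("g") ? flags : flags + "g");
  return Array.from(text.matchAll(re), (m) => ({
    index: m.index,        // start offset of the match in text
    match: m[0],           // the full matched substring
    groups: m.slice(1),    // numbered groups; undefined where a group did not participate
    named: m.groups ?? {}, // named groups, if the pattern defines any
  }));
}

// Illustrative ISO-date pattern with named groups.
const results = listMatches(
  String.raw`(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})`,
  "",
  "deployed 2024-03-01, rolled back 2024-03-02"
);
for (const r of results) {
  // The first match starts at index 9, the second at index 33.
  console.log(r.index, r.match, r.groups);
}
```

The sketch keeps unmatched groups as undefined rather than coercing them to empty strings. That distinction matters later, when you check whether a group actually participated in the match.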

The engine is the one your browser already ships, so anything its JavaScript implementation supports works here: lookbehind, named capture groups ((?<name>…)), and Unicode property escapes (\p{…}, which require the u flag). PCRE-only features such as atomic groups, recursion, and conditionals do not work. If you are writing a pattern for grep -P, pcregrep, or server-side software built on PCRE (PHP’s preg_* functions, nginx’s regex locations), validate it in a PCRE-aware tool as well. The syntax largely overlaps, but the supported features and some semantics diverge.
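One way to check which side of the JavaScript/PCRE line a construct falls on is to compile it in the engine you care about. The following is a minimal sketch that runs in Node.js or a browser console. The compiles helper and the sample strings are illustrative assumptions.

```javascript
// Hypothetical helper: true if the local engine accepts the pattern,
// false if it throws a SyntaxError.
function compiles(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Lookbehind plus a named group: capture the amount that follows a "$".
const price = /(?<=\$)(?<amount>\d+(?:\.\d{2})?)/.exec("total: $42.50");
console.log(price.groups.amount); // 42.50

// Unicode property escapes need the u flag. Without it, legacy (Annex B)
// parsing reads \p{L} as the literal text "p{L}", not as "any letter".
console.log(/^\p{L}+$/u.test("Z\u00FCrich")); // true
console.log(/\p{L}/.test("p{L}"));            // true: a literal match

// PCRE-only syntax: atomic group, recursion, conditional.
for (const src of ["(?>a+)b", "a(?R)?b", "(a)?(?(1)b|c)"]) {
  console.log(src, compiles(src) ? "accepted" : "rejected"); // all three are rejected
}
```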

How to use it

Open portjar.com/tools/regex. Paste your pattern in the first field, set flags if you need them, paste a representative sample of the text you want to match in the third field, and hit Run. The output shows each match with its start index and capture groups. Use the example chips below the form to load a baseline you can modify rather than re-typing.

With the g flag set, the tool returns every match, up to a cap of 500 results. That is enough to validate against a representative log sample without letting a pattern that matches far more than intended (an empty match at every position, say) flood the page. Without g, you get a single match, the same behavior as RegExp.prototype.exec. That is useful when you only care whether a pattern matches at all and what it captured.
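The g/non-g split maps onto standard JavaScript calls. The sketch below mimics the behavior described above, under the following assumptions: the 500-result cap mirrors the limit the tool documents, but the loop itself, including the name collectMatches, is a hypothetical reimplementation and not PortJar’s code. It also shows why a hand-written exec loop must step lastIndex past empty matches. Without that step, the loop keeps finding the same empty match.

```javascript
// Hypothetical reimplementation of capped matching; not PortJar's code.
const MAX_RESULTS = 500; // mirrors the documented cap

function collectMatches(re, text, limit = MAX_RESULTS) {
  if (!re.global) {
    // Without g: a single exec-style match, or none.
    const m = re.exec(text);
    return m ? [m] : [];
  }
  const out = [];
  re.lastIndex = 0;
  let m;
  while (out.length < limit && (m = re.exec(text)) !== null) {
    out.push(m);
    // An empty match leaves lastIndex where it was, so step past it.
    // (With the u flag, a full implementation would step by code point.)
    if (m[0] === "") re.lastIndex++;
  }
  return out;
}

console.log(collectMatches(/\d+/, "a1 b22 c333").length);    // 1
console.log(collectMatches(/\d+/g, "a1 b22 c333").length);   // 3
console.log(collectMatches(/x*/g, "y".repeat(2000)).length); // 500 (capped)
```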

When you’d reach for it

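  • Validating a log-parsing regex against a real log sample before deploying it to a Filebeat, Vector, or Logstash pipeline. The pattern that worked on the developer’s three example lines will usually fail on the operations log’s edge cases: IPv6 addresses, multi-line stack traces, lines with quoted strings containing escaped quotes. Those pipelines also run their own regex engines (Go’s regexp in Filebeat, Rust’s regex crate in Vector, Oniguruma in Logstash’s grok filter). The first two support neither lookaround nor backreferences, so keep to syntax they share with JavaScript.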
  • Building an email or URL validator without writing one from scratch. The example chips give you a working starting point; iterate from there rather than from RFC 5322 in full.
  • Checking what a third-party rule actually matches — an Nginx location block, a WAF rule, an alerting filter — by pasting representative request lines and watching whether the pattern catches them all or misses some.
  • Diagnosing a “this alert fires too often” or “this alert never fires” report, where the underlying matcher is a regex. Paste the pattern, paste a window of real log lines, and the over-matches or under-matches become visible immediately.
  • Checking what a capture group feeds downstream. If a pattern is supposed to extract a request ID, hostname, or status code, run it against a sample and read the capture-group output directly. By the time a wrong group shows up in an alert template or dashboard label, it is much harder to trace back to the pattern.
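Most of the uses above come down to one loop: run the pattern over every line of a real sample and look at the lines it misses, not just the ones it matches. The following is a minimal sketch that assumes a hypothetical space-separated access-log format. The pattern, the checkSample helper, and the sample lines are invented for illustration.

```javascript
// Hypothetical log format: "<client-ip> <method> <path> <status>".
// The IPv4-only address group is deliberate, to show an edge case slipping through.
const LINE = /^(?<ip>\d{1,3}(?:\.\d{1,3}){3}) (?<method>[A-Z]+) (?<path>\S+) (?<status>\d{3})$/;

// Split a sample into the named groups of matched lines and the raw unmatched lines.
function checkSample(re, sample) {
  const matched = [];
  const unmatched = [];
  for (const line of sample.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const m = re.exec(line);
    if (m) matched.push(m.groups);
    else unmatched.push(line);
  }
  return { matched, unmatched };
}

const sample = [
  "10.0.0.7 GET /health 200",
  "192.168.1.20 POST /api/orders 201",
  "2001:db8::1 GET /health 200", // IPv6 client: the IPv4-only group misses it
].join("\n");

const { matched, unmatched } = checkSample(LINE, sample);
console.log(matched.map((g) => g.status)); // status codes of the two IPv4 lines
console.log(unmatched);                    // the IPv6 line the pattern missed
```

The unmatched list is the useful output. An empty list on a realistic sample is the evidence you want before shipping, and each entry in it is an under-match to fix.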

What to make of the output

A pattern that returns the matches you expected, in the positions you expected, with the capture groups containing what you expected, is doing its job. Save it and ship it.

A pattern that returns no matches when you expected some almost always comes down to one of three things. The first is case sensitivity (try the i flag). The second is anchors: without the m flag, ^ and $ match only at the start and end of the whole input, not of each line. The third is character-class assumptions: in JavaScript, \d is always ASCII [0-9], even with the u flag, so to match digits from other scripts use \p{Nd} with u. Add or remove flags one at a time before changing the pattern itself.
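Each of the three causes is easy to reproduce in a few lines, which makes it easier to recognize in the tester’s output. The input strings are illustrative.

```javascript
// 1. Case sensitivity: the i flag.
console.log(/error/.test("ERROR: disk full"));  // false
console.log(/error/i.test("ERROR: disk full")); // true

// 2. Anchors: without m, ^ and $ match only at the ends of the whole input.
const log = "ok\nERROR: disk full\nok";
console.log(/^ERROR/.test(log));  // false
console.log(/^ERROR/m.test(log)); // true

// 3. Character classes: \d is [0-9] in JavaScript, with or without u.
const arabicIndic = "\u0661\u0662\u0663"; // Arabic-Indic digits one, two, three
console.log(/^\d+$/u.test(arabicIndic));     // false
console.log(/^\p{Nd}+$/u.test(arabicIndic)); // true
```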

A pattern that matches more than you expected has usually fallen into one of two traps. The first is greedy quantifiers: in ".*", the .* runs from the first quote to the last quote on the line, not to the nearest one. The second is overly permissive character classes: [^"]* happily crosses newlines unless you also exclude \n. Switch to a non-greedy quantifier (.*?) or tighten the character class.
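Both traps are easy to reproduce. The input strings below are made up for illustration.

```javascript
const quoted = 'user="alice" role="admin"';

// Greedy: .* runs to the LAST closing quote on the line.
console.log(quoted.match(/"(.*)"/)[1]);  // alice" role="admin
// Lazy: .*? stops at the NEAREST closing quote.
console.log(quoted.match(/"(.*?)"/)[1]); // alice

// A negated class also matches newlines, so it can span lines...
const twoLines = 'msg="first\nsecond"';
console.log(/"[^"]*"/.test(twoLines));     // true: crosses the newline
// ...unless \n (and \r, for CRLF input) is excluded as well.
console.log(/"[^"\r\n]*"/.test(twoLines)); // false
```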

A pattern that takes a noticeable amount of time on a small input is a warning. On a backtracking engine, nested quantifiers can produce exponential runtimes on inputs that look harmless (ReDoS, regular-expression denial of service). The classic shape is (a+)+b against aaaaaaaaaaaaaaaaaac. If the tool slows or hangs on a realistic log line, the same pattern in a production pipeline that uses a backtracking engine can stall or take down the worker that runs it. Rewrite the pattern to avoid nested quantifiers and repeated pieces that can match the same characters in more than one way, such as (\w|\d)+.
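A rough way to see the problem is to time the nested pattern against a flat equivalent on growing inputs. This is only a sketch. The timeTest helper is hypothetical, absolute timings depend on the machine and the engine, and only the trend is meaningful. On V8 and other backtracking engines, the nested version should slow down sharply as n grows, while a+b stays flat.

```javascript
// Hypothetical helper: run one test() call and report how long it took.
function timeTest(re, input) {
  const start = performance.now();
  const matched = re.test(input);
  return { matched, ms: performance.now() - start };
}

const nested = /(a+)+b/; // nested quantifiers: backtracks heavily when the match fails
const flat = /a+b/;      // matches exactly the same strings, without the nesting

for (let n = 16; n <= 22; n += 2) {
  const input = "a".repeat(n) + "c"; // no "b", so every attempt fails
  const slow = timeTest(nested, input);
  const fast = timeTest(flat, input);
  console.log(`n=${n} nested=${slow.ms.toFixed(1)}ms flat=${fast.ms.toFixed(1)}ms`);
}
```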

Capture-group output is the part most people skip. If a downstream system expects group 1 to be a hostname and group 2 to be a status code, run the pattern against a real line and read the groups in the order their opening parentheses appear. A capture group that comes back empty even though the overall match succeeded is almost always an optional group or an alternation. For example, (foo|bar)(baz)? leaves group 2 unset when there is no baz, and (foo)|(bar) only ever sets one of its two groups. In JavaScript an unset group is undefined, not an empty string, and any code that assumes the group is populated will misbehave.
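The two cases look like this in JavaScript. The toRecord helper and its host and status fields are hypothetical. They sketch one way to read groups defensively before handing them downstream.

```javascript
// Optional group: group 2 is undefined (not "") when there is no "baz".
const opt = /(foo|bar)(baz)?/.exec("foo");
console.log(opt[1], opt[2]); // foo undefined

// Alternation of groups: only the branch that matched is set.
const alt = /(foo)|(bar)/.exec("bar");
console.log(alt[1], alt[2]); // undefined bar

// Hypothetical defensive read: map unset groups to null explicitly.
function toRecord(m) {
  return {
    host: m.groups?.host ?? null,
    status: m.groups?.status ? Number(m.groups.status) : null,
  };
}
const line = /(?<host>[\w.-]+)(?: (?<status>\d{3}))?/.exec("api.internal");
console.log(toRecord(line)); // host "api.internal", status null
```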

Stack Harbor uses regex testing during log-pipeline tuning, alert rule reviews, and ingest schema work we run as part of monitoring and support.
