Skip to content

std.text.regex

Regex is the knife. PEG is the calligraphy brush.

std.text.regex provides compile-time validated regex literals with typed capture groups. It enables the $-family positional access in :script ($1, $2, etc.) and integrates with the pipeline algebra.


let pattern := r/(\d{4})-(?<month>\d{2})-(?<day>\d{2})/
if let Some(m) := pattern.match("2024-03-15") then
# Positional: $1 = "2024"
# Named: m.month = "03", m.day = "15"
print("Year: $1, Month: ${m.month}")
end

The Janus philosophy: Regex is a bounded tactical DSL. Elegance belongs to PEG.

  • Use regex when the pattern is short and local
  • Use PEG when the pattern has names, structure, or meaning
  • If a regex needs more than a few captures, it has probably become PEG-shaped

Invalid regex syntax produces a compile error, not a runtime crash:

# This won't compileinvalid syntax caught at compile time
let bad := r/[/

Captures are typed at compile time based on the pattern:

# No capturesunit type
let simple: Regex[()] := r/^hello$/
# Positional capturestuple
let nums: Regex[(u64, u64)] := r/(\d+)-(\d+)/
# Named capturesstruct
let date: Regex[Match { year: u64, month: u64 }] := r/(?<year>\d{4})-(?<month>\d{2})/

The engine uses Thompson’s NFA construction with O(n) worst-case performance. No backtracking. No ReDoS vulnerabilities.

Unicode support is enabled by default. The engine correctly handles Unicode code points and grapheme clusters.


ConstructExampleDescription
LiteralabcMatch exact string
Any.Any character (except newline)
Character class[a-z], [^0-9]Set or range
Alternationa|bEither/or
Anchor^, $Start/end of string
Word boundary\bWord boundary
Digit\d, \DDigit/non-digit
Positional capture(pattern)Capture → $1, $2
Named capture(?<name>pattern)Capture → struct field
Non-capturing(?:pattern)Group without capture
Repetition*, +, ?, {n,m}Repeat operators
  • Lookahead/lookbehind
  • Backreferences
  • Conditional patterns
  • PCRE extensions

# Check if matches
pattern.is_match(text) -> bool
# Find first match
pattern.match(text) -> ?Match[T]
# Find all matches
pattern.find_all(text) -> Iterator[Match[T]]
# Replace
pattern.replace(text, "replacement") -> String
# Split
pattern.split(text) -> Iterator[String]
if let Some(m) := r/(\d+)-(\w+)/.match("123-abc") then
# Positional via $1, $2 (in :script)
let num := $1
let word := $2
# Or direct tuple access
let num2 := m.0
let word2 := m.1
end
if let Some(m) := r/(?<y>\d{4})-(?<m>\d{2})/.match("2024-03") then
# Named capture access
let year := m.year
let month := m.month
end

The regex module integrates with the :script pipeline algebra:

<<p"access.log">>
|> grep(r/ERROR.*(?<code>\d+)/) # Filter lines with ERROR
|> map($_.code) # Extract named capture "code"
|> filter($1.parse_int()? > 500) # Filter by positional $1
|> unique()
|> for_each(println)

func is_valid_email(email: String) -> bool do
let pattern := r/^[\w.-]+@[\w.-]+\.\w{2,}$/
return pattern.is_match(email)
end
func parse_iso_date(line: String) -> ?(u64, u64, u64) do
let pattern := r/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/
match pattern.match(line) do
| .some(m) => return .some((m.year?, m.month?, m.day?))
| .none => return .none
end
end
<<p"server.log">>
|> grep(r/\[(?<level>\w+)\]/)
|> map($_.level)
|> filter($1 == "ERROR")
|> count()
|> println()
func mask_ssn(text: String) -> String do
let pattern := r/\d{3}-\d{2}-\d{4}/
return pattern.replace(text, "XXX-XX-XXXX")
end

FeatureJanus regexPython reJavaScriptPerl
Compile-time validation
Typed captures
Linear-time guarantee⚠️
$1, $2 in pipelines
PEG alternative


Elegance belongs to PEG. Regex is the knife.