std.text.rex

Regex is the knife. PEG is the calligraphy brush.

std.text.rex is the regular-expression surface for the text stack. The current v1 bridge supports local pattern matching through compile, isMatch, find, replace, and replaceAll. SPEC-048 v1 is intentionally bounded: typed captures and $ extraction remain future work.

The same bounded engine is exposed as the standalone rex tool for file search. The tool defaults to natural query lowering and keeps raw regex syntax one command deeper behind rex regex.


use std.text.rex as rex
func main() -> i32 do
    const pattern = "\\d+"
    const input = "invoice 123"
    const h = rex.compile(pattern)
    if h == 0 do
        return 1
    end
    if rex.isMatch(h, input) do
        return 0
    end
    return 2
end

The Janus philosophy: Regex is a bounded tactical DSL. Elegance belongs to PEG.

  • Use std.text.rex when the pattern is short and local.
  • Use std.text.peg when the pattern has names, structure, or meaning.
  • If a regex needs more than a few captures, it has probably become PEG-shaped.

The shipped bridge is intentionally small. It stores compiled patterns behind opaque integer handles and matches byte strings through a simple NFA-style matcher.

Supported today:

  • literals
  • .
  • *, +, ?
  • character classes such as [a-z] and [^0-9]
  • ^ and $
  • \d, \w, and \s
  • escaped literal punctuation such as \(, \), \{, \}, and \|

Not complete yet:

  • typed capture groups
  • named captures
  • $1, $2, and $* extraction integration
  • parser-literal validation for r/.../
  • Unicode grapheme semantics

Those are future SPEC-048 amendments, not v1 guarantees.

func compile(pattern: []const u8) -> usize
func isMatch(h: usize, input: []const u8) -> bool
func find(h: usize, input: []const u8) -> *u8
func capture(h: usize, input: []const u8, index: usize) -> *u8
func captureCount(h: usize) -> usize
func replace(h: usize, input: []const u8, replacement: []const u8) -> *u8
func replaceAll(h: usize, input: []const u8, replacement: []const u8) -> *u8

compile returns 0 when the bridge rejects the pattern. v1 rejects malformed or unsupported patterns such as an unterminated character class, grouping, or unsupported escapes like \D. The capture functions are present in the facade for forward compatibility only: v1 does not extract captures, so captureCount always returns 0 and capture always returns null.

When the CLI recipes reach their limit, move into the stdlib rather than stretching command syntax. TextQuery keeps raw bounded rex and exact literals separate:

const raw = rex.raw("\\d+")
const exact = rex.literal("call(foo)")
const a = rex.evaluate(raw, "invoice 123")
const b = rex.evaluate(exact, "x call(foo) y")

The typed API is pure and capability-free:

pub enum QueryKind { raw, literal }
pub struct TextQuery {
    _handle: usize,
}
pub struct MatchResult {
    valid: bool,
    matched: bool,
}
pub func raw(view pattern: []const u8) -> TextQuery
pub func literal(view text: []const u8) -> TextQuery
pub func kindCode(view query: TextQuery) -> i64
pub func textPtr(view query: TextQuery) -> usize
pub func textLen(view query: TextQuery) -> usize
pub func evaluate(view query: TextQuery, view input: []const u8) -> MatchResult
pub func require(view query: TextQuery, view input: []const u8) -> RexError!MatchResult
pub func select[T](view query: TextQuery, view input: []const u8, on_match: T, on_miss: T, on_invalid: T) -> T

TextQuery is intentionally opaque. raw and literal copy the query text into bridge-owned storage and return a compact handle. That keeps callers from threading pointer/length details through code while still allowing std.text.search to use the same query object for filesystem search.

evaluate reports an invalid raw pattern by returning valid = false. require instead returns RexError.InvalidPattern or RexError.NoMatch, so callers can use error handling for control flow. select[T] lets callers map the match, miss, and invalid states into their own typed result without making rex own policy.
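The three outcomes that evaluate and select distinguish can be modeled outside Janus. The following Python sketch is only an analogy, not the bridge implementation: it uses Python's re module as a stand-in engine, which accepts far more syntax than bounded rex, and its class and function names simply mirror the Janus signatures above. That matched is false whenever the query is invalid is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchResult:
    valid: bool
    matched: bool


@dataclass(frozen=True)
class TextQuery:
    kind: str  # "raw" or "literal"
    text: str


def raw(pattern: str) -> TextQuery:
    return TextQuery("raw", pattern)


def literal(text: str) -> TextQuery:
    return TextQuery("literal", text)


def evaluate(query: TextQuery, input_text: str) -> MatchResult:
    # Literal queries are escaped so every character matches itself.
    source = query.text if query.kind == "raw" else re.escape(query.text)
    try:
        compiled = re.compile(source)
    except re.error:
        # Assumption: an invalid pattern never reports a match.
        return MatchResult(valid=False, matched=False)
    return MatchResult(valid=True, matched=compiled.search(input_text) is not None)


def select(query, input_text, on_match, on_miss, on_invalid):
    # Map the three states onto caller-supplied values; no policy lives here.
    result = evaluate(query, input_text)
    if not result.valid:
        return on_invalid
    return on_match if result.matched else on_miss
```

In this model raw("call(foo)") treats the parentheses as syntax and misses the input "x call(foo) y", while literal("call(foo)") matches the text exactly. That is the separation TextQuery preserves.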

The standalone rex tool searches files and directories with the same bounded matching semantics. Its default mode accepts small natural phrases and lowers them into rex patterns:

rex "contains TODO" std
rex "digits after invoice " logs
rex "contains email address" src
rex "ip address" access.log
rex "hex color" styles
rex "between start and end" app.log
rex "starts with error and ends with 500" app.log

Natural atoms include digits, word, email address, ip address, uuid, hex color, and quoted text. Ordered phrases include A then B, A after B, A before B, and between A and B.
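To make the idea of lowering concrete, here is a toy Python sketch that turns two of the phrase shapes above into patterns. It is illustrative only: the atom table, the escaping rule, and the generated patterns are assumptions, and only the digits and word atoms are modeled. The authoritative output is whatever rex explain prints.

```python
import re

# Hypothetical atom table; the real tool's generated patterns may differ.
ATOMS = {"digits": r"\d+", "word": r"\w+"}

# Characters treated as pattern syntax in this sketch.
META = set("\\.*+?[]^$(){}|")


def escape_literal(text: str) -> str:
    # Backslash-escape syntax characters so plain text matches itself.
    return "".join("\\" + ch if ch in META else ch for ch in text)


def lower_atom(atom: str) -> str:
    # Known atoms become patterns; anything else is kept as literal text,
    # including significant trailing spaces such as in "invoice ".
    return ATOMS.get(atom.strip(), escape_literal(atom))


def lower(query: str) -> str:
    if query.startswith("contains "):
        return lower_atom(query[len("contains "):])
    if " after " in query:
        a, b = query.split(" after ", 1)
        return lower_atom(b) + lower_atom(a)  # "A after B": B comes first
    return lower_atom(query)


print(lower("digits after invoice "))  # invoice \d+
print(re.search(lower("digits after invoice "), "invoice 123").group(0))  # invoice 123
```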

Use explain to see the generated pattern before searching:

rex explain "digits after invoice "
rex --explain "digits after invoice "
rex explain --plan "digits after invoice "
rex --explain --plan "digits after invoice "

rex explain --plan prints the current typed-plan seam. The concrete Rex.QueryAst and Rex.QueryPlan stdlib data model is future work, but the command output already makes the lowering boundary inspectable.

Use syntax when you want the next layer of recipes and the bounded raw grammar:

rex syntax

Raw regex, exact literal, and version modes are explicit second commands:

rex regex "\\d+" std
rex literal "call(foo)" src
rex version
rex --version

Useful flags:

  • --explain prints the generated rex pattern and exits
  • --plan with explain prints the lowered query/search plan seam
  • --json emits one JSON object per match
  • -c or --count prints only the match count
  • -l or --files-with-matches prints only matching file paths
  • -n or --line-number enables line numbers
  • --no-line-number and --no-filename reduce text output

version, explain, --explain, --plan, --json, and --count write their machine-consumable output to stdout. Diagnostics stay on stderr, so shell pipelines and Janus std.command callers can capture the tool's output without shell interpolation.
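Since --json emits one JSON object per match on stdout, a caller can decode the stream record by record. The Python sketch below assumes each object sits on its own line, which the text above does not state explicitly. The field names in the sample (path, line) are hypothetical, because the object schema is not specified here.

```python
import json


def parse_json_matches(stdout_text: str) -> list[dict]:
    """Decode one JSON object per non-empty line of captured stdout."""
    matches = []
    for line in stdout_text.splitlines():
        line = line.strip()
        if line:
            matches.append(json.loads(line))
    return matches


# Hypothetical sample output; real field names may differ.
sample = '{"path": "src/a.jan", "line": 3}\n{"path": "src/b.jan", "line": 9}\n'
print(len(parse_json_matches(sample)))  # 2
```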

Options must appear before the query. Unknown options fail before search starts, and a post-query flag such as rex "contains TODO" src --count is an error rather than a path lookup. Use -- before a query that starts with -.
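The ordering rule is easy to get wrong when a program assembles the command line. The Python sketch below builds an argument list that places options first and inserts -- before a query that starts with -. The helper names are hypothetical, the sketch assumes a rex executable on PATH, and it covers only the default query mode.

```python
import subprocess


def build_rex_argv(query, paths, options=()):
    """Return an argv list with options before the query."""
    argv = ["rex", *options]
    if query.startswith("-"):
        # Stop option parsing so the query is not read as a flag.
        argv.append("--")
    argv.append(query)
    argv.extend(paths)
    return argv


def run_rex(query, paths, options=()):
    # Passing a list avoids a shell, so nothing in the query is interpolated.
    return subprocess.run(
        build_rex_argv(query, paths, options),
        capture_output=True,
        text=True,
    )


print(build_rex_argv("contains TODO", ["src"], ["--count"]))
# ['rex', '--count', 'contains TODO', 'src']
```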

The command-line facet is built from Janus source in tools/rex/main.jan:

./scripts/zb build-rex
./scripts/zb test-rex

./scripts/zb test-text-rex also compiles a Janus smoke that captures ./zig-out/bin/rex through std.command, proving the command behaves like a standalone tool facet.

The standalone examples/jrex.jan program also compiles as a tiny Janus matcher and exercises the public facade plus the runtime bridge ABI.

These helpers compile the pattern and immediately run one operation:

func matches(pattern: []const u8, input: []const u8) -> bool
func findFirst(pattern: []const u8, input: []const u8) -> *u8
func captureOne(pattern: []const u8, input: []const u8, index: usize) -> *u8
func replaceOne(pattern: []const u8, input: []const u8, replacement: []const u8) -> *u8
func replaceAllOne(pattern: []const u8, input: []const u8, replacement: []const u8) -> *u8

The repository gate for the shipped contract is:

cd janus
./scripts/zb test-text-rex
./scripts/zb test-rex

test-text-rex includes the bridge smoke, the Janus facade smoke, and the std.command capture smoke for the built rex executable.


Elegance belongs to PEG. Regex is the knife.