std.text.rex
Regex is the knife. PEG is the calligraphy brush.
std.text.rex is the regular-expression surface for the text stack. The
current v1 bridge supports local pattern matching through compile, isMatch,
find, replace, and replaceAll. SPEC-048 v1 is intentionally bounded:
typed captures and $ extraction remain future work.
The same bounded engine is exposed as the standalone rex tool for file
search. The tool defaults to natural query lowering and keeps raw regex syntax
one command deeper behind rex regex.
Quick Example
    use std.text.rex as rex

    func main() -> i32 do
        const pattern = "\\d+"
        const input = "invoice 123"
        const h = rex.compile(pattern)

        if h == 0 do return 1 end
        if rex.isMatch(h, input) do return 0 end

        return 2
    end

When to Use It
The Janus philosophy: Regex is a bounded tactical DSL. Elegance belongs to PEG.
- Use std.text.rex when the pattern is short and local.
- Use std.text.peg when the pattern has names, structure, or meaning.
- If a regex needs more than a few captures, it has probably become PEG-shaped.
Current Engine
The shipped bridge is intentionally small. It stores compiled patterns behind opaque integer handles and matches byte strings through a simple NFA-style matcher.
Supported today:
- literals
- ., *, +, and ?
- character classes such as [a-z] and [^0-9]
- ^ and $
- \d, \w, and \s
- escaped literal punctuation such as \(, \), \{, \}, and \|
Not complete yet:
- typed capture groups
- named captures
- $1, $2, and $* extraction integration
- parser-literal validation for r/.../
- Unicode grapheme semantics
Those are future SPEC-048 amendments, not v1 guarantees.
    func compile(pattern: []const u8) -> usize
    func isMatch(h: usize, input: []const u8) -> bool
    func find(h: usize, input: []const u8) -> *u8
    func capture(h: usize, input: []const u8, index: usize) -> *u8
    func captureCount(h: usize) -> usize
    func replace(h: usize, input: []const u8, replacement: []const u8) -> *u8
    func replaceAll(h: usize, input: []const u8, replacement: []const u8) -> *u8

compile returns 0 when the bridge rejects the pattern. v1 rejects malformed
or unsupported patterns, such as an unterminated character class, grouping, or
unsupported escapes like \D. Captures are exposed in the facade for forward
compatibility, but v1 does not extract captures:
captureCount returns 0 and capture returns null.
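The handle contract described above can be modelled outside Janus. The following Python sketch is not the bridge implementation: it uses Python's re module as a stand-in matcher, and the names Bridge, compile, is_match, and capture_count are hypothetical. The rejection rules (unescaped grouping, brace, and | characters, plus any alphanumeric escape other than \d, \w, and \s) are assumptions drawn from the supported list, and unanchored search is inferred from the Quick Example.

```python
import re

# Shorthand class escapes the documentation lists as supported.
_CLASS_ESCAPES = set("dws")
# Unescaped characters this sketch treats as unsupported (an assumption).
_UNSUPPORTED = set("(){}|")


def _is_supported(pattern):
    """Return True if the pattern stays inside the assumed v1 subset."""
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            if i + 1 >= n:
                return False  # dangling backslash
            nxt = pattern[i + 1]
            # Allow \d, \w, \s and escaped punctuation; reject \D, \b, ...
            if nxt.isalnum() and nxt not in _CLASS_ESCAPES:
                return False
            i += 2
        elif ch == "[":
            close = pattern.find("]", i + 1)
            if close == -1:
                return False  # unterminated character class
            i = close + 1     # class contents are not inspected further here
        elif ch in _UNSUPPORTED:
            return False
        else:
            i += 1
    return True


class Bridge:
    """Toy handle table: handle 0 is reserved to mean 'rejected'."""

    def __init__(self):
        self._compiled = []

    def compile(self, pattern):
        if not _is_supported(pattern):
            return 0
        try:
            compiled = re.compile(pattern, re.ASCII)  # approximate byte semantics
        except re.error:
            return 0
        self._compiled.append(compiled)
        return len(self._compiled)  # handles start at 1

    def is_match(self, handle, text):
        if handle < 1 or handle > len(self._compiled):
            return False
        return self._compiled[handle - 1].search(text) is not None

    def capture_count(self, handle):
        return 0  # mirrors the v1 facade: captures are not extracted


bridge = Bridge()
h = bridge.compile(r"\d+")
print(h != 0 and bridge.is_match(h, "invoice 123"))  # True
print(bridge.compile(r"\D"))    # 0 (unsupported escape)
print(bridge.compile("[a-z"))   # 0 (unterminated class)
print(bridge.compile("(ab)"))   # 0 (grouping)
```

Reserving 0 as the failure value lets callers test a handle with a single comparison, which is the shape the Quick Example relies on.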
Typed Query Layer
When the CLI recipes reach their limit, move into the stdlib rather than
stretching command syntax. TextQuery keeps raw bounded rex and exact literals
separate:

    const raw = rex.raw("\\d+")
    const exact = rex.literal("call(foo)")

    const a = rex.evaluate(raw, "invoice 123")
    const b = rex.evaluate(exact, "x call(foo) y")

The typed API is pure and capability-free:

    pub enum QueryKind { raw, literal }

    pub struct TextQuery {
        _handle: usize,
    }

    pub struct MatchResult {
        valid: bool,
        matched: bool,
    }

    pub func raw(view pattern: []const u8) -> TextQuery
    pub func literal(view text: []const u8) -> TextQuery
    pub func kindCode(view query: TextQuery) -> i64
    pub func textPtr(view query: TextQuery) -> usize
    pub func textLen(view query: TextQuery) -> usize
    pub func evaluate(view query: TextQuery, view input: []const u8) -> MatchResult
    pub func require(view query: TextQuery, view input: []const u8) -> RexError!MatchResult
    pub func select[T](view query: TextQuery, view input: []const u8, on_match: T, on_miss: T, on_invalid: T) -> T

TextQuery is intentionally opaque. raw and literal copy the query text
into bridge-owned storage and return a compact handle. That keeps callers from
threading pointer/length details through code while still allowing
std.text.search to use the same query object for filesystem search.
evaluate reports invalid raw patterns with valid = false.
require uses RexError.InvalidPattern and RexError.NoMatch for control
flow. select[T] lets callers map match/miss/invalid states into their own
typed result without making rex own policy.
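The three outcomes (match, miss, invalid) and the way evaluate, require, and select relate to them can be sketched in Python. This is a model of the described semantics, not the Janus implementation: the function names mirror the documented API, but the bodies are hypothetical, TextQuery stores its text directly instead of a bridge handle, Python's re stands in for the bounded engine (and accepts more syntax than rex does), and literal queries are assumed to match by substring containment.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TextQuery:
    kind: str  # "raw" or "literal"
    text: str


@dataclass(frozen=True)
class MatchResult:
    valid: bool
    matched: bool


class RexError(Exception):
    """Base for the two control-flow errors named in the documentation."""


class InvalidPattern(RexError):
    pass


class NoMatch(RexError):
    pass


def raw(pattern):
    return TextQuery("raw", pattern)


def literal(text):
    return TextQuery("literal", text)


def evaluate(query, text):
    """Never raises: an invalid raw pattern is reported as valid=False."""
    if query.kind == "literal":
        return MatchResult(valid=True, matched=query.text in text)
    try:
        compiled = re.compile(query.text, re.ASCII)
    except re.error:
        return MatchResult(valid=False, matched=False)
    return MatchResult(valid=True, matched=compiled.search(text) is not None)


def require(query, text):
    """Turn the invalid and miss states into errors for control flow."""
    result = evaluate(query, text)
    if not result.valid:
        raise InvalidPattern(query.text)
    if not result.matched:
        raise NoMatch(query.text)
    return result


def select(query, text, on_match, on_miss, on_invalid):
    """Map the three states onto caller-supplied values of one type."""
    result = evaluate(query, text)
    if not result.valid:
        return on_invalid
    return on_match if result.matched else on_miss


print(select(raw(r"\d+"), "invoice 123", "hit", "miss", "bad"))             # hit
print(select(literal("call(foo)"), "x call(bar) y", "hit", "miss", "bad"))  # miss
print(select(raw("[a-z"), "abc", "hit", "miss", "bad"))                     # bad
```

Because select only chooses among values the caller passes in, the decision about what a miss or an invalid pattern means stays with the caller.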
Tool Facet: rex
rex searches files and directories with the same bounded matching
semantics. Its default mode accepts small natural phrases and lowers them into
rex patterns:

    rex "contains TODO" std
    rex "digits after invoice " logs
    rex "contains email address" src
    rex "ip address" access.log
    rex "hex color" styles
    rex "between start and end" app.log
    rex "starts with error and ends with 500" app.log

Natural atoms include digits, word, email address, ip address, uuid,
hex color, and quoted text. Ordered phrases include A then B, A after B,
A before B, and between A and B.
Use explain to see the generated pattern before searching:
    rex explain "digits after invoice "
    rex --explain "digits after invoice "
    rex explain --plan "digits after invoice "
    rex --explain --plan "digits after invoice "

rex explain --plan prints the current typed-plan seam. The concrete
Rex.QueryAst and Rex.QueryPlan stdlib data model is future work, but the
command output already makes the lowering boundary inspectable.
Use syntax when you want the next layer of recipes and the bounded raw
grammar:
    rex syntax

Raw regex, exact literal, and version modes are explicit second commands:

    rex regex "\\d+" std
    rex literal "call(foo)" src
    rex version
    rex --version

Useful flags:
- --explain prints the generated rex pattern and exits
- --plan with explain prints the lowered query/search plan seam
- --json emits one JSON object per match
- -c or --count prints only the match count
- -l or --files-with-matches prints only matching file paths
- -n or --line-number enables line numbers
- --no-line-number and --no-filename reduce text output
version, explain, --explain, --plan, --json, and --count write their
machine-consumable output to stdout. Diagnostics stay on stderr, so shell
pipelines and Janus std.command callers can capture the tool without shell
interpolation.
Options must appear before the query. Unknown options fail before search
starts, and a post-query flag such as rex "contains TODO" src --count is an
error rather than a path lookup. Use -- before a query that starts with -.
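The ordering rules above (options first, then the query, then paths, with -- guarding a query that starts with -) are easy to get wrong when a program assembles the command line. The Python sketch below builds the argument list and runs the tool without a shell. The helper names build_rex_argv and run_count are hypothetical, run_count assumes a built rex executable on PATH, and the stdout text is returned unparsed because the exact --count format is not specified here.

```python
import subprocess


def build_rex_argv(query, paths, options=()):
    """Place every option before the query, as the tool requires."""
    for opt in options:
        if not opt.startswith("-"):
            raise ValueError("not an option: %r" % (opt,))
    argv = ["rex"]
    argv.extend(options)
    if query.startswith("-"):
        argv.append("--")  # keep a dash-leading query from parsing as a flag
    argv.append(query)
    argv.extend(paths)
    return argv


def run_count(query, paths):
    """Run rex --count; returns (exit code, stdout, stderr) as captured."""
    proc = subprocess.run(
        build_rex_argv(query, paths, ["--count"]),
        capture_output=True,
        text=True,
        check=False,
    )
    return proc.returncode, proc.stdout, proc.stderr


print(build_rex_argv("contains TODO", ["src"], ["--count"]))
# ['rex', '--count', 'contains TODO', 'src']
print(build_rex_argv("-v flag", ["docs"]))
# ['rex', '--', '-v flag', 'docs']
```

Passing a list to subprocess.run avoids shell interpolation, so quoted phrases such as "digits after invoice " reach the tool as a single argument with the trailing space intact.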
The command-line facet is built from Janus source in tools/rex/main.jan:
    ./scripts/zb build-rex
    ./scripts/zb test-rex

./scripts/zb test-text-rex also compiles a Janus smoke that captures
./zig-out/bin/rex through std.command, proving the command behaves like a
standalone tool facet.
The standalone examples/jrex.jan program also compiles as a tiny Janus
matcher and exercises the public facade plus the runtime bridge ABI.
Convenience API
These helpers compile the pattern and immediately run one operation:

    func matches(pattern: []const u8, input: []const u8) -> bool
    func findFirst(pattern: []const u8, input: []const u8) -> *u8
    func captureOne(pattern: []const u8, input: []const u8, index: usize) -> *u8
    func replaceOne(pattern: []const u8, input: []const u8, replacement: []const u8) -> *u8
    func replaceAllOne(pattern: []const u8, input: []const u8, replacement: []const u8) -> *u8

Verification
The repository gate for the shipped contract is:

    cd janus
    ./scripts/zb test-text-rex
    ./scripts/zb test-rex

test-text-rex includes the bridge smoke, the Janus facade smoke, and the
std.command capture smoke for the built rex executable.
Next Steps
- PEG (SPEC-046): for complex, readable parsing
- TextStream: pipeline algebra
- Script Profile: :script auto-imports and extraction syntax
Elegance belongs to PEG. Regex is the knife.