# std.text.stream

Streams are not just iterators. They are the connective tissue between $-family ergonomics and real-world data.

`std.text.stream` provides the text-processing algebra that makes `:script` the replacement for awk, sed, and Unix pipelines — with type safety, fusion, and provenance.
## Quick Example

```
<<p"access.log">> |> grep(r/ERROR/) |> map($_.field(5)) |> unique() |> count() |> println()
```

Compiles to a single loop, zero allocations. That’s the power of TextStream.
## Core Types

### Line and Provenance

```
struct Provenance {
    path: Path,         # Source file path
    line: u64,          # 1-indexed line number
    byte_offset: u64,
}

struct Line {
    text: String,
    provenance: Provenance,
}
```

Every line carries its origin. Always.
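The pairing of a line with its origin can be modeled in a few lines of Python — an illustration of the semantics above, not the Janus implementation; `read_lines` is a hypothetical helper showing how provenance attaches as each line is produced:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator

@dataclass(frozen=True)
class Provenance:
    path: Path        # source file path
    line: int         # 1-indexed line number
    byte_offset: int  # offset of the line's first byte

@dataclass(frozen=True)
class Line:
    text: str
    provenance: Provenance

def read_lines(path: Path, text: str) -> Iterator[Line]:
    """Attach provenance to each line as it is yielded."""
    offset = 0
    for n, raw in enumerate(text.splitlines(keepends=True), start=1):
        yield Line(raw.rstrip("\n"), Provenance(path, n, offset))
        offset += len(raw.encode())
```

Because provenance is attached at the source, every downstream operator can carry it along for free.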
### TextStream[T]

```
let stream: TextStream[Line] := <<p"access.log">>
let fields: TextStream[(String, String)] := stream |> map($_.fields())
```

A typed iterator over text records.
## Stream Literals

```
# From glob
<<p"/var/log/*.log">>

# From file
<<p"server.log">>

# From string
<<"line one\nline two">>

# From process
<<`tail -f /var/log/syslog`>>

# From stdin
<<stdin>>
```

`<<expr>>` desugars to `TextStream.from(expr)`.
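The desugaring can be pictured as a single constructor dispatching on the source type — a Python sketch of the idea, where `from_source` is a hypothetical stand-in for `TextStream.from`:

```python
from typing import Iterable, Iterator, Union

def from_source(src: Union[str, Iterable[str]]) -> Iterator[str]:
    """Yield lines lazily, whatever the source."""
    if isinstance(src, str):
        # <<"line one\nline two">> — split an in-memory string
        yield from src.splitlines()
    else:
        # <<p"server.log">>, <<stdin>> — iterate a line source lazily
        for raw in src:
            yield raw.rstrip("\n")
```

The point of the sketch: every literal form produces the same lazy line-by-line interface, so operators downstream never care where the text came from.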
## Operators

### Filtering

```
<<p"log">> |> grep(r/ERROR/)    # Keep matching
<<p"log">> |> grep_v(r/DEBUG/)  # Exclude matching
```

### Field Extraction

```
<<p"data.csv">> |> map($_.field(1))   # 1-indexed field
<<p"data.csv">> |> map($_.fields())   # All fields as tuple
```

### Transform

```
<<p"log">> |> map($_.to_upper())       # Transform each
<<p"log">> |> filter($_.len() > 10)    # Filter by predicate
<<p"log">> |> filter_map($_.parse()?)  # Parse-or-drop
```

### Aggregation (Terminal)

```
<<p"log">> |> sort()                 # Sort all
<<p"log">> |> unique()               # Remove duplicates
<<p"log">> |> count()                # Count elements
<<p"log">> |> group_by($_.field(0))  # Group by key
```
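Of the terminal operators, `group_by` is the least obvious; its semantics amount to a single-pass dictionary accumulation. A Python illustration (not the Janus runtime):

```python
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")
K = TypeVar("K")

def group_by(items: Iterable[T], key: Callable[[T], K]) -> Dict[K, List[T]]:
    """One pass over the stream; groups keep first-appearance order."""
    groups: Dict[K, List[T]] = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups
```

For example, grouping log lines by their first field keys each group by that field and collects the full lines under it.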
### Parallel

```
# Ordered (default) — preserves logical order
<<p"large.log">> |> grep(r/pattern/) |> parallel(4) |> count()

# Unordered — faster but nondeterministic
<<p"large.log">> |> grep(r/pattern/) |> parallel_unordered(4) |> count()
```
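The ordered/unordered distinction mirrors the two submission styles in Python's `concurrent.futures` — an analogy for the semantics, not the Janus scheduler. `Executor.map` yields results in input order regardless of completion order; `as_completed` yields them as they finish:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_upper(s: str) -> str:
    # Shorter inputs finish first, so completion order differs from input order.
    time.sleep(0.01 * len(s))
    return s.upper()

words = ["aaa", "bb", "c"]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Like parallel(3): results come back in logical (input) order.
    ordered = list(pool.map(slow_upper, words))
    # Like parallel_unordered(3): results come back in completion order.
    futures = [pool.submit(slow_upper, w) for w in words]
    unordered = [f.result() for f in as_completed(futures)]
```

`ordered` is always `["AAA", "BB", "C"]`; `unordered` contains the same elements but in a timing-dependent order — the same trade-off `parallel` vs. `parallel_unordered` describes.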
## Provenance

Every operator preserves or transforms provenance:
| Operator | Provenance |
|---|---|
| `map`, `filter`, `grep` | ✅ Preserved |
| `lines`, `window` | ✅ Preserved (shifted) |
| `sort`, `unique`, `count` | ❌ Dropped |
| `group_by` | ⚠️ Summary |
```
# Print with source location
<<p"*.log">>
  |> grep(r/EXCEPTION/)
  |> map("${$_.provenance.path}:${$_.provenance.line}: $_")
  |> for_each(println)
```
## Fusion

Chained operators fuse into a single pass:

```
<<p"log">> |> grep(r/ERROR/) |> map($_.field(5)) |> unique() |> count()
```

Compiles to one loop, zero allocations. The guarantee is semantic — observable results are identical whether fused or not.
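What the fused pipeline computes can be written out as one explicit loop — a Python illustration of the semantics, not the compiler's actual output; whitespace field-splitting is an assumption here:

```python
import re
from typing import Iterable

def fused_count(lines: Iterable[str]) -> int:
    """grep(r/ERROR/) |> map(field(5)) |> unique() |> count(), in one pass."""
    pattern = re.compile(r"ERROR")
    seen = set()
    for line in lines:
        if not pattern.search(line):   # grep(r/ERROR/)
            continue
        fields = line.split()
        if len(fields) < 5:
            continue
        seen.add(fields[4])            # field(5) — fields are 1-indexed
    return len(seen)                   # unique() |> count()
```

One traversal, one intermediate set, no per-operator buffers — which is why the fused form allocates nothing beyond the deduplication state.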
## Integration with $-family

```
<<p"access.log">>
  |> grep(r/ERROR/)
  |> map($_.field(5))       # $1 refers to field(5)
  |> filter($1.len() > 10)  # $1 in filter closure
  |> count()
```

| Sigil | Works On | Meaning |
|---|---|---|
| `$_` | Line | Current element |
| `$1`, `$2` | Fields | Positional field |
| `$#` | Index | Stream index |
## Comparison

| Feature | Janus TextStream | awk/sed | Ruby | Python |
|---|---|---|---|---|
| Type safety | ✅ Static | ❌ | ⚠️ | ⚠️ |
| Provenance | ✅ Default ON | ❌ | ❌ | ❌ |
| Fusion | ✅ Compile-time | ❌ | ❌ | ❌ |
| `$1` ergonomics | ✅ | ✅ | ❌ | ❌ |
| Pipeline operator | ✅ `\|>` | ✅ | ✅ | ✅ |
| Parallel | ✅ Ordered default | ❌ | ⚠️ | ⚠️ |
## Next Steps

- Regex — Pattern matching
- PEG — Complex grammars
- Script Profile — Using streams in pipelines
Provenance by default. Fusion by eligibility. Ordered by default.