
std.text.stream

Streams are not just iterators. They are the connective tissue between $-family ergonomics and real-world data.

std.text.stream provides the text processing algebra that makes :script the replacement for awk, sed, and Unix pipelines — with type safety, fusion, and provenance.


<<p"access.log">>
|> grep(r/ERROR/)
|> map($_.field(5))
|> unique()
|> count()
|> println()

Compiles to a single loop, zero allocations. That’s the power of TextStream.


struct Provenance {
    path: Path,         # Source file path
    line: u64,          # 1-indexed line number
    byte_offset: u64,   # Byte offset of the line start within the file
}

struct Line {
    text: String,
    provenance: Provenance,
}

Every line carries its origin. Always.
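A rough Python analogue of these records (the dataclass names mirror the Janus structs; `read_lines` is a hypothetical helper showing how provenance gets attached as text is scanned):

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Provenance:
    path: Path        # source file path
    line: int         # 1-indexed line number
    byte_offset: int  # byte offset of the line start

@dataclass(frozen=True)
class Line:
    text: str
    provenance: Provenance

def read_lines(path: Path, data: str):
    """Yield Line records, attaching provenance while scanning."""
    offset = 0
    for number, raw in enumerate(data.splitlines(keepends=True), start=1):
        yield Line(raw.rstrip("\n"), Provenance(path, number, offset))
        offset += len(raw.encode("utf-8"))

lines = list(read_lines(Path("access.log"), "GET /\nPOST /api\n"))
print(lines[1].provenance.line, lines[1].provenance.byte_offset)  # → 2 6
```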

let stream: TextStream[Line] := <<p"access.log">>
let fields: TextStream[(String, String)] := stream |> map($_.fields())

A typed iterator over text records.


# From glob
<<p"/var/log/*.log">>
# From file
<<p"server.log">>
# From string
<<"line one\nline two">>
# From process
<<`tail -f /var/log/syslog`>>
# From stdin
<<stdin>>

<<expr>> desugars to TextStream.from(expr).
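In Python terms, the desugaring is a constructor that dispatches on the source's type (a sketch; `from_source` stands in for `TextStream.from` since `from` is reserved in Python, and glob/process sources are omitted):

```python
import io
from pathlib import Path

class TextStream:
    """Minimal sketch of <<expr>> -> TextStream.from(expr) dispatch."""
    def __init__(self, lines):
        self._lines = list(lines)

    @classmethod
    def from_source(cls, source):
        if isinstance(source, Path):       # <<p"server.log">>
            return cls(source.read_text().splitlines())
        if isinstance(source, str):        # <<"line one\nline two">>
            return cls(source.splitlines())
        if isinstance(source, io.IOBase):  # <<stdin>>, process output
            return cls(line.rstrip("\n") for line in source)
        raise TypeError(f"unsupported stream source: {type(source)!r}")

    def __iter__(self):
        return iter(self._lines)

print(list(TextStream.from_source("line one\nline two")))
# → ['line one', 'line two']
```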


<<p"log">> |> grep(r/ERROR/) # Keep matching
<<p"log">> |> grep_v(r/DEBUG/) # Exclude matching
<<p"data.csv">> |> map($_.field(1)) # 1-indexed field
<<p"data.csv">> |> map($_.fields()) # All fields as tuple
<<p"log">> |> map($_.to_upper()) # Transform each
<<p"log">> |> filter($_.len() > 10) # Filter by predicate
<<p"log">> |> filter_map($_.parse()?) # Parse-or-drop
<<p"log">> |> sort() # Sort all
<<p"log">> |> unique() # Remove duplicates
<<p"log">> |> count() # Count elements
<<p"log">> |> group_by($_.field(0)) # Group by key
# Ordered (default) — preserves logical order
<<p"large.log">> |> grep(r/pattern/) |> parallel(4) |> count()
# Unordered: faster but nondeterministic
<<p"large.log">> |> grep(r/pattern/) |> parallel_unordered(4) |> count()
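Each operator can be modeled as a lazy generator stage; chaining stages is what fusion later collapses into one loop. A Python sketch of three of the operators above (function names are illustrative, not the real API):

```python
import re

def grep(pattern, stream):
    """Keep elements matching the pattern; grep_v would invert the test."""
    rx = re.compile(pattern)
    return (line for line in stream if rx.search(line))

def field(n, stream):
    """Extract the 1-indexed whitespace-separated field, like $_.field(n)."""
    return (line.split()[n - 1] for line in stream)

def unique(stream):
    """Drop duplicates, preserving first-seen order."""
    seen = set()
    for item in stream:
        if item not in seen:
            seen.add(item)
            yield item

log = ["ERROR db timeout", "INFO ok", "ERROR db timeout", "ERROR net down"]
print(list(unique(field(2, grep(r"ERROR", log)))))  # → ['db', 'net']
```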

Every operator preserves or transforms provenance:

Operator               Provenance
map, filter, grep      ✅ Preserved
lines, window          ✅ Preserved (shifted)
sort, unique, count    ❌ Dropped
group_by               ⚠️ Summary
# Print with source location
<<p"*.log">> |> grep(r/EXCEPTION/)
|> map("${$_.provenance.path}:${$_.provenance.line}: $_")
|> for_each(println)

Chained operators fuse into a single pass:

<<p"log">>
|> grep(r/ERROR/)
|> map($_.field(5))
|> unique()
|> count()

Compiles to one loop, zero allocations. The guarantee is semantic — observable results are identical whether fused or not.
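What fusion produces can be pictured as a hand-written single pass (a Python analogue of the pipeline above, not the actual codegen; the sample log lines are invented):

```python
import re

log = [
    "2024-01-01 10:00 ERROR srv1 db timeout",
    "2024-01-01 10:01 INFO  srv1 ok",
    "2024-01-01 10:02 ERROR srv2 disk full",
]

rx = re.compile(r"ERROR")
seen = set()
count = 0
for line in log:              # one pass, no intermediate lists
    if not rx.search(line):   # grep(r/ERROR/)
        continue
    f5 = line.split()[4]      # map($_.field(5)), 1-indexed
    if f5 in seen:            # unique()
        continue
    seen.add(f5)
    count += 1                # count()
print(count)  # → 2
```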


<<p"access.log">>
|> grep(r/ERROR/)
|> map($_.field(5)) # $1 refers to field(5)
|> filter($1.len() > 10) # $1 in filter closure
|> count()
Sigil     Works on    Meaning
$_        Line        Current element
$1, $2    Fields      Positional field
$#        Index       Stream index
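The sigils are shorthand for closure parameters. An illustrative Python mapping (not the actual desugaring):

```python
rows = ["alpha beta", "hi"]

# $_ : the current element
upper = [line.upper() for line in rows]                   # map($_.to_upper())
# $1 : the first positional field of the current element
first = [line.split()[0] for line in rows]                # map($1)
# $# : the running stream index
indexed = [f"{i} {line}" for i, line in enumerate(rows)]  # uses $#

print(upper, first, indexed)
```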

Feature             Janus TextStream      awk/sed    Ruby    Python
Type safety         ✅ Static             ❌          ⚠️      ⚠️
Provenance          ✅ Default ON         ❌          ❌      ❌
Fusion              ✅ Compile-time       ❌          ❌      ❌
$1 ergonomics       ✅                    ✅          ✅      ❌
Pipeline operator   ✅ |>                 ❌          ❌      ❌
Parallel            ✅ Ordered default    ❌          ⚠️      ⚠️


Provenance by default. Fusion by eligibility. Ordered by default.