Build a Small Full-Text Index
This tutorial walks through the shipped std.text.index surface. You will build
a small in-memory index, query a shared term, and compare BM25-class scores.
Time: 10 minutes
Level: Intermediate
Prerequisites: use imports, slices, arrays, and basic u32 counters.
Create the Index
Import the module with an alias and create the caller-owned index value.

    use std.text.index as idx

    pub func main() -> i32 do
        var index = idx.make_index()
        return 0
    end

make_index returns the whole Index value. There is no allocator parameter
because the current implementation is bounded and stores its arrays inside the
struct.
Add Documents
Each document gets a caller-assigned u32 ID.

    const doc0_text: []u8 = "sovereign identity mesh"
    const doc1_text: []u8 = "mesh network protocol"

    var doc0_id: u32 = 0
    var doc1_id: u32 = 0

    if idx.add_doc(&index, 0, doc0_text, &doc0_id) == false do
        return 1
    end

    if idx.add_doc(&index, 1, doc1_text, &doc1_id) == false do
        return 2
    end

add_doc lowercases ASCII letters, splits on whitespace and common
punctuation, records the document token count, and appends one posting per
distinct term in the document.
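The indexing steps described above can be modeled in a few lines. The sketch below is written in Python rather than Janus, purely as a language-neutral illustration; it is not the module's implementation. The separator set, the function names, and the dictionary-based storage are all assumptions made for the example (the real index uses fixed-capacity arrays).

```python
import re

# Assumption: "common punctuation" is approximated by this character class;
# the real module's separator set may differ.
_SEPARATORS = re.compile(r"[\s.,;:!?()\[\]{}/-]+")


def tokenize(text: str) -> list[str]:
    """Lowercase ASCII letters only, then split on whitespace and punctuation."""
    lowered = "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text
    )
    return [tok for tok in _SEPARATORS.split(lowered) if tok]


def add_doc(postings: dict[str, list[int]], doc_lens: dict[int, int],
            doc_id: int, text: str) -> None:
    """Hypothetical model of add_doc: record length, one posting per distinct term."""
    tokens = tokenize(text)
    doc_lens[doc_id] = len(tokens)        # token count includes repeated terms
    for term in dict.fromkeys(tokens):    # distinct terms, first-seen order
        postings.setdefault(term, []).append(doc_id)
```

Under this model a document that repeats a term contributes every occurrence to its token count but only one entry to that term's posting list.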
Query a Term
query_term writes document IDs into a caller-provided buffer.

    var hits: [idx.MAX_POSTINGS_PER_TERM]u32 = .undefined
    var hit_count: u32 = 0

    if idx.query_term(&index, "mesh", hits[0..], &hit_count) == false do
        return 3
    end

    if hit_count != 2 do
        return 4
    end

The query term is normalized the same way indexed terms are normalized. The result order is the posting-list order, so code that cares about ranking should score the hits explicitly.
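To make the buffer contract concrete, here is a rough Python model of a lookup that normalizes the query and copies IDs in posting-list order. It is a sketch under stated assumptions, not the Janus API: treating a too-small buffer as a failure and a missing term as a successful zero-hit query are guesses, and the names are hypothetical.

```python
def normalize(term: str) -> str:
    """ASCII-only lowercasing, matching how the tutorial says terms are indexed."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in term)


def query_term(postings: dict[str, list[int]], term: str,
               out: list[int]) -> tuple[bool, int]:
    """Copy the posting list for `term` into `out`; return (ok, hit_count)."""
    hits = postings.get(normalize(term), [])
    if len(hits) > len(out):
        # Assumption: a buffer that cannot hold every hit is reported as failure.
        return False, 0
    out[:len(hits)] = hits    # posting-list (insertion) order, not relevance order
    return True, len(hits)
```

Because the IDs come back in insertion order, a caller that wants the best match first would score each returned ID and sort on that score.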
Score Matches
Use bm25_score when term frequency and document length should affect ranking.

    let score0 = idx.bm25_score(&index, 0, "mesh")
    let score1 = idx.bm25_score(&index, 1, "mesh")

    if score0 == 0 do
        return 5
    end
    if score1 == 0 do
        return 6
    end

Scores are fixed-point integers scaled by 1000. The exact value is an
implementation detail; the stable contract is that missing documents, missing
terms, and zero term-frequency return 0.
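The following Python sketch illustrates the scale-by-1000 convention and the zero-return contract. It assumes the conventional BM25 parameters k1 = 1.2 and b = 0.75 and an always-positive IDF variant; the module's actual formula and arithmetic are implementation details and may differ. The data layout and names are hypothetical.

```python
import math

SCALE = 1000           # scores are reported as integer thousandths
K1, B = 1.2, 0.75      # assumption: conventional BM25 parameters


def bm25_score(postings: dict[str, dict[int, int]], doc_lens: dict[int, int],
               doc_id: int, term: str) -> int:
    """postings maps term -> {doc_id: term_frequency}; doc_lens maps doc_id -> tokens."""
    docs = postings.get(term)
    if doc_id not in doc_lens or not docs:
        return 0                           # missing document or missing term
    tf = docs.get(doc_id, 0)
    if tf == 0:
        return 0                           # term does not occur in this document
    n_docs = len(doc_lens)
    avg_len = sum(doc_lens.values()) / n_docs
    # Assumption: the "+1" IDF variant, which stays positive even when
    # every document contains the term.
    idf = math.log(1 + (n_docs - len(docs) + 0.5) / (len(docs) + 0.5))
    norm = tf + K1 * (1 - B + B * doc_lens[doc_id] / avg_len)
    return round(idf * tf * (K1 + 1) / norm * SCALE)
```

With the tutorial's two three-token documents, both of which contain "mesh", this model gives each document the same nonzero score, which is why the tutorial only checks that the scores are not 0.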
Inspect Corpus Stats
bm25_stats returns the corpus-level counters for the index.

    const stats = idx.bm25_stats(&index)
    if stats.doc_count != 2 do
        return 7
    end

total_token_len is the sum of the token counts of all indexed documents and feeds the
average document length used by BM25 normalization.
Complete Program
    use std.text.index as idx

    pub func main() -> i32 do
        var index = idx.make_index()

        const doc0_text: []u8 = "sovereign identity mesh"
        const doc1_text: []u8 = "mesh network protocol"

        var doc0_id: u32 = 0
        var doc1_id: u32 = 0

        if idx.add_doc(&index, 0, doc0_text, &doc0_id) == false do
            return 1
        end
        if idx.add_doc(&index, 1, doc1_text, &doc1_id) == false do
            return 2
        end

        var hits: [idx.MAX_POSTINGS_PER_TERM]u32 = .undefined
        var hit_count: u32 = 0
        if idx.query_term(&index, "mesh", hits[0..], &hit_count) == false do
            return 3
        end
        if hit_count != 2 do
            return 4
        end

        let score0 = idx.bm25_score(&index, 0, "mesh")
        let score1 = idx.bm25_score(&index, 1, "mesh")
        if score0 == 0 do return 5; end
        if score1 == 0 do return 6; end

        const stats = idx.bm25_stats(&index)
        if stats.doc_count != 2 do return 7; end

        return 0
    end

Run the repository smoke test for the canonical version:

    cd janus
    ./scripts/zb test-text-index

Boundary
Do not treat this module as persistent search. The current index is an in-memory core with fixed capacities. Store-backed keying, reopen behavior, and ASTDB query integration belong in a later facade.