
Build a Small Full-Text Index

This tutorial walks through the shipped std.text.index API. You will build a small in-memory index, query a term shared by two documents, and compare their BM25-class scores.

Time: 10 minutes
Level: Intermediate
Prerequisites: use imports, slices, arrays, and basic u32 counters.

Import the module with an alias and create the caller-owned index value.

use std.text.index as idx
pub func main() -> i32 do
    var index = idx.make_index()
    return 0
end

make_index returns the whole Index value. There is no allocator parameter because the current implementation is bounded and stores its arrays inside the struct.

Each document gets a caller-assigned u32 ID.

const doc0_text: []u8 = "sovereign identity mesh"
const doc1_text: []u8 = "mesh network protocol"
var doc0_id: u32 = 0
var doc1_id: u32 = 0
if idx.add_doc(&index, 0, doc0_text, &doc0_id) == false do
    return 1
end
if idx.add_doc(&index, 1, doc1_text, &doc1_id) == false do
    return 2
end

add_doc lowercases ASCII letters, splits on whitespace and common punctuation, records the document token count, and appends one posting per distinct term in the document.
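The normalization steps described above can be modeled in a few lines of Python. This is an illustrative sketch, not the Janus implementation: the separator set, the helper names, and the dictionary-based storage are all assumptions made for the example, and the real module uses fixed-capacity arrays.

```python
# Illustrative Python model of the described add_doc behavior.
# Assumption: this separator set stands in for "whitespace and common punctuation".
SEPARATORS = set(" \t\r\n.,;:!?()[]{}\"'")


def ascii_lower(ch):
    # Lowercase ASCII letters only; every other character passes through.
    return chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch


def tokenize(text):
    tokens, current = [], []
    for ch in text:
        if ch in SEPARATORS:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ascii_lower(ch))
    if current:
        tokens.append("".join(current))
    return tokens


def add_doc(postings, doc_lengths, doc_id, text):
    tokens = tokenize(text)
    # The token count includes repeated terms.
    doc_lengths[doc_id] = len(tokens)
    # One posting per distinct term, in first-seen order.
    for term in dict.fromkeys(tokens):
        postings.setdefault(term, []).append(doc_id)


postings, doc_lengths = {}, {}
add_doc(postings, doc_lengths, 0, "Sovereign identity mesh")
add_doc(postings, doc_lengths, 1, "mesh network protocol, mesh")
print(postings["mesh"])  # [0, 1]
print(doc_lengths)       # {0: 3, 1: 4}
```

Document 1 repeats "mesh", so its token count is 4 while the posting list for "mesh" still holds a single entry for it.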

query_term writes document IDs into a caller-provided buffer.

var hits: [idx.MAX_POSTINGS_PER_TERM]u32 = .undefined
var hit_count: u32 = 0
if idx.query_term(&index, "mesh", hits[0..], &hit_count) == false do
    return 3
end
if hit_count != 2 do
    return 4
end

The query term is normalized the same way indexed terms are normalized. The result order is the posting-list order, so code that cares about ranking should score the hits explicitly.
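Explicit ranking amounts to sorting the returned IDs by a per-document score. The Python sketch below shows the idea in isolation; `rank_hits`, `score_of`, and the score values are hypothetical and exist only for illustration.

```python
# Illustrative sketch: re-order hits from posting-list order into score order.
# `score_of` is a hypothetical callable that maps a document ID to its score.
def rank_hits(hits, score_of):
    # Highest score first; ties fall back to ascending document ID so the
    # output is deterministic.
    return sorted(hits, key=lambda doc_id: (-score_of(doc_id), doc_id))


hits = [0, 1, 2]                            # posting-list order
made_up_scores = {0: 310, 1: 470, 2: 310}   # invented values, not real output
ranked = rank_hits(hits, made_up_scores.get)
print(ranked)  # [1, 0, 2]
```

The tie-break on document ID is a design choice for the sketch; any stable secondary key works as long as equal scores produce a repeatable order.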

Use bm25_score when term frequency and document length should affect ranking.

let score0 = idx.bm25_score(&index, 0, "mesh")
let score1 = idx.bm25_score(&index, 1, "mesh")
if score0 == 0 do
    return 5
end
if score1 == 0 do
    return 6
end

Scores are fixed-point integers scaled by 1000, so a returned 1500 represents a score of 1.5. The exact value is an implementation detail; the stable contract is that a missing document, a missing term, or a term frequency of zero returns 0.
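To make the fixed-point convention concrete, here is a Python sketch of a textbook BM25 term score scaled by 1000. It is not the module's formula: the k1 and b constants, the IDF variant, and the function name are assumptions, so the numbers it produces need not match idx.bm25_score. Only the scaling and the zero-return cases follow the text above.

```python
import math

SCALE = 1000        # fixed-point scale stated in the tutorial
K1, B = 1.2, 0.75   # conventional BM25 defaults; assumed, not from the module


def bm25_fixed(tf, doc_len, avg_doc_len, doc_count, docs_with_term):
    # Stated contract: missing document, missing term, or zero term
    # frequency yields 0. An empty corpus is treated the same way here.
    if tf == 0 or docs_with_term == 0 or doc_count == 0 or avg_doc_len == 0:
        return 0
    # Textbook IDF with +1 inside the log so it stays positive (assumed variant).
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Length normalization: longer-than-average documents are penalized.
    norm = tf + K1 * (1 - B + B * doc_len / avg_doc_len)
    return int(idf * tf * (K1 + 1) / norm * SCALE)


# Two documents of three tokens each, both containing the term once.
print(bm25_fixed(tf=1, doc_len=3, avg_doc_len=3, doc_count=2, docs_with_term=2))  # 182
# Zero term frequency returns 0.
print(bm25_fixed(tf=0, doc_len=3, avg_doc_len=3, doc_count=2, docs_with_term=2))  # 0
```

Under these assumptions the first call yields 182, which reads as 0.182. A term that appears in every document scores low but non-zero, which is consistent with the tutorial checking only that the score is not 0.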

const stats = idx.bm25_stats(&index)
if stats.doc_count != 2 do
    return 7
end

total_token_len is the sum of the token counts of all indexed documents. It feeds the average document length used by BM25 normalization.
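As a worked example, the two tutorial documents have three tokens each, so the total is 6 and the average length is 3. The Python sketch below computes that average in integer arithmetic; the scale of 1000 and the function name are assumptions for illustration, since the tutorial does not say how the module represents the average internally.

```python
# Illustrative sketch: average document length as a fixed-point integer.
# Assumption: the same 1000 scale used for scores is reused here.
def average_doc_len_milli(total_token_len, doc_count):
    if doc_count == 0:
        return 0  # an empty index has no meaningful average
    return total_token_len * 1000 // doc_count


print(average_doc_len_milli(6, 2))  # 3000, i.e. 3.0 tokens per document
print(average_doc_len_milli(7, 2))  # 3500, i.e. 3.5 tokens per document
```

Scaling before dividing keeps the fractional part that plain integer division would drop.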

The complete program combines the steps above:

use std.text.index as idx
pub func main() -> i32 do
    var index = idx.make_index()
    const doc0_text: []u8 = "sovereign identity mesh"
    const doc1_text: []u8 = "mesh network protocol"
    var doc0_id: u32 = 0
    var doc1_id: u32 = 0
    if idx.add_doc(&index, 0, doc0_text, &doc0_id) == false do
        return 1
    end
    if idx.add_doc(&index, 1, doc1_text, &doc1_id) == false do
        return 2
    end
    var hits: [idx.MAX_POSTINGS_PER_TERM]u32 = .undefined
    var hit_count: u32 = 0
    if idx.query_term(&index, "mesh", hits[0..], &hit_count) == false do
        return 3
    end
    if hit_count != 2 do
        return 4
    end
    let score0 = idx.bm25_score(&index, 0, "mesh")
    let score1 = idx.bm25_score(&index, 1, "mesh")
    if score0 == 0 do return 5; end
    if score1 == 0 do return 6; end
    const stats = idx.bm25_stats(&index)
    if stats.doc_count != 2 do return 7; end
    return 0
end

Run the repository smoke test for the canonical version:

cd janus
./scripts/zb test-text-index

Do not treat this module as persistent search. The current index is an in-memory core with fixed capacities. Store-backed keying, reopen behavior, and ASTDB query integration belong in a later facade.