
std.compute.vector

std.compute.vector is the vector-kernel layer for the :compute profile. It is the scalar/reference surface for Janus vector search: pure Janus kernels, deterministic behavior, caller-owned query/result buffers, and explicit retained storage.

The positional TurboIndex surface provides in-memory search over retained packed codes. IdMapIndex wraps it with stable external u64 IDs and applies allowlist filtering before candidates reach the top-k heap. Header encode/decode is available for both .jvi and .jvim files. Positional .jvi indexes save and load their packed-code payload, and stable-ID .jvim indexes save and load their packed-code payload plus the external-ID sidecar. Both work through caller-owned byte buffers or files. Save computes BLAKE3 payload CIDs; load verifies them and rejects tampered payloads with VectorStatus.cid_mismatch. Scalar quantization uses fixed Lloyd-Max b2/b4 tables. On matching build targets, target-feature-gated neon, avx2, and avx512 grafted scoring paths are available.

TurboQuant and TurboVec are now first-class façade types. TurboQuant makes the quantization policy (b2 or b4) explicit. TurboVec provides a stable-ID ANN API with the same core behavior as IdMapIndex.

use std.compute.vector

The aggregate std.compute module also re-exports vector.

The shipped Phase 1 API covers:

  • Scalar metrics: dot, norm_f32, l2, cosine, normalize_f32
  • Rotation: rotate_f32, unrotate_f32
  • Bit packing: pack_b4, unpack_b4, pack_b2, unpack_b2
  • Quantization: quantize_b4, dequantize_b4, quantize_vec_b4, dequantize_vec_b4, quantize_b2, dequantize_b2, quantize_vec_b2, dequantize_vec_b2, TurboQuant, TurboQuantConfig, turbo_quant_new, turbo_quant_quantize, turbo_quant_dequantize
  • Top-k heap: topk_offer, topk_finalize
  • Benchmark oracle: exact_search, recall_at_k_milli
  • Positional index: index_init, index_add, index_search, index_search_report, index_remove, index_deinit, index_backend
  • Stable-ID index: idmap_init, idmap_deinit, idmap_add_with_ids, idmap_remove, idmap_search, idmap_search_report, idmap_search_allowlist
  • TurboVec façade: TurboVec, turbo_vec_init, turbo_vec_deinit, turbo_vec_as_idmap, turbo_vec_add, turbo_vec_remove, turbo_vec_search, turbo_vec_search_allowlist, turbo_vec_search_report, turbo_vec_persisted_bytes, turbo_vec_save_bytes, turbo_vec_load_bytes, turbo_vec_save_file, turbo_vec_load_file
  • Search reporting: SearchReport
  • Persistence: VectorHeader, VectorFileKind, encode_vector_header, decode_vector_header, index_persisted_bytes, index_save_bytes, index_load_bytes, index_save_file, index_load_file, idmap_persisted_bytes, idmap_save_bytes, idmap_load_bytes, idmap_save_file, idmap_load_file

Metrics, rotation, packing, quantization, and top-k operate on caller-owned buffers. TurboIndex retains packed-code storage allocated during index_init through the explicit parent allocator in IndexConfig. IdMapIndex adds one retained u64 sidecar for stable external IDs. Search calls do not allocate and write results into caller-owned buffers.

use std.compute.vector
func main() -> i32 do
let a = [_]f32{1.0, 2.0, 3.0}
let b = [_]f32{4.0, 5.0, 6.0}
if vector.dot(a, b, 3) != 32.0 do return 1 end
let v = [_]f32{3.0, 4.0}
if vector.norm_f32(v, 2) != 5.0 do return 2 end
let mut out = [_]f32{0.0, 0.0}
vector.normalize_f32(v, 2, out)
return 0
end

norm_f32 uses a local Babylonian square root implementation, not a C libm graft. That keeps the reference path sovereign and deterministic.
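Here is a minimal Python sketch of the Babylonian method. Python floats are f64, whereas norm_f32 works on f32. The starting guess and stopping rule are assumptions of this sketch, not the module's implementation. Starting at max(x, 1.0) keeps every iterate at or above the true root, so the loop can stop as soon as the guess stops decreasing.

```python
def babylonian_sqrt(x: float) -> float:
    """Square root via the Babylonian (Heron/Newton) iteration."""
    if x < 0.0:
        raise ValueError("negative input")
    if x == 0.0:
        return 0.0
    # Assumed start: max(x, 1.0) >= sqrt(x), so iterates decrease monotonically.
    guess = max(x, 1.0)
    while True:
        nxt = 0.5 * (guess + x / guess)
        if nxt >= guess:  # stopped decreasing: converged in float arithmetic
            return guess
        guess = nxt


def norm_f32(v, dim):
    # Sum of squares, then the local square root (no libm call).
    return babylonian_sqrt(sum(v[i] * v[i] for i in range(dim)))
```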

let mut embedding = [_]f32{0.5, -0.25, 0.75, 1.5}
vector.rotate_f32(embedding, 4, 12345, 4)
vector.unrotate_f32(embedding, 4, 12345, 4)

Rotation is seeded and reproducible. The implementation applies deterministic Householder reflections derived from the seed and round number. The transform is norm-preserving and storage-free: unrotate_f32 regenerates the same reflection sequence in reverse.
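The reflect-forward, reflect-backward structure can be sketched in Python. How each reflection vector is derived from (seed, round) is an assumption here: a Gaussian draw from Python's random.Random, keyed by a hypothetical seed * 1_000_003 + round mix. Only the structure follows the description above. Each Householder reflection H = I - 2 u u^T with unit u is its own inverse and preserves norms. Unrotating therefore means regenerating the same reflections and applying them in reverse order.

```python
import math
import random


def _reflector(dim, seed, round_no):
    # Hypothetical derivation: one PRNG stream per (seed, round), normalized to unit length.
    rng = random.Random(seed * 1_000_003 + round_no)
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in u))
    return [x / n for x in u]


def _reflect(v, u):
    # In-place Householder reflection: v <- v - 2 (u . v) u
    d = sum(a * b for a, b in zip(v, u))
    for i in range(len(v)):
        v[i] -= 2.0 * d * u[i]


def rotate(v, seed, rounds):
    for r in range(rounds):
        _reflect(v, _reflector(len(v), seed, r))


def unrotate(v, seed, rounds):
    # Same reflections, reverse order; nothing is stored between calls.
    for r in reversed(range(rounds)):
        _reflect(v, _reflector(len(v), seed, r))
```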

let codes = [_]u8{1, 2, 15, 0, 7, 8}
var packed: [3]u8 = undefined
var unpacked: [6]u8 = undefined
if vector.pack_b4(codes, 6, packed) != 3 do return 1 end
vector.unpack_b4(packed, 6, unpacked)

b4 packs two 4-bit codes per byte. b2 packs four 2-bit codes per byte. Tail bytes are zero-padded, and unpacking reads exactly the requested code count.
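A Python sketch of the packing layout follows. It assumes the first code of each pair (or group of four) lands in the lowest bits of the byte. The source does not specify bit order, so treat that choice as an assumption.

```python
def pack_b4(codes, count):
    out = bytearray((count + 1) // 2)  # zero-filled, so an odd tail nibble stays 0
    for i in range(count):
        out[i // 2] |= (codes[i] & 0x0F) << (4 * (i % 2))  # assumed: low nibble first
    return bytes(out)


def unpack_b4(packed, count):
    # Reads exactly `count` codes; padding nibbles are ignored.
    return [(packed[i // 2] >> (4 * (i % 2))) & 0x0F for i in range(count)]


def pack_b2(codes, count):
    out = bytearray((count + 3) // 4)
    for i in range(count):
        out[i // 4] |= (codes[i] & 0x03) << (2 * (i % 4))  # assumed: lowest bits first
    return bytes(out)


def unpack_b2(packed, count):
    return [(packed[i // 4] >> (2 * (i % 4))) & 0x03 for i in range(count)]
```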

let qv = [_]f32{-0.5, 0.0, 0.5, 0.95}
var codes: [4]u8 = undefined
let mut decoded = [_]f32{0.0, 0.0, 0.0, 0.0}
vector.quantize_vec_b4(qv, 4, codes)
vector.dequantize_vec_b4(codes, 4, decoded)

The current quantizer uses fixed Lloyd-Max scalar tables for b4 and b2. quantize_b4 maps a scalar to one of 16 reconstruction levels; quantize_b2 maps to one of 4. The tables are deterministic compile-time constants, so the scalar reference stays training-free and reproducible.
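Encoding against a fixed reconstruction table reduces to a threshold search, because Lloyd-Max decision thresholds sit at the midpoints between adjacent levels. The Python sketch below uses the classic 4-level Lloyd-Max levels for a unit-variance Gaussian (about ±0.4528 and ±1.5104) purely for illustration. The module's compile-time b2 and b4 tables, and any scaling applied to rotated coordinates, may differ.

```python
import bisect

# Illustrative 4-level (b2) table: classic Lloyd-Max levels for a unit-variance
# Gaussian. NOT the module's actual constants.
B2_LEVELS = [-1.5104, -0.4528, 0.4528, 1.5104]


def quantize(x, levels):
    # Decision thresholds are midpoints between adjacent reconstruction levels.
    thresholds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
    return bisect.bisect_right(thresholds, x)


def dequantize(code, levels):
    return levels[code]
```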

TurboQuant exposes explicit quantization policy under a typed façade:

use std.compute.vector
var qcfg = vector.TurboQuantConfig { kind: vector.TurboQuantKind.b4 }
let q = vector.turbo_quant_new(&qcfg)
let vector_data = [_]f32{-0.5, 0.0, 0.5, 0.95}
var codes: [4]u8 = undefined
var decoded: [4]f32 = undefined
vector.turbo_quant_quantize(&q, vector_data, 4, codes)
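vector.turbo_quant_dequantize(&q, codes, 4, decoded)

The top-k helpers maintain a bounded heap over caller-owned score and ID arrays. topk_offer takes the current length and returns the new one:

var scores: [3]f32 = undefined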
var ids: [3]u64 = undefined
var len: u32 = 0
len = vector.topk_offer(scores, ids, len, 3, 0.5, 10)
len = vector.topk_offer(scores, ids, len, 3, 0.875, 20)
len = vector.topk_offer(scores, ids, len, 3, 0.125, 30)
len = vector.topk_offer(scores, ids, len, 3, 0.75, 40)
vector.topk_finalize(scores, ids, len)

The heap keeps the largest score values. Callers using L2 distance should pass -distance as the score key so “larger is better” remains true internally. After topk_finalize, the arrays are sorted in descending score order and are no longer a heap.
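The same contract can be sketched in Python with the standard heapq module. A bounded min-heap is an assumption about the internal layout, but it is the usual way to keep the k largest scores. The root is the weakest kept score, so a better candidate replaces it once the heap is full. The sketch uses a Python list instead of caller-owned arrays plus an explicit length.

```python
import heapq


def topk_offer(heap, k, score, item_id):
    # Min-heap of (score, id): heap[0] is the weakest score currently kept.
    if len(heap) < k:
        heapq.heappush(heap, (score, item_id))
    elif score > heap[0][0]:
        heapq.heapreplace(heap, (score, item_id))


def topk_finalize(heap):
    # Descending by score; the result is a sorted list, no longer a heap.
    return sorted(heap, key=lambda e: e[0], reverse=True)


# For L2 distance, offer -distance so that larger still means better:
# topk_offer(heap, k, -dist, item_id)
```

With the four offers from the example above and k = 3, the finalized result is (0.875, 20), (0.75, 40), (0.5, 10).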

exact_search runs a full-precision top-k scan over caller-owned raw [count][dim]f32 storage. It is not an index; it is the correctness oracle for recall measurements and small benchmark harnesses.

let vectors = [_]f32{
1.0, 0.0,
0.0, 1.0,
0.75, 0.75,
}
let query = [_]f32{1.0, 0.0}
var exact_scores: [2]f32 = undefined
var exact_ids: [2]u64 = undefined
let exact_found = vector.exact_search(
vectors,
3,
2,
vector.Metric.cosine,
query,
2,
exact_scores,
exact_ids,
)
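For reference, a brute-force cosine scan over the same flat [count][dim] layout can be sketched in Python. Two behaviors are assumptions rather than documented facts: IDs are zero-based positions, and a zero-length vector scores 0.0.

```python
import math


def exact_search_cosine(vectors, count, dim, query, k):
    """Full-precision top-k by cosine over a flat row-major list; returns (scores, ids)."""
    qn = math.sqrt(sum(x * x for x in query))
    scored = []
    for i in range(count):
        row = vectors[i * dim:(i + 1) * dim]
        rn = math.sqrt(sum(x * x for x in row))
        dot = sum(a * b for a, b in zip(row, query))
        # Assumed: zero-length vectors score 0.0 instead of dividing by zero.
        score = dot / (rn * qn) if rn > 0.0 and qn > 0.0 else 0.0
        scored.append((score, i))
    scored.sort(key=lambda e: e[0], reverse=True)
    top = scored[:k]
    return [s for s, _ in top], [i for _, i in top]
```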

recall_at_k_milli compares approximate result IDs against exact baseline IDs as sets within the first k entries and returns milli-units: 1000 means recall 1.0, 500 means recall 0.5. The helper is allocation-free and order-insensitive; ranking quality beyond membership remains a benchmark-harness concern.
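Here is a Python sketch of the set-overlap computation. Three details are assumptions: the result is floored to integer milli-units, the divisor is k, and k == 0 returns 0. The source only fixes the scale (1000 means recall 1.0).

```python
def recall_at_k_milli(approx_ids, exact_ids, k):
    """Order-insensitive overlap of the first k IDs, in milli-units."""
    if k == 0:
        return 0  # assumed convention
    hits = len(set(approx_ids[:k]) & set(exact_ids[:k]))
    return hits * 1000 // k  # assumed: floor division by k
```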

TurboIndex stores packed quantized vectors and returns positional slot IDs. Slots are not stable across removal: index_remove uses swap_remove, so the last vector can move into the removed position.
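swap_remove trades slot stability for O(1) removal, as this small Python sketch shows. The last element fills the hole, so its slot number changes. The list of labels is a stand-in for packed-code rows.

```python
def swap_remove(slots, pos):
    """Remove slots[pos] in O(1) by moving the last element into its place."""
    last = slots.pop()
    if pos < len(slots):  # removing a non-final slot: fill the hole
        slots[pos] = last
    return slots
```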

use std.compute.vector
use std.alloc.page_allocator
func main() -> i32 do
var pa = page_allocator.init()
var cfg = vector.IndexConfig {
dim: 2,
bit_width: vector.BitWidth.b4,
metric: vector.Metric.cosine,
backend: vector.Backend.cpu_auto,
seed: 99,
capacity: 4,
rounds: 0,
allocator: &pa,
}
var idx: vector.TurboIndex = undefined
if vector.index_init(&idx, &cfg) != vector.VectorStatus.ok do
return 1
end
let vectors = [_]f32{
1.0, 0.0,
0.0, 1.0,
0.75, 0.75,
}
if vector.index_add(&idx, vectors, 3) != vector.VectorStatus.ok do
vector.index_deinit(&idx)
return 2
end
let query = [_]f32{1.0, 0.0}
var scores: [2]f32 = undefined
var ids: [2]u64 = undefined
let found = vector.index_search(&idx, query, 2, scores, ids)
vector.index_deinit(&idx)
return 0
end

IndexConfig.capacity preallocates retained packed-code storage using IndexConfig.allocator. rounds controls the seeded Householder rotation count; 0 is valid for deterministic test cases that isolate index semantics. The smoke harness asserts that retained-storage accounting returns to zero after index_deinit, and that a second index_deinit call remains inert.

index_backend reports the selected backend. cpu_auto picks the best grafted scorer that the compiled target supports, in this order: avx512, avx2, neon, then scalar. Explicit neon, avx2, and avx512 indexes use grafted Zig vector-lane kernels for dot/L2 candidate scoring when the compiled target advertises the matching feature; otherwise index_init returns VectorStatus.backend_unavailable. The smoke harness builds matching cpu_auto, explicit scalar, and target-available explicit SIMD indexes and asserts that their result counts, IDs, and scores match. This keeps accelerated backends tied to the scalar oracle.
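The selection rule can be expressed as a small Python function. Two simplifications are specific to this sketch: compiled target features are a set of strings, and the error is the string "backend_unavailable".

```python
AUTO_ORDER = ("avx512", "avx2", "neon")  # preference order described above


def select_backend(requested, compiled_features):
    """Return the chosen backend name, or 'backend_unavailable'."""
    if requested == "cpu_auto":
        for name in AUTO_ORDER:
            if name in compiled_features:
                return name
        return "scalar"  # always-available fallback
    if requested == "scalar" or requested in compiled_features:
        return requested
    return "backend_unavailable"  # explicit SIMD request without the target feature
```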

SearchReport is the stable reporting payload for benchmark harnesses. index_search_report and idmap_search_report run the same search as their plain counterparts, then fill backend, metric, bit width, dimension, vector count, visited count, requested k, found count, code payload bytes, raw [dim]f32 payload bytes, and compression_ratio_milli. Pair these reports with exact_search and recall_at_k_milli to produce recall@k numbers against the scalar full-precision oracle. Latency remains caller-measured so the stdlib does not fabricate a clock abstraction.
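As a worked example of the payload fields, the Python sketch below makes three assumptions: each vector's codes are padded to a whole byte, the raw payload is dim × 4 bytes of f32, and compression_ratio_milli is raw bytes × 1000 / code bytes with floor division. None of these formulas is stated in the source, and the sketch ignores any norms payload.

```python
def payload_sizes(dim, bits, count):
    """Return (code_bytes, raw_bytes, compression_ratio_milli) under the stated assumptions."""
    code_per_vec = (dim * bits + 7) // 8  # assumed: per-vector byte padding
    raw_per_vec = dim * 4  # [dim]f32
    code = code_per_vec * count
    raw = raw_per_vec * count
    ratio_milli = raw * 1000 // code if code else 0
    return code, raw, ratio_milli
```

For example, 42 vectors of dim 1536 at b4 give 32256 code bytes against 258048 raw bytes, a ratio of 8000 (8.0x).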

IdMapIndex keeps external u64 IDs stable across deletion. Internally it uses the same positional packed-code store, then repairs the ID sidecar after swap_remove.
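The repair step can be sketched in Python. The external-ID-to-slot dictionary and the string status values (including "not_found") are illustrative assumptions. The source only says that a u64 sidecar maps slots to external IDs and is repaired after swap_remove.

```python
class IdMapSketch:
    def __init__(self):
        self.slots = []    # stand-in for positional packed-code rows
        self.ext_ids = []  # sidecar: ext_ids[slot] is that slot's external u64 ID
        self.slot_of = {}  # hypothetical lookup: external ID -> current slot

    def add(self, payload, ext_id):
        if ext_id in self.slot_of:
            return "duplicate_id"  # stable IDs are unique, not a multimap
        self.slot_of[ext_id] = len(self.slots)
        self.slots.append(payload)
        self.ext_ids.append(ext_id)
        return "ok"

    def remove(self, ext_id):
        pos = self.slot_of.pop(ext_id, None)
        if pos is None:
            return "not_found"  # hypothetical status name
        last = len(self.slots) - 1
        if pos != last:
            # swap_remove: the last row moves into the hole...
            self.slots[pos] = self.slots[last]
            self.ext_ids[pos] = self.ext_ids[last]
            # ...and the moved ID's slot entry is repaired.
            self.slot_of[self.ext_ids[pos]] = pos
        self.slots.pop()
        self.ext_ids.pop()
        return "ok"
```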

use std.compute.vector
use std.alloc.page_allocator
func main() -> i32 do
var pa = page_allocator.init()
var cfg = vector.IndexConfig {
dim: 2,
bit_width: vector.BitWidth.b4,
metric: vector.Metric.cosine,
backend: vector.Backend.scalar,
seed: 99,
capacity: 4,
rounds: 0,
allocator: &pa,
}
var idx: vector.IdMapIndex = undefined
if vector.idmap_init(&idx, &cfg) != vector.VectorStatus.ok do
return 1
end
let vectors = [_]f32{
1.0, 0.0,
0.0, 1.0,
0.75, 0.75,
}
let external_ids = [_]u64{101, 202, 303}
if vector.idmap_add_with_ids(&idx, vectors, external_ids, 3) != vector.VectorStatus.ok do
vector.idmap_deinit(&idx)
return 2
end
let query = [_]f32{0.75, 0.75}
let allow = [_]u64{101, 303}
var scores: [2]f32 = undefined
var ids: [2]u64 = undefined
let found = vector.idmap_search_allowlist(&idx, query, 2, allow, 2, scores, ids)
vector.idmap_deinit(&idx)
return 0
end

idmap_search and idmap_search_allowlist return external IDs, never positional slots. idmap_search_allowlist filters candidates before they are offered to the top-k heap, so a disallowed candidate cannot evict an allowed result. idmap_add_with_ids rejects duplicate external IDs with VectorStatus.duplicate_id; stable IDs are unique, not a multimap.
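A Python sketch contrasting the two orders shows why filtering before the heap matters. The list of (score, external ID) pairs stands in for scored index entries, and both function names are hypothetical.

```python
import heapq


def search_allowlist(scored, k, allow):
    """Filter-before-heap: only allowed candidates are ever offered."""
    allowed = set(allow)
    heap = []
    for score, ext_id in scored:
        if ext_id not in allowed:
            continue  # rejected before it can occupy a heap slot
        if len(heap) < k:
            heapq.heappush(heap, (score, ext_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, ext_id))
    return sorted(heap, reverse=True)


def search_then_filter(scored, k, allow):
    """Contrast: filtering after top-k lets disallowed candidates evict allowed ones."""
    allowed = set(allow)
    return [e for e in heapq.nlargest(k, scored) if e[1] in allowed]
```

With high-scoring disallowed candidates, filtering after the heap can return nothing even though allowed matches exist.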

IdMapIndex stores the ID sidecar beside the inner positional index. Its lifecycle follows the same rule: initialize with idmap_init, release with idmap_deinit. The vector smoke harness verifies that both the inner index and the sidecar clear retained-storage accounting after idmap_deinit, and that a second idmap_deinit call does not free again.

TurboVec is a stable-ID first-class wrapper over IdMapIndex for ANN-style storage and query:

use std.compute.vector
use std.alloc.page_allocator
func main() -> i32 do
var pa = page_allocator.init()
var cfg = vector.IndexConfig {
dim: 2,
bit_width: vector.BitWidth.b4,
metric: vector.Metric.cosine,
backend: vector.Backend.scalar,
seed: 99,
capacity: 4,
rounds: 0,
allocator: &pa,
}
var tv: vector.TurboVec = undefined
if vector.turbo_vec_init(&tv, &cfg) != vector.VectorStatus.ok do
return 1
end
let vectors = [_]f32{
1.0, 0.0,
0.0, 1.0,
0.75, 0.75,
}
let ids = [_]u64{101, 202, 303}
vector.turbo_vec_add(&tv, vectors, ids, 3)
let query = [_]f32{0.75, 0.75}
var scores: [2]f32 = undefined
var result_ids: [2]u64 = undefined
let found = vector.turbo_vec_search(&tv, query, 2, scores, result_ids)
vector.turbo_vec_remove(&tv, 101)
vector.turbo_vec_deinit(&tv)
return 0
end

turbo_vec_search_allowlist keeps IdMapIndex's filter-before-heap behavior. When lower-level interop is needed, turbo_vec_as_idmap exposes the underlying IdMapIndex.

.jvi is the positional vector-index format. .jvim is the stable-ID vector index map format; it carries the same base header plus idmap_payload_cid for the external-ID sidecar payload.

var header = vector.VectorHeader {
kind: vector.VectorFileKind.jvim,
version: vector.VECTOR_FORMAT_VERSION,
dim: 1536,
bit_width: vector.BitWidth.b4,
metric: vector.Metric.cosine,
backend_hint: vector.Backend.scalar,
vector_count: 42,
rotation_seed: 0x0123456789ABCDEF,
rotation_rounds: 6,
quantizer_kind: vector.VECTOR_QUANTIZER_TURBOQUANT,
code_payload_cid: code_cid,
norms_payload_cid: norms_cid,
idmap_payload_cid: idmap_cid,
}
var bytes: [144]u8 = undefined
let status = vector.encode_vector_header_raw(bytes, 144, &header)

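VECTOR_HEADER_BYTES is 144. Multi-byte scalars are little-endian. The header includes rotation_rounds so that loaded indexes reproduce the original rotation. In this snippet, code_cid, norms_cid, and idmap_cid stand for payload CIDs computed elsewhere.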
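Byte order can be checked with Python's struct module. The sketch only illustrates little-endian encoding of two values from the header example. The field widths (u64 for rotation_seed, u32 for dim) are assumptions, and the real 144-byte field layout is not described here.

```python
import struct

seed_bytes = struct.pack("<Q", 0x0123456789ABCDEF)  # assumed u64, little-endian
dim_bytes = struct.pack("<I", 1536)  # assumed u32, little-endian
```

The least significant byte comes first: seed_bytes is EF CD AB 89 67 45 23 01, and dim_bytes is 00 06 00 00.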

For positional indexes, index_persisted_bytes, index_save_bytes, and index_load_bytes save and reload the .jvi header plus packed-code payload in caller-owned byte buffers. index_save_bytes computes the packed-code payload CID from the serialized bytes; index_load_bytes recomputes and verifies it before allocating the reopened index. The byte APIs use explicit pointer+length arguments, for example index_save_bytes(&idx, bytes, cap) and index_load_bytes(&idx, &pa, bytes, len).
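The verify-before-allocate pattern can be sketched in Python. Python's standard library has no BLAKE3, so this sketch substitutes hashlib.blake2b with a 32-byte digest. The [digest][payload] layout is also hypothetical; the real .jvi file carries a 144-byte header. Status strings mirror the VectorStatus names.

```python
import hashlib

DIGEST_LEN = 32


def cid(payload: bytes) -> bytes:
    # Stand-in only: BLAKE2b-256, NOT the BLAKE3 the module uses.
    return hashlib.blake2b(payload, digest_size=DIGEST_LEN).digest()


def save_bytes(payload: bytes) -> bytes:
    # Hypothetical layout: [digest][payload].
    return cid(payload) + payload


def load_bytes(blob: bytes):
    if len(blob) < DIGEST_LEN:
        return "format_mismatch", None
    stored, payload = blob[:DIGEST_LEN], blob[DIGEST_LEN:]
    if cid(payload) != stored:
        return "cid_mismatch", None  # rejected before any allocation
    return "ok", bytearray(payload)  # allocate/copy only after verification
```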

index_save_file and index_load_file provide the same .jvi contract over a filesystem path. File I/O failures return VectorStatus.io_error; malformed or wrong-kind contents still return format_mismatch, and CID mismatches still return cid_mismatch.

For stable-ID indexes, idmap_persisted_bytes, idmap_save_bytes, and idmap_load_bytes save and reload the .jvim header, packed-code payload, and little-endian u64 ID sidecar in caller-owned byte buffers. idmap_save_bytes computes CIDs for both the code payload and the ID sidecar; idmap_load_bytes verifies both before allocation. The byte APIs use explicit pointer+length arguments, for example idmap_save_bytes(&idx, bytes, cap) and idmap_load_bytes(&idx, &pa, bytes, len).
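The little-endian u64 sidecar encoding can be sketched with Python's struct module. The function names are hypothetical, and the sketch covers only the ID payload, not the .jvim header or its CIDs.

```python
import struct


def encode_id_sidecar(ids):
    # One little-endian u64 per external ID, in slot order.
    return struct.pack("<%dQ" % len(ids), *ids)


def decode_id_sidecar(blob):
    if len(blob) % 8:
        raise ValueError("sidecar length must be a multiple of 8")
    return list(struct.unpack("<%dQ" % (len(blob) // 8), blob))
```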

idmap_save_file and idmap_load_file provide the same .jvim contract over a filesystem path.

The proof gate is the AOT smoke harness:

cd janus
./scripts/zb test-vector

The broad test step also depends on this harness:

cd janus
./scripts/zb test

The harness lives at std/compute/vector_smoke.jan. It covers:

  • metrics, normalization, b2/b4 packing, Lloyd-Max quantization, seeded rotation/unrotation, top-k ordering, exact full-precision search, and recall@k milli scoring;
  • positional TurboIndex add/search/remove behavior and stable IdMapIndex add/search/remove/allowlist behavior;
  • .jvi and .jvim header round-trips through encode_vector_header / decode_vector_header;
  • saving and reloading positional .jvi and stable-ID .jvim payloads through both caller-owned byte buffers and files, then proving that search still works;
  • tamper detection: corrupting one .jvi code byte and one .jvim ID-sidecar byte must produce VectorStatus.cid_mismatch;
  • retained-storage accounting before and after index_deinit / idmap_deinit, so the no-leak/no-double-free contract is covered by the same proof gate;
  • backend dispatch and reporting: supported targets select avx512, avx2, or neon under cpu_auto before falling back to scalar, and unavailable explicit backends report backend_unavailable;
  • backend agreement: cpu_auto and target-available explicit SIMD results are compared against an explicit scalar index, so accelerated scoring stays tied to the scalar oracle;
  • compile-only bridge checks for the AVX512 and NEON target-feature builds on hosts that cannot execute those instructions;
  • SearchReport fields for positional and stable-ID search, including backend, visited count, result count, and compression-ratio payload data.

Not shipped yet:

  • Native Janus simd[T; N] reimplementation, pending SPEC-040.
  • GPU/NPU/device vector backends.

Treat the current module as the stable scalar reference layer, not as a complete vector database.