std.compute.vector
std.compute.vector is the vector-kernel layer for the :compute profile. It
is the scalar/reference surface for Janus vector search: pure Janus kernels,
deterministic behavior, caller-owned query/result buffers, and explicit retained
storage.
The positional TurboIndex surface provides in-memory search over retained
packed codes. IdMapIndex wraps it with stable external u64 IDs and filters
candidates against an allowlist before they reach the top-k heap. Header
encode/decode is available for both .jvi and .jvim files. Positional .jvi
indexes save and load their packed-code payload, and stable-ID .jvim indexes
save and load the packed-code payload plus the external-ID sidecar. Both work
through caller-owned byte buffers or files. Save computes BLAKE3 payload CIDs;
load verifies them and rejects tampered payloads with
VectorStatus.cid_mismatch. Scalar quantization uses fixed Lloyd-Max b2/b4
tables. Target-feature-gated neon, avx2, and avx512 grafted scoring paths are
available on matching build targets.
TurboQuant and TurboVec are first-class façade types. TurboQuant makes the
quantization policy (b2 or b4) explicit, while TurboVec provides a stable-ID
ANN API with the same core behavior as IdMapIndex.
Import
```
use std.compute.vector
```
The aggregate std.compute module also re-exports vector.
Current Scope
The shipped Phase 1 API covers:
| Area | Functions |
|---|---|
| Scalar metrics | dot, norm_f32, l2, cosine, normalize_f32 |
| Rotation | rotate_f32, unrotate_f32 |
| Bit packing | pack_b4, unpack_b4, pack_b2, unpack_b2 |
| Quantization | quantize_b4, dequantize_b4, quantize_vec_b4, dequantize_vec_b4, quantize_b2, dequantize_b2, quantize_vec_b2, dequantize_vec_b2, TurboQuant, TurboQuantConfig, turbo_quant_new, turbo_quant_quantize, turbo_quant_dequantize |
| Top-k heap | topk_offer, topk_finalize |
| Benchmark oracle | exact_search, recall_at_k_milli |
| Positional index | index_init, index_add, index_search, index_search_report, index_remove, index_deinit, index_backend |
| Stable-ID index | idmap_init, idmap_deinit, idmap_add_with_ids, idmap_remove, idmap_search, idmap_search_report, idmap_search_allowlist |
| TurboVec façade | TurboVec, turbo_vec_init, turbo_vec_deinit, turbo_vec_as_idmap, turbo_vec_add, turbo_vec_remove, turbo_vec_search, turbo_vec_search_allowlist, turbo_vec_search_report, turbo_vec_persisted_bytes, turbo_vec_save_bytes, turbo_vec_load_bytes, turbo_vec_save_file, turbo_vec_load_file |
| Search reporting | SearchReport |
| Persistence | VectorHeader, VectorFileKind, encode_vector_header, decode_vector_header, index_persisted_bytes, index_save_bytes, index_load_bytes, index_save_file, index_load_file, idmap_persisted_bytes, idmap_save_bytes, idmap_load_bytes, idmap_save_file, idmap_load_file |
Metrics, rotation, packing, quantization, and top-k operate on caller-owned
buffers. TurboIndex retains packed-code storage that index_init allocates
through the explicit parent allocator in IndexConfig. IdMapIndex adds one
retained u64 sidecar for stable external IDs. Search calls do not allocate;
they write results into caller-owned buffers.
Metrics
```
use std.compute.vector

func main() -> i32 do
    let a = [_]f32{1.0, 2.0, 3.0}
    let b = [_]f32{4.0, 5.0, 6.0}
    if vector.dot(a, b, 3) != 32.0 do return 1 end

    let v = [_]f32{3.0, 4.0}
    if vector.norm_f32(v, 2) != 5.0 do return 2 end

    let mut out = [_]f32{0.0, 0.0}
    vector.normalize_f32(v, 2, out)

    return 0
end
```
norm_f32 uses a local Babylonian square root implementation, not a C libm
graft. That keeps the reference path sovereign and deterministic.
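The Babylonian (Heron) iteration behind norm_f32 can be sketched in Python to show why it needs no libm: it uses only addition, multiplication, and division. This is a language-neutral illustration, not the Janus source. The starting guess, the stopping rule, and the use of Python's 64-bit floats instead of f32 are all assumptions of the sketch.

```python
def babylonian_sqrt(s: float, max_iters: int = 128) -> float:
    """Heron/Babylonian square root using only +, * and /, with no libm call."""
    if s < 0.0:
        raise ValueError("square root of a negative number")
    if s == 0.0:
        return 0.0
    # Start at or above sqrt(s) so the iterates decrease toward the root.
    x = s if s >= 1.0 else 1.0
    for _ in range(max_iters):
        nxt = 0.5 * (x + s / x)
        if nxt >= x:  # no further decrease: converged to float precision
            break
        x = nxt
    return x


def norm(v):
    """Euclidean norm: square root of the sum of squares."""
    return babylonian_sqrt(sum(x * x for x in v))
```

Because every step is plain arithmetic with a fixed stopping rule, the same input always yields the same output, which is the determinism property the reference path relies on.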
Rotation
```
let mut embedding = [_]f32{0.5, -0.25, 0.75, 1.5}

vector.rotate_f32(embedding, 4, 12345, 4)
vector.unrotate_f32(embedding, 4, 12345, 4)
```
Rotation is seeded and reproducible. The implementation applies deterministic
Householder reflections derived from the seed and round number. The transform is
norm-preserving and storage-free: unrotate_f32 regenerates the same reflection
sequence in reverse.
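The reflection scheme can be sketched in Python. The Householder step x ← x − 2(v·x)v with a unit vector v, and the inverse that replays the same reflections in reverse order, follow the description above. This page does not document how the module derives v from (seed, round), so the hypothetical _reflector helper uses Python's random.Random purely as a stand-in. Its vectors will not match the Janus implementation.

```python
import math
import random


def _reflector(dim, seed, round_no):
    # Hypothetical derivation: random.Random stands in for the module's own
    # seed/round generator. Returns a unit vector of length dim.
    rng = random.Random(seed * 1_000_003 + round_no)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _reflect(x, v):
    # In-place Householder reflection H x = x - 2 (v . x) v, with ||v|| = 1.
    # H is orthogonal (norm-preserving) and H applied twice is the identity.
    d = sum(a * b for a, b in zip(v, x))
    for i in range(len(x)):
        x[i] -= 2.0 * d * v[i]


def rotate(x, seed, rounds):
    for r in range(rounds):
        _reflect(x, _reflector(len(x), seed, r))


def unrotate(x, seed, rounds):
    # Regenerate the same reflections and apply them in reverse order.
    # No rotation matrix is ever stored.
    for r in reversed(range(rounds)):
        _reflect(x, _reflector(len(x), seed, r))
```

Regenerating each reflector from (seed, round) is what makes the transform storage-free. The cost is recomputing the reflectors on every call instead of keeping a dense dim × dim matrix.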
Bit Packing
```
let codes = [_]u8{1, 2, 15, 0, 7, 8}
var packed: [3]u8 = undefined
var unpacked: [6]u8 = undefined

if vector.pack_b4(codes, 6, packed) != 3 do return 1 end
vector.unpack_b4(packed, 6, unpacked)
```
b4 packs two 4-bit codes per byte. b2 packs four 2-bit codes per byte. Tail
bytes are zero-padded, and unpacking reads exactly the requested code count.
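A Python sketch of the same packing rule follows. The page does not say which end of each byte holds the first code, so this sketch assumes low bits first. The byte values it produces therefore illustrate the layout and are not guaranteed to match pack_b4/pack_b2 bit-for-bit.

```python
def pack(codes, bits):
    """Pack `bits`-wide codes (bits in {2, 4}) into bytes, low bits first (assumed)."""
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    # Zero-initialized, so unused tail slots in the last byte stay zero.
    out = bytearray((len(codes) + per_byte - 1) // per_byte)
    for i, c in enumerate(codes):
        if c & ~mask:
            raise ValueError(f"code {c} does not fit in {bits} bits")
        out[i // per_byte] |= c << ((i % per_byte) * bits)
    return bytes(out)


def unpack(packed, count, bits):
    """Read exactly `count` codes back out; padding bits are ignored."""
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    return [(packed[i // per_byte] >> ((i % per_byte) * bits)) & mask
            for i in range(count)]
```

Passing the code count to unpack is what lets the zero padding in the last byte stay invisible to callers.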
Quantization
```
let qv = [_]f32{-0.5, 0.0, 0.5, 0.95}
var codes: [4]u8 = undefined
let mut decoded = [_]f32{0.0, 0.0, 0.0, 0.0}

vector.quantize_vec_b4(qv, 4, codes)
vector.dequantize_vec_b4(codes, 4, decoded)
```
The current quantizer uses fixed Lloyd-Max scalar tables for b4 and b2.
quantize_b4 maps a scalar to one of 16 reconstruction levels; quantize_b2
maps to one of 4. The tables are deterministic compile-time constants, so the
scalar reference stays training-free and reproducible.
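A Lloyd-Max quantizer maps a value to its nearest reconstruction level. This is equivalent to comparing the value against decision thresholds placed at the midpoints between adjacent levels. The Python sketch below shows that lookup. The 4-level LEVELS_B2 table is a made-up placeholder, not the module's b2 constants, which this page does not list.

```python
import bisect

# Placeholder 4-level table for illustration only. NOT the real Lloyd-Max b2 values.
LEVELS_B2 = [-1.5, -0.5, 0.5, 1.5]


def thresholds(levels):
    # Nearest-neighbour condition: each decision boundary is the midpoint
    # between two adjacent (sorted) reconstruction levels.
    return [(a + b) / 2.0 for a, b in zip(levels, levels[1:])]


THRESH_B2 = thresholds(LEVELS_B2)


def quantize(x, thresh=THRESH_B2):
    """Return the code (table index) of the nearest reconstruction level."""
    return bisect.bisect_right(thresh, x)


def dequantize(code, levels=LEVELS_B2):
    """Map a code back to its reconstruction level."""
    return levels[code]
```

Because the table is a constant, quantize and dequantize are pure functions of their inputs. That is what keeps the scalar path training-free and reproducible.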
TurboQuant exposes explicit quantization policy under a typed façade:
```
use std.compute.vector

var qcfg = vector.TurboQuantConfig { kind: vector.TurboQuantKind.b4 }
let q = vector.turbo_quant_new(&qcfg)

let vector_data = [_]f32{-0.5, 0.0, 0.5, 0.95}
var codes: [4]u8 = undefined
var decoded: [4]f32 = undefined

vector.turbo_quant_quantize(&q, vector_data, 4, codes)
vector.turbo_quant_dequantize(&q, codes, 4, decoded)
```
Top-k Heap
```
var scores: [3]f32 = undefined
var ids: [3]u64 = undefined
var len: u32 = 0

len = vector.topk_offer(scores, ids, len, 3, 0.5, 10)
len = vector.topk_offer(scores, ids, len, 3, 0.875, 20)
len = vector.topk_offer(scores, ids, len, 3, 0.125, 30)
len = vector.topk_offer(scores, ids, len, 3, 0.75, 40)

vector.topk_finalize(scores, ids, len)
```
After these calls, scores holds 0.875, 0.75, 0.5 and ids holds 20, 40, 10;
the 0.125 entry was evicted by the fourth offer.
The heap keeps the largest score values. Callers using L2 distance should pass
-distance as the score key so “larger is better” remains true internally.
After topk_finalize, the arrays are sorted in descending score order and are
no longer a heap.
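The bounded top-k behavior can be sketched in Python with heapq as a stand-in. A min-heap of size k keeps the weakest retained score at the root, so each new candidate only has to beat that root. This shows the technique, not the module's memory layout. The names mirror topk_offer/topk_finalize for readability only.

```python
import heapq


def topk_offer(heap, k, score, item_id):
    """Keep the k largest scores in a min-heap whose root is the weakest kept result."""
    if len(heap) < k:
        heapq.heappush(heap, (score, item_id))
    elif score > heap[0][0]:
        heapq.heapreplace(heap, (score, item_id))  # evict the current minimum


def topk_finalize(heap):
    # Sort in place by descending score; afterwards the list is no longer a heap.
    heap.sort(key=lambda entry: entry[0], reverse=True)
    return heap
```

For L2 search, offering -distance keeps "larger is better" intact. For example, distances 3.0, 1.0, and 2.0 with k = 2 keep the two nearest items.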
Benchmark Oracle
exact_search runs a full-precision top-k scan over caller-owned raw
[count][dim]f32 storage. It is not an index; it is the correctness oracle for
recall measurements and small benchmark harnesses.
```
let vectors = [_]f32{
    1.0, 0.0,
    0.0, 1.0,
    0.75, 0.75,
}
let query = [_]f32{1.0, 0.0}
var exact_scores: [2]f32 = undefined
var exact_ids: [2]u64 = undefined

let exact_found = vector.exact_search(
    vectors, 3, 2, vector.Metric.cosine,
    query, 2, exact_scores, exact_ids,
)
```
recall_at_k_milli compares approximate result IDs against exact baseline IDs
as sets within the first k entries and returns milli-units: 1000 means
recall 1.0, 500 means recall 0.5. The helper is allocation-free and
order-insensitive; ranking quality beyond membership remains a benchmark-harness
concern.
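The oracle and the recall metric can be sketched in Python over the same three 2-D vectors. Some details are assumptions of this sketch: result IDs are taken to be row positions, recall is divided by the number of distinct baseline IDs in the first k entries, and integer division is used for the milli value. The real helper may differ on those details.

```python
import heapq
import math


def exact_search_cosine(vectors, count, dim, query, k):
    """Full-precision top-k by cosine similarity over row-major [count][dim] storage."""
    qn = math.sqrt(sum(x * x for x in query))
    scored = []
    for i in range(count):
        row = vectors[i * dim:(i + 1) * dim]
        rn = math.sqrt(sum(x * x for x in row))
        dot = sum(a * b for a, b in zip(row, query))
        score = dot / (rn * qn) if rn > 0.0 and qn > 0.0 else 0.0
        scored.append((score, i))
    best = heapq.nlargest(k, scored)  # sorted by descending score
    return [s for s, _ in best], [i for _, i in best]


def recall_at_k_milli(approx_ids, exact_ids, k):
    """Set overlap of the first k IDs, in thousandths (1000 == recall 1.0)."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0
    hits = len(set(approx_ids[:k]) & truth)
    return hits * 1000 // len(truth)
```

For the query [1.0, 0.0], the exact top-2 is rows 0 and 2 (cosine 1.0 and about 0.707). An approximate result of [2, 0] still scores 1000, because the comparison ignores order.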
Positional TurboIndex
TurboIndex stores packed quantized vectors and returns positional slot IDs.
Slots are not stable across removal: index_remove uses swap_remove, so the
last vector can move into the removed position.
```
use std.compute.vector
use std.alloc.page_allocator

func main() -> i32 do
    var pa = page_allocator.init()
    var cfg = vector.IndexConfig {
        dim: 2,
        bit_width: vector.BitWidth.b4,
        metric: vector.Metric.cosine,
        backend: vector.Backend.cpu_auto,
        seed: 99,
        capacity: 4,
        rounds: 0,
        allocator: &pa,
    }

    var idx: vector.TurboIndex = undefined
    if vector.index_init(&idx, &cfg) != vector.VectorStatus.ok do return 1 end

    let vectors = [_]f32{
        1.0, 0.0,
        0.0, 1.0,
        0.75, 0.75,
    }
    if vector.index_add(&idx, vectors, 3) != vector.VectorStatus.ok do
        vector.index_deinit(&idx)
        return 2
    end

    let query = [_]f32{1.0, 0.0}
    var scores: [2]f32 = undefined
    var ids: [2]u64 = undefined

    let found = vector.index_search(&idx, query, 2, scores, ids)

    vector.index_deinit(&idx)
    return 0
end
```
IndexConfig.capacity preallocates retained packed-code storage using
IndexConfig.allocator. rounds sets the number of seeded Householder
reflections. rounds: 0 applies no rotation, which test cases use to isolate
index semantics.
The smoke harness asserts that retained-storage accounting returns to zero after
index_deinit, and that a second index_deinit call remains inert.
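The no-leak/no-double-free contract checked by the harness can be modeled in a few lines of Python. RetainedStorage and live_bytes are hypothetical names for this sketch, not part of the Janus API. The 4-byte size assumes that capacity 4 at dim 2 and b4 packs each vector into one byte.

```python
class RetainedStorage:
    """Hypothetical model of retained-storage accounting; not the Janus API."""

    live_bytes = 0  # bytes currently retained across all instances

    def __init__(self, nbytes: int):
        self.buf = bytearray(nbytes)
        RetainedStorage.live_bytes += nbytes

    def deinit(self):
        if self.buf is None:
            return  # already released: a second deinit is inert
        RetainedStorage.live_bytes -= len(self.buf)
        self.buf = None
```

Clearing the buffer reference on the first deinit is what turns a second call into a no-op. Checking live_bytes before and after init and deinit mirrors the harness's accounting assertions.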
index_backend reports the selected backend. cpu_auto selects the best grafted
scorer that the compiled target supports, in this order: avx512, avx2, neon,
then scalar. Explicit neon, avx2, and avx512 indexes use grafted Zig
vector-lane kernels for dot/L2 candidate scoring when the compiled target
advertises the matching feature; otherwise index_init returns
VectorStatus.backend_unavailable. The smoke harness builds cpu_auto, explicit
scalar, and target-available explicit SIMD indexes over the same data and
asserts that result counts, IDs, and scores match. This keeps accelerated
backends tied to the scalar oracle.
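The dispatch rule can be modeled as a small Python function. select_backend and the string names are hypothetical. In the module, selection happens inside index_init based on the compiled target. The sketch only encodes the documented preference order and the backend_unavailable rule.

```python
PREFERENCE = ("avx512", "avx2", "neon")  # documented cpu_auto order


def select_backend(requested, compiled_features):
    """Resolve a requested backend against the features the build target advertises.

    Returns (backend, status); status strings mirror VectorStatus names.
    """
    if requested == "cpu_auto":
        for feature in PREFERENCE:
            if feature in compiled_features:
                return feature, "ok"
        return "scalar", "ok"  # always available fallback
    if requested == "scalar":
        return "scalar", "ok"
    if requested in compiled_features:
        return requested, "ok"
    return None, "backend_unavailable"
```

cpu_auto never fails because scalar is always available. An explicit SIMD request fails loudly instead of silently degrading.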
SearchReport is the stable reporting payload for benchmark harnesses.
index_search_report and idmap_search_report run the same search as their
plain counterparts, then fill backend, metric, bit width, dimension, vector
count, visited count, requested k, found count, code payload bytes, raw
[dim]f32 payload bytes, and compression_ratio_milli. Pair these reports with
exact_search and recall_at_k_milli to produce recall@k numbers against the
scalar full-precision oracle. Latency is left to the caller to measure, so the
stdlib does not need its own clock abstraction.
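The payload-size fields lend themselves to a worked example. The Python sketch below makes assumptions this page does not confirm. It assumes each vector's codes are packed independently into ceil(dim × bits / 8) bytes, and that compression_ratio_milli means raw bytes divided by code bytes, in thousandths. It also ignores any norms payload.

```python
def payload_bytes(count, dim, bits):
    """(code_bytes, raw_bytes) under the per-vector packing assumption."""
    code = count * ((dim * bits + 7) // 8)  # packed codes, rounded up per vector
    raw = count * dim * 4                   # [dim]f32 per vector, 4 bytes each
    return code, raw


def compression_ratio_milli(code_bytes, raw_bytes):
    # Raw-to-code ratio in thousandths, e.g. 8000 == 8.0x.
    return raw_bytes * 1000 // code_bytes if code_bytes else 0
```

Under these assumptions, 42 vectors at dim 1536 and b4 use 32256 code bytes against 258048 raw bytes, a ratio of 8000 (8.0x). At b2, the ratio doubles to 16000.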
Stable-ID IdMapIndex
IdMapIndex keeps external u64 IDs stable across deletion. Internally it uses
the same positional packed-code store, then repairs the ID sidecar after
swap_remove.
```
use std.compute.vector
use std.alloc.page_allocator

func main() -> i32 do
    var pa = page_allocator.init()
    var cfg = vector.IndexConfig {
        dim: 2,
        bit_width: vector.BitWidth.b4,
        metric: vector.Metric.cosine,
        backend: vector.Backend.scalar,
        seed: 99,
        capacity: 4,
        rounds: 0,
        allocator: &pa,
    }

    var idx: vector.IdMapIndex = undefined
    if vector.idmap_init(&idx, &cfg) != vector.VectorStatus.ok do return 1 end

    let vectors = [_]f32{
        1.0, 0.0,
        0.0, 1.0,
        0.75, 0.75,
    }
    let external_ids = [_]u64{101, 202, 303}
    if vector.idmap_add_with_ids(&idx, vectors, external_ids, 3) != vector.VectorStatus.ok do
        vector.idmap_deinit(&idx)
        return 2
    end

    let query = [_]f32{0.75, 0.75}
    let allow = [_]u64{101, 303}
    var scores: [2]f32 = undefined
    var ids: [2]u64 = undefined

    let found = vector.idmap_search_allowlist(&idx, query, 2, allow, 2, scores, ids)

    vector.idmap_deinit(&idx)
    return 0
end
```
idmap_search and idmap_search_allowlist return external IDs, never
positional slots. idmap_search_allowlist filters candidates before they are
offered to the top-k heap, so a disallowed candidate cannot evict an allowed
result. idmap_add_with_ids rejects duplicate external IDs with
VectorStatus.duplicate_id; stable IDs are unique, not a multimap.
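The sidecar repair and the filter-before-heap rule can be sketched together in Python. IdMapSketch is a hypothetical model, not the Janus source. It keeps one external ID per slot, mirroring the single u64 sidecar, and finds IDs by linear scan. The "not_found" status is invented for the sketch. Scores come from a caller-supplied score_fn instead of quantized codes.

```python
import heapq


class IdMapSketch:
    """Hypothetical model of IdMapIndex bookkeeping; not the Janus source.

    slots[i] is the vector at positional slot i; ext_ids[i] is the sidecar
    entry holding that slot's external ID.
    """

    def __init__(self):
        self.slots = []
        self.ext_ids = []

    def add_with_ids(self, vectors, ids):
        # Stable IDs are unique: reject repeats within the batch or the index.
        if len(set(ids)) != len(ids) or any(i in self.ext_ids for i in ids):
            return "duplicate_id"
        self.slots.extend(vectors)
        self.ext_ids.extend(ids)
        return "ok"

    def remove(self, ext_id):
        if ext_id not in self.ext_ids:
            return "not_found"  # hypothetical status name
        slot = self.ext_ids.index(ext_id)
        # swap_remove on the vector store: the last vector fills the hole...
        self.slots[slot] = self.slots[-1]
        self.slots.pop()
        # ...and the same move is mirrored in the ID sidecar.
        self.ext_ids[slot] = self.ext_ids[-1]
        self.ext_ids.pop()
        return "ok"

    def search_allowlist(self, score_fn, k, allow):
        allowed = set(allow)
        heap = []  # min-heap of (score, external_id)
        for vec, ext in zip(self.slots, self.ext_ids):
            if ext not in allowed:
                continue  # filtered before the heap, so it cannot evict an allowed hit
            item = (score_fn(vec), ext)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item[0] > heap[0][0]:
                heapq.heapreplace(heap, item)
        return [ext for _, ext in sorted(heap, reverse=True)]
```

After remove(101), the vector for 303 moves into slot 0, yet callers still address it as 303. Slots move; external IDs do not.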
IdMapIndex stores the ID sidecar beside the inner positional index. Its
lifecycle follows the same rule: initialize with idmap_init, release with
idmap_deinit. The vector smoke harness verifies that both the inner index and
the sidecar clear retained-storage accounting after idmap_deinit, and that a
second idmap_deinit call does not free again.
TurboVec façade
TurboVec is a stable-ID first-class wrapper over IdMapIndex for ANN-style
storage and query:
```
use std.compute.vector
use std.alloc.page_allocator

func main() -> i32 do
    var pa = page_allocator.init()
    var cfg = vector.IndexConfig {
        dim: 2,
        bit_width: vector.BitWidth.b4,
        metric: vector.Metric.cosine,
        backend: vector.Backend.scalar,
        seed: 99,
        capacity: 4,
        rounds: 0,
        allocator: &pa,
    }

    var tv: vector.TurboVec = undefined
    if vector.turbo_vec_init(&tv, &cfg) != vector.VectorStatus.ok do return 1 end

    let vectors = [_]f32{
        1.0, 0.0,
        0.0, 1.0,
        0.75, 0.75,
    }
    let ids = [_]u64{101, 202, 303}
    vector.turbo_vec_add(&tv, vectors, ids, 3)

    let query = [_]f32{0.75, 0.75}
    var scores: [2]f32 = undefined
    var result_ids: [2]u64 = undefined
    let found = vector.turbo_vec_search(&tv, query, 2, scores, result_ids)

    vector.turbo_vec_remove(&tv, 101)
    vector.turbo_vec_deinit(&tv)
    return 0
end
```
turbo_vec_search_allowlist keeps the filter-before-heap behavior of
idmap_search_allowlist, and turbo_vec_as_idmap exposes the underlying
IdMapIndex when lower-level interop is needed.
Persistence Headers
.jvi is the positional vector-index format. .jvim is the stable-ID vector
index map format; it carries the same base header plus idmap_payload_cid for
the external-ID sidecar payload.
```
var header = vector.VectorHeader {
    kind: vector.VectorFileKind.jvim,
    version: vector.VECTOR_FORMAT_VERSION,
    dim: 1536,
    bit_width: vector.BitWidth.b4,
    metric: vector.Metric.cosine,
    backend_hint: vector.Backend.scalar,
    vector_count: 42,
    rotation_seed: 0x0123456789ABCDEF,
    rotation_rounds: 6,
    quantizer_kind: vector.VECTOR_QUANTIZER_TURBOQUANT,
    code_payload_cid: code_cid,
    norms_payload_cid: norms_cid,
    idmap_payload_cid: idmap_cid,
}

var bytes: [144]u8 = undefined
let status = vector.encode_vector_header_raw(bytes, 144, &header)
```
Here code_cid, norms_cid, and idmap_cid stand for payload CIDs computed
beforehand. VECTOR_HEADER_BYTES is 144. Multi-byte scalars are little-endian. The header
includes rotation_rounds so loaded indexes reproduce the original rotation.
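Little-endian means the least significant byte comes first. The Python sketch below uses the standard struct module to show that byte order for the rotation_seed value from the example header, and for a u64 ID sidecar. Field offsets inside the 144-byte header are not documented on this page, so the sketch does not attempt to reproduce the header layout. encode_id_sidecar is a hypothetical helper name.

```python
import struct


def encode_u64_le(value: int) -> bytes:
    # "<Q": little-endian, unsigned 64-bit, no padding.
    return struct.pack("<Q", value)


def decode_u64_le(raw: bytes) -> int:
    return struct.unpack("<Q", raw)[0]


def encode_id_sidecar(ids) -> bytes:
    # One little-endian u64 per external ID, back to back.
    return struct.pack(f"<{len(ids)}Q", *ids)
```

For rotation_seed 0x0123456789ABCDEF, the encoded bytes run EF CD AB 89 67 45 23 01.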
For positional indexes, index_persisted_bytes, index_save_bytes, and
index_load_bytes save and reload the .jvi header plus packed-code payload in
caller-owned byte buffers. index_save_bytes computes the packed-code payload
CID from the serialized bytes; index_load_bytes recomputes and verifies it
before allocating the reopened index. The byte APIs use explicit pointer+length
arguments, for example index_save_bytes(&idx, bytes, cap) and
index_load_bytes(&idx, &pa, bytes, len).
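The save-then-verify flow can be sketched in Python. Python's standard library has no BLAKE3, so this sketch substitutes hashlib.blake2b with a 32-byte digest. The CIDs it produces therefore will not match the module's. save_payload and load_payload are hypothetical names, and statuses are returned as strings that mirror VectorStatus.

```python
import hashlib


def payload_cid(payload: bytes) -> bytes:
    # Stand-in digest: BLAKE2b-256 in place of the module's BLAKE3.
    return hashlib.blake2b(payload, digest_size=32).digest()


def save_payload(payload: bytes):
    """Writer side: record the CID next to the serialized payload."""
    return payload_cid(payload), bytes(payload)


def load_payload(recorded_cid: bytes, payload: bytes):
    """Reader side: verify before allocating; returns (status, buffer)."""
    if payload_cid(payload) != recorded_cid:
        return "cid_mismatch", None
    return "ok", bytearray(payload)  # allocation happens only after verification
```

Verifying before allocating means a tampered payload is rejected without ever producing a partially constructed index.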
index_save_file and index_load_file provide the same .jvi contract over a
filesystem path. File I/O failures return VectorStatus.io_error; malformed or
wrong-kind contents still return format_mismatch, and CID mismatches still
return cid_mismatch.
For stable-ID indexes, idmap_persisted_bytes, idmap_save_bytes, and
idmap_load_bytes save and reload the .jvim header, packed-code payload, and
little-endian u64 ID sidecar in caller-owned byte buffers. idmap_save_bytes
computes CIDs for both the code payload and the ID sidecar; idmap_load_bytes
verifies both before allocation. The byte APIs use explicit pointer+length
arguments, for example idmap_save_bytes(&idx, bytes, cap) and
idmap_load_bytes(&idx, &pa, bytes, len).
idmap_save_file and idmap_load_file provide the same .jvim contract over a
filesystem path.
Verification
The proof gate is the AOT smoke harness:
```sh
cd janus
./scripts/zb test-vector
```
The broad test step also depends on this harness:
```sh
cd janus
./scripts/zb test
```
The harness lives at std/compute/vector_smoke.jan and covers:
- metrics, normalization, b2/b4 packing, Lloyd-Max quantization, seeded rotation/unrotation, top-k ordering, exact full-precision search, and recall@k milli scoring;
- positional TurboIndex add/search/remove behavior and stable IdMapIndex add/search/remove/allowlist behavior;
- .jvi and .jvim header round-trips through encode_vector_header / decode_vector_header;
- saving and reloading positional .jvi and stable-ID .jvim payloads through both caller-owned byte buffers and files, then confirming that search still works;
- tamper detection: corrupting one .jvi code byte and one .jvim ID-sidecar byte must yield VectorStatus.cid_mismatch;
- retained-storage accounting before and after index_deinit / idmap_deinit, so the no-leak/no-double-free contract is covered by the same proof gate;
- automatic cpu_auto dispatch and reporting: supported targets select avx512, avx2, or neon before falling back to scalar, and unavailable explicit backends report backend_unavailable;
- comparison of cpu_auto and target-available explicit SIMD results against an explicit scalar index, so accelerated scoring remains tied to the scalar oracle;
- SearchReport fields for positional and stable-ID search, including backend, visited count, result count, and compression-ratio payload data.

Bridge compile-only checks cover the AVX512 and NEON target-feature builds where the build host cannot execute those instructions.
Limits
Not shipped yet:
- Native Janus simd[T; N] reimplementation, pending SPEC-040.
- GPU/NPU/device vector backends.
Treat the current module as the stable scalar reference layer, not as a complete vector database.