
LSM-backed STLStore

std.stl.lsm_store is the persistent implementation of the Sovereign Transparency Ledger store. It wraps a caller-owned std.db.lsm.GrainStoreBytes and stores each event in two keyspaces:

  • primary key: [0x00] ++ EventId.bytes -> canonical_event ++ u64_be(rank)
  • rank sidecar: [0x01] ++ u64_be(rank) -> EventId.bytes

The primary keyspace preserves content-addressed lookup by event id. The sidecar keyspace gives deterministic append-order lookup for auditors, witnesses, replay validators, and proof generators.
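The two key layouts above can be modeled in a few lines. This is a minimal Python sketch of the byte encoding only, not the Janus implementation; the helper names are hypothetical, and the event id is treated as an opaque byte string because its length is not stated here.

```python
import struct

PRIMARY_PREFIX = b"\x00"  # keyspace tag for content-addressed entries
SIDECAR_PREFIX = b"\x01"  # keyspace tag for rank -> event id entries


def primary_key(event_id: bytes) -> bytes:
    """[0x00] ++ EventId.bytes"""
    return PRIMARY_PREFIX + event_id


def primary_value(canonical_event: bytes, rank: int) -> bytes:
    """canonical_event ++ u64_be(rank)"""
    return canonical_event + struct.pack(">Q", rank)


def split_primary_value(value: bytes):
    """Recover (canonical_event, rank) from a stored primary value."""
    if len(value) < 8:
        raise ValueError("primary value is too short to carry a rank suffix")
    (rank,) = struct.unpack(">Q", value[-8:])
    return value[:-8], rank


def sidecar_key(rank: int) -> bytes:
    """[0x01] ++ u64_be(rank)"""
    return SIDECAR_PREFIX + struct.pack(">Q", rank)
```

Because the rank is encoded big-endian, byte-wise key order in the sidecar keyspace matches numeric rank order, which is what makes an ordered scan of the sidecar an append-order scan.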

use std.db.lsm
use std.stl.lsm_store
use std.stl.store
var gs = lsm.gs_open_bytes("/tmp/stl.wal", 0x6503)
var store = lsm_store.make_store(&gs)
// append / append_and_derive
// get_by_id
// get_by_insertion_rank
// kind_by_insertion_rank
// iter_in_append_order / iter_next / iter_done / iter_err
// flush / flush_with_manifest
// make_store_recovered / manifest_status
// rebuild_rank_sidecar
_ = lsm.gs_close_bytes(&gs)

The LSMStore borrows the GrainStoreBytes; callers still own open, path selection, and close. For manifest-backed stores, use flush_with_manifest and make_store_recovered so the STL facade applies the save/load policy around the borrowed substrate.

A store opens in one of three modes. make_store can report the first two; the third is only produced by make_store_recovered:

  • MODE_HEALTHY: primary entries and rank sidecars agree.
  • MODE_DEGRADED: the primary ledger is still readable by id, but positional APIs refuse until the sidecar is repaired.
  • MODE_RECOVERY_DEGRADED: a present L0 manifest was refused during recovered open.

Degraded sidecar mode is triggered by primary/sidecar count divergence or deterministic probe failure. Recovery-degraded mode is triggered by make_store_recovered when the manifest exists but fails validation or attachment. get_by_id remains available for primary entries that are still reachable. get_by_insertion_rank and iterators require MODE_HEALTHY.
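The mode rules can be summarized as a small decision function. This is a Python sketch with hypothetical names, not the store's code. It assumes a refused manifest takes precedence over the sidecar checks, which matches the statement that manifest validation runs before the sidecar probe but is otherwise an assumption.

```python
from enum import Enum


class Mode(Enum):
    HEALTHY = "MODE_HEALTHY"
    DEGRADED = "MODE_DEGRADED"
    RECOVERY_DEGRADED = "MODE_RECOVERY_DEGRADED"


def classify_open(primary_count: int, sidecar_count: int, probe_ok: bool,
                  manifest_refused: bool = False) -> Mode:
    """Pick the open mode from the checks described in the text."""
    # Assumption: a present-but-refused manifest wins over sidecar checks.
    if manifest_refused:
        return Mode.RECOVERY_DEGRADED
    # Count divergence or a failed deterministic probe degrades the sidecar.
    if primary_count != sidecar_count or not probe_ok:
        return Mode.DEGRADED
    return Mode.HEALTHY


def positional_reads_allowed(mode: Mode) -> bool:
    """get_by_insertion_rank and iterators require MODE_HEALTHY."""
    return mode is Mode.HEALTHY
```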

rebuild_rank_sidecar(&store) scans primary entries, rewrites the rank sidecar, updates event_count, and flips the store back to MODE_HEALTHY on success.

Existing v0.2 entries keep their embedded rank suffix. Legacy primary entries without a trusted suffix are assigned tail ranks after the highest explicit rank. The repair writes through the GrainStoreBytes WAL, so a clean close/open replays the rebuilt sidecar.
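The rank assignment rule can be sketched as follows. This Python model is illustrative only: it represents each primary entry as an (event_id, rank) pair with rank set to None for legacy entries, assumes legacy entries receive tail ranks in scan order, and assumes ranks start at 0 when no explicit rank exists. None of those details are confirmed by the text above.

```python
def rebuild_rank_sidecar(primary_entries):
    """Rebuild a rank -> event_id map from scanned primary entries.

    primary_entries: iterable of (event_id, rank_or_None) in scan order.
    Returns (sidecar, event_count).
    """
    sidecar = {}
    legacy = []
    highest = -1  # assumption: ranks are zero-based

    # First pass: entries with an embedded rank suffix keep that rank.
    for event_id, rank in primary_entries:
        if rank is None:
            legacy.append(event_id)
        else:
            sidecar[rank] = event_id
            highest = max(highest, rank)

    # Second pass: legacy entries get tail ranks after the highest explicit rank.
    next_rank = highest + 1
    for event_id in legacy:
        sidecar[next_rank] = event_id
        next_rank += 1

    return sidecar, len(sidecar)
```

For example, entries with explicit ranks 0 and 2 plus two legacy entries yield ranks 0, 2, 3, and 4. The sketch does not handle duplicate explicit ranks; how the real repair treats them is not described here.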

gs_put_bytes_owned copies keys and values into a GrainStore-owned pool, so keys and values built in per-call stack buffers can be appended safely. gs_flush_bytes drains the MemTable into an SSTable and resets that pool. The pool therefore bounds one in-memory batch rather than the lifetime of the store.

Callers should flush periodically:

if lsm_store.flush(&store, "/tmp/stl-l0.sst") != store.STORE_OK do
    return 1
end

GrainStoreBytes can persist its attached L0 SSTable list as a level manifest. This is the recovery path for stores that have flushed data out of the WAL and into SSTables. The STL facade exposes the normal policy as flush_with_manifest:

if lsm_store.flush_with_manifest(
    &store,
    "/var/lib/janus/stl-l0.sst",
    "/var/lib/janus/stl.manifest",
    "/var/lib/janus/stl.manifest.tmp",
    "/var/lib/janus",
) != store.STORE_OK do
    return 1
end

flush_with_manifest flushes and attaches the SSTable first, then writes a complete manifest to the temp path, fsyncs it, closes it, renames it over the final manifest, and fsyncs the directory. Each L0 entry records:

  • level number, currently 0
  • slot order, oldest to newest
  • SSTable path
  • SSTable byte length
  • SSTable footer entry_count
  • SSTable image fingerprint
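The write sequence described above (temp file, fsync, close, rename, directory fsync) is the usual crash-safe replace pattern on POSIX filesystems. The following Python sketch shows that pattern in isolation; the function name is hypothetical, the manifest body is treated as opaque bytes, and it is not the code behind flush_with_manifest.

```python
import os


def write_manifest_atomically(manifest_bytes: bytes, final_path: str,
                              temp_path: str, dir_path: str) -> None:
    """Replace final_path with manifest_bytes so readers see old or new, never torn."""
    # 1. Write the complete manifest to the temp path and force it to disk.
    fd = os.open(temp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(manifest_bytes)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)

    # 2. Atomically rename the temp file over the final manifest.
    os.replace(temp_path, final_path)

    # 3. Fsync the containing directory so the rename itself is durable.
    dir_fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The temp path and the final path must live on the same filesystem for the rename to be atomic, which is why the example call above keeps both under the same directory.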

Reopen through make_store_recovered after opening the WAL:

var reopened = lsm.gs_open_bytes("/var/lib/janus/stl.wal", 0x6503)
var recovered = lsm_store.make_store_recovered(&reopened, "/var/lib/janus/stl.manifest")
if lsm_store.manifest_status(&recovered) != store.STORE_OK do
    _ = lsm.gs_close_bytes(&reopened)
    return 2
end

Recovered open validates the manifest magic, version, body length, CRC, slot order, table file size, SSTable footer entry count, and SSTable image fingerprint before running the normal sidecar probe. Torn manifests, missing or truncated SSTables, and stale same-shape SSTables are refused. Attach-time failures reset the in-memory L0 count to zero and mark the store MODE_RECOVERY_DEGRADED so callers do not observe a partially loaded level list as healthy.
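The all-or-nothing attach behavior can be modeled as below. This Python sketch covers only the per-table checks for slot order, file size, and image fingerprint. The manifest header checks (magic, version, body length, CRC) and the SSTable footer entry count are omitted because their formats are not given here. The L0Entry type, the contiguous zero-based slot numbering, and the use of SHA-256 as the fingerprint are all assumptions made for illustration.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class L0Entry:
    slot: int           # position in the L0 list, oldest to newest
    path: str           # SSTable path recorded in the manifest
    byte_length: int    # expected file size
    fingerprint: bytes  # expected image fingerprint (SHA-256 in this sketch)


def fingerprint_of(path: str) -> bytes:
    """Hypothetical fingerprint: SHA-256 over the whole SSTable image."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).digest()


def attach_l0(entries):
    """Return (attached_paths, ok). Any failure attaches nothing."""
    attached = []
    for expected_slot, entry in enumerate(entries):
        if entry.slot != expected_slot:
            return [], False  # slot order violated
        if not os.path.isfile(entry.path):
            return [], False  # missing SSTable
        if os.path.getsize(entry.path) != entry.byte_length:
            return [], False  # truncated or resized SSTable
        if fingerprint_of(entry.path) != entry.fingerprint:
            return [], False  # stale same-shape SSTable
        attached.append(entry.path)
    return attached, True
```

Returning an empty list on any failure mirrors the rule that a refused manifest resets the in-memory L0 count to zero rather than leaving a partially loaded level list.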

flush_with_manifest is the preferred STL path when the L0 list must represent all flushed writes. The manifest records attached SSTables, not the current MemTable. The lower-level lsm.gs_save_l0_manifest_bytes and lsm.gs_load_l0_manifest_bytes remain available for substrate tests and custom policies.

std.cluster.tombstones is the adapter from local actor supervision tombstones into STL events. It encodes a 64-byte inline effect payload with:

  • magic bytes AT
  • payload version
  • stable stop-reason code
  • tombstone sequence
  • child slot
  • code version, input digest, replay token, and state epoch
  • attempt count
  • timestamp seconds

The adapter exposes ActorTombstone, zero, make_event, append_lsm, and small event readers such as is_tombstone_event, event_reason, event_sequence, event_child, and event_attempt_count.
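To make the fixed-size payload and the reader helpers concrete, here is a Python sketch of one possible 64-byte layout. Only the 64-byte size, the AT magic, and the list of fields come from the description above. The field order, the field widths, the big-endian encoding, and the trailing padding are assumptions chosen to add up to 64 bytes, and the function names are hypothetical stand-ins for the adapter's readers.

```python
import struct

MAGIC = b"AT"

# Assumed layout (big-endian, no alignment):
# magic(2) version(1) reason(1) sequence(8) child(4) code_version(8)
# input_digest(8) replay_token(8) state_epoch(8) attempt_count(4)
# timestamp_seconds(8) padding(4) = 64 bytes
LAYOUT = struct.Struct(">2sBBQIQQQQIQ4x")


def encode_tombstone(version, reason, sequence, child, code_version,
                     input_digest, replay_token, state_epoch,
                     attempt_count, timestamp_seconds) -> bytes:
    """Pack the tombstone fields into the assumed 64-byte payload."""
    return LAYOUT.pack(MAGIC, version, reason, sequence, child, code_version,
                       input_digest, replay_token, state_epoch,
                       attempt_count, timestamp_seconds)


def is_tombstone_payload(payload: bytes) -> bool:
    """Check size and magic before trusting any other field."""
    return len(payload) == LAYOUT.size and payload[:2] == MAGIC


def payload_reason(payload: bytes) -> int:
    """Read the stable stop-reason code from a validated payload."""
    if not is_tombstone_payload(payload):
        raise ValueError("not a tombstone payload")
    return LAYOUT.unpack(payload)[2]
```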

Sink callbacks normally receive the runtime record through std.cluster.local.tombstone_* accessors, copy the fields into ActorTombstone, and call:

if tombstones.append_lsm(&store, &t) != store.STORE_OK do
    return 0
end

The LSM store still follows the same borrowed-handle rule. A callback may construct a short-lived LSMStore over the caller-owned GrainStoreBytes; the durable primary and rank sidecar entries live in the shared LSM substrate, not in the wrapper value.

  • Single writer per store.
  • Caller-owned GrainStoreBytes lifecycle.
  • No automatic discovery of manifest paths beyond the explicit recovered-open helper.
  • No per-append fsync in the facade; call substrate sync/flush according to the durability policy of the consumer.
  • Commit proof production is owned by future SPEC-088 integration.