d4da.com / c · forge
Gemma 4 31B selected-logit splice, lane by lane
Recorded Doppler reference, live in-browser CSL tail, hash-bound receipt. The hardware lane shows what's publishable today.
Doppler WebGPU reference
Real Gemma 4 31B af16 prefill, last layer post-FFN state, tied lm-head logits. Captured from the frozen reference fixture; sha256-bound.
- fixture: …
- frozen at: …
- top-1 token: …
- top-1 logit: …
- top-2 logit: …
- top-1 vs top-2: …
- handoff tensor: …
- recorded sha256: …
- browser sha256: computing…
- verification: computing…
Activation + weight chunks, byte-bound
Per-chunk activation (post-RMSNorm) and lm-head weight slice. The browser recomputes sha256 over each shipped .npy file and checks it against the recorded splice receipt.
- chunks: 6 chunks · 5376 hidden / 32 PE width
- per-chunk dtype: activation f16 · weight f16 · output f32
- verification: computing…
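The byte-binding check described above can be sketched offline. This is a minimal illustration in Python, not the page's actual browser code: the file names, the receipt layout (a plain name-to-digest mapping), and the stand-in bytes are all hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex sha256 digest over the raw file bytes (header included)."""
    return hashlib.sha256(data).hexdigest()


def verify_chunks(chunks: dict[str, bytes], recorded: dict[str, str]) -> dict[str, bool]:
    """Compare each shipped file's digest with the recorded one.

    A file that is missing from the receipt counts as a failure.
    """
    return {
        name: sha256_hex(data) == recorded.get(name)
        for name, data in chunks.items()
    }


# Stand-in bytes; a real run would read the shipped .npy files instead.
# File names are hypothetical.
chunks = {
    "activation_chunk_0.npy": b"\x93NUMPY-demo-activation",
    "weight_chunk_0.npy": b"\x93NUMPY-demo-weight",
}

# Hypothetical receipt: here it is simply built from the same bytes.
recorded = {name: sha256_hex(data) for name, data in chunks.items()}

results = verify_chunks(chunks, recorded)
all_ok = all(results.values())
print(results, all_ok)
```

Hashing the whole file, rather than only the decoded array, means a change to the .npy header (dtype, shape, byte order) also breaks the match.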
Live selected lm_head GEMV in your browser
The browser runs the chunked f16 dot product over the same activation and weight bytes the CSL run consumed (post-RMSNorm activations from the splice path). Only the lm_head_prefill GEMV executes live here; the final_norm_f16 step ran in the recorded splice, and its CSL source is shown below for visibility.
- live kernel: lm_head_prefill · selected token only
- recorded chain: final_norm_f16 → lm_head_prefill → softcap (full chain in receipt)
- backend: …
- per-chunk math: 1024 × f16·f16 → f32 accumulate
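The per-chunk math can be illustrated with a stdlib-only Python sketch. It is not the page's kernel or the CSL source. It assumes little-endian f16 input bytes, f32 rounding after every add, and a 5376-wide vector split into five full 1024-element chunks plus a 256-element remainder; the page states the chunk count and width but not the split or the accumulation order.

```python
import struct

CHUNK = 1024  # per-chunk width stated on the page


def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def decode_f16(raw: bytes) -> list[float]:
    """Decode little-endian IEEE half-precision values."""
    n = len(raw) // 2
    return list(struct.unpack(f"<{n}e", raw))


def chunk_starts(length: int, chunk: int = CHUNK) -> list[int]:
    """Start offsets of each chunk; the last chunk may be short (assumption)."""
    return list(range(0, length, chunk))


def chunked_dot(act_raw: bytes, w_raw: bytes, chunk: int = CHUNK) -> float:
    """Dot product of f16 inputs for one selected row, accumulated in f32."""
    act, w = decode_f16(act_raw), decode_f16(w_raw)
    if len(act) != len(w):
        raise ValueError("activation and weight slice lengths differ")
    total = 0.0
    for start in chunk_starts(len(act), chunk):
        partial = 0.0
        for a, b in zip(act[start:start + chunk], w[start:start + chunk]):
            # An f16 x f16 product fits exactly in f32; the add is rounded to f32.
            partial = to_f32(partial + to_f32(a * b))
        total = to_f32(total + partial)
    return total


hidden = 5376  # hidden size stated on the page
act_raw = struct.pack(f"<{hidden}e", *([0.5] * hidden))
w_raw = struct.pack(f"<{hidden}e", *([0.25] * hidden))

logit = chunked_dot(act_raw, w_raw)
n_chunks = len(chunk_starts(hidden))
print(n_chunks, logit)
```

With these stand-in values every product is 0.125 and all partial sums are exactly representable, so the result does not depend on accumulation order. Real activations and weights will generally differ between orderings by rounding error.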
Authored CSL source (visible kernel lowering)
…
…
…
…
Splice receipt, schema-bound
Comparison mode: argmax_decision_bound. Top-token decision is preserved; strict logit tolerance is recorded separately.
- verdict: …
- argmax stable: …
- CSL argmax: …
- reference argmax: …
- max logit |Δ|: …
- strict tolerance: …
- decision margin: …
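One way to read the receipt fields together is sketched below in Python. The receipt schema itself is not shown on the page, so everything here is an assumption: the token ids and logit values, the `pass`/`fail` strings, the `margin_covers_error` field, and the definition of decision margin as the reference top-1 minus top-2 logit.

```python
def compare_selected_logits(
    reference: dict[int, float],
    candidate: dict[int, float],
    strict_tol: float,
) -> dict:
    """Compare candidate logits with reference logits over the same token ids."""
    if reference.keys() != candidate.keys() or len(reference) < 2:
        raise ValueError("need the same token ids on both sides, at least two")

    ranked = sorted(reference, key=reference.get, reverse=True)
    ref_argmax = ranked[0]
    cand_argmax = max(candidate, key=candidate.get)

    max_abs_delta = max(abs(reference[t] - candidate[t]) for t in reference)
    margin = reference[ranked[0]] - reference[ranked[1]]
    argmax_stable = ref_argmax == cand_argmax

    return {
        # Hypothetical verdict rule: the decision alone determines pass/fail.
        "verdict": "pass" if argmax_stable else "fail",
        "argmax_stable": argmax_stable,
        "candidate_argmax": cand_argmax,
        "reference_argmax": ref_argmax,
        "max_logit_abs_delta": max_abs_delta,
        # Recorded separately; it does not change the verdict in this sketch.
        "strict_tolerance_met": max_abs_delta <= strict_tol,
        "decision_margin": margin,
        # If every logit moves by at most d, a margin above 2*d keeps the argmax.
        "margin_covers_error": margin > 2 * max_abs_delta,
    }


# Made-up token ids and logits, chosen to be exact in binary floating point.
reference = {101: 12.5, 202: 9.25, 303: -1.0}
candidate = {101: 12.375, 202: 9.5, 303: -1.0}

receipt = compare_selected_logits(reference, candidate, strict_tol=0.1)
print(receipt)
```

In this example the top token is unchanged even though the largest logit difference (0.25) exceeds the strict tolerance (0.1), which is the situation the comparison mode description distinguishes.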
memcpy_d2h_start; CS-3 receipt pending
…