Protocol: `spec_decoding.proto`

The entire draft→verify loop is defined by the gRPC services and message schema in this file.

Where it lives

`specsplit/proto/spec_decoding.proto`

It lets the Draft Worker and Target Worker exchange a compact representation of speculation candidates:

the prompt/context is represented as repeated int32 token IDs
the speculative candidates are represented as a tree of TokenNodes
the Target Worker can optionally reuse per-session KV cache state via session_id

This keeps the network payload small, so verification latency is dominated by the Target model's forward pass rather than by serialization cost.
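
To make "compact" concrete: proto3 packs `repeated int32` fields by default and writes each value as a base-128 varint, so any token ID below 16,384 costs at most two bytes. The hand-rolled encoder below is only an illustration of that wire format, not how the generated stubs are used, and the field number `1` is a hypothetical placeholder rather than the number assigned in `spec_decoding.proto`.

```python
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    """Encode an integer as a protobuf base-128 varint (least significant group first)."""
    # int32 negatives are sign-extended to 64 bits on the wire, so they take 10 bytes.
    value &= (1 << 64) - 1
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_packed_int32(field_number: int, values: list[int]) -> bytes:
    """Encode a proto3 `repeated int32` field in packed form (wire type 2)."""
    payload = b"".join(encode_varint(v) for v in values)
    tag = encode_varint((field_number << 3) | 2)  # field number + LEN wire type
    return tag + encode_varint(len(payload)) + payload


# Hypothetical field number 1; a 1,000-token context with IDs 0..999.
context_bytes = encode_packed_int32(1, list(range(1000)))
print(len(context_bytes))  # 1875: 128 one-byte IDs, 872 two-byte IDs, 3 bytes of framing
```

For a small case, `encode_packed_int32(1, [1, 300])` yields the bytes `0A 03 01 AC 02`: one tag byte, one length byte, then one byte for `1` and two for `300`.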

TokenNode

A single node in the speculative tree:

token_id: vocabulary index of the candidate token
log_prob: log-probability assigned by the draft model
children: child candidate nodes (branching)
top_k_token_ids / top_k_probs: optional Top-K distribution data used for full-vocabulary residual computations.
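
The tree shape is what keeps branching drafts cheap: a prefix shared by several candidates is sent once. The sketch below mirrors the `TokenNode` fields as a plain Python dataclass (a hypothetical stand-in, not the generated message class) and enumerates the candidate sequences the tree encodes. It assumes `draft_tree` can be treated as a list of root candidates; whether the real message uses a single root or a forest is not stated here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TokenNode:
    """Hypothetical plain-Python mirror of the TokenNode message."""
    token_id: int
    log_prob: float = 0.0
    children: list[TokenNode] = field(default_factory=list)
    # Optional Top-K data; empty lists stand for "not provided".
    top_k_token_ids: list[int] = field(default_factory=list)
    top_k_probs: list[float] = field(default_factory=list)


def candidate_paths(roots: list[TokenNode]) -> list[list[int]]:
    """Return every root-to-leaf token sequence encoded by the tree (depth-first)."""
    paths: list[list[int]] = []

    def walk(node: TokenNode, prefix: list[int]) -> None:
        path = prefix + [node.token_id]
        if not node.children:
            paths.append(path)  # a leaf ends one complete candidate
        for child in node.children:
            walk(child, path)

    for root in roots:
        walk(root, [])
    return paths


tree = [
    TokenNode(10, children=[TokenNode(11), TokenNode(12, children=[TokenNode(13)])]),
    TokenNode(20),
]
print(candidate_paths(tree))  # [[10, 11], [10, 12, 13], [20]]
```

Here the token `10` is stored once but begins two of the three candidate sequences.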

VerifyRequest

draft_tree: draft candidates to verify
session_id: KV cache reuse key (empty means stateless verification)
temperature: 0 for greedy verification; >0 for stochastic verification
expected_prefix_length: the orchestrator’s expected accepted prefix length
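
For `temperature > 0`, a common choice is the standard speculative-sampling rule: keep a drafted token x with probability min(1, p(x)/q(x)), where p is the target distribution and q the draft distribution; on rejection, sample the correction from the residual norm(max(0, p - q)). The sketch below assumes that rule and assumes both distributions are already temperature-scaled and available over the full vocabulary. Whether the Target Worker uses exactly this rule, and how it reconstructs q from `top_k_token_ids` / `top_k_probs`, is not confirmed here; `verify_token` and the other helpers are hypothetical.

```python
from __future__ import annotations

import random
from typing import Sequence


def accept_probability(p_x: float, q_x: float) -> float:
    """Probability of keeping a drafted token: min(1, p(x) / q(x))."""
    if q_x <= 0.0:
        raise ValueError("the draft cannot have sampled a token it gave zero probability")
    return min(1.0, p_x / q_x)


def residual_distribution(p: Sequence[float], q: Sequence[float]) -> list[float]:
    """norm(max(0, p - q)): the distribution a correction is drawn from."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same vocabulary")
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        # p == q everywhere, so a rejection cannot occur; fall back to p.
        return list(p)
    return [d / total for d in diff]


def verify_token(
    token: int, p: Sequence[float], q: Sequence[float], rng: random.Random
) -> tuple[bool, int]:
    """Return (accepted, emitted_token): the drafted token, or a residual-sampled correction."""
    if rng.random() < accept_probability(p[token], q[token]):
        return True, token
    residual = residual_distribution(p, q)
    correction = rng.choices(range(len(residual)), weights=residual, k=1)[0]
    return False, correction


p = [0.5, 0.3, 0.2]  # target distribution (toy 3-token vocabulary)
q = [0.2, 0.6, 0.2]  # draft distribution
print(residual_distribution(p, q))  # [1.0, 0.0, 0.0]
print(verify_token(1, p, q, random.Random(0)))
```

Drawing the correction from the residual rather than from p is what keeps the combined accept-or-correct output distributed exactly as the target model's samples would be.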
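
VerifyResponse

accepted_token_ids: longest accepted prefix from the tree
correction_token_id + has_correction: the target's replacement token at the first rejected position; has_correction indicates whether correction_token_id is set (that is, whether a draft token was rejected)
cache_hit: whether the session's KV cache was reused
telemetry: server-side timing metadata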
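
With `temperature` 0, verification reduces to walking the tree and accepting a drafted token only if it equals the target's argmax at that position. The sketch below shows how such a walk yields `accepted_token_ids`, `correction_token_id` and `has_correction`. `target_next` is a hypothetical stand-in for the Target model that returns the argmax next token for a prefix; a real server would likely score the whole tree in a single forward pass rather than calling the model once per depth. The sketch also assumes no correction is emitted when an entire path is accepted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class TokenNode:
    """Trimmed, hypothetical mirror of TokenNode: only the fields this walk needs."""
    token_id: int
    children: list[TokenNode] = field(default_factory=list)


@dataclass
class GreedyResult:
    """Hypothetical mirror of the VerifyResponse fields produced by the walk."""
    accepted_token_ids: list[int]
    correction_token_id: int = 0
    has_correction: bool = False


def verify_greedy(
    context: Sequence[int],
    roots: list[TokenNode],
    target_next: Callable[[tuple[int, ...]], int],
) -> GreedyResult:
    """Accept the longest tree path whose tokens match the target's argmax choices."""
    prefix = list(context)
    accepted: list[int] = []
    candidates = roots
    while candidates:
        expected = target_next(tuple(prefix))
        match = next((n for n in candidates if n.token_id == expected), None)
        if match is None:
            # First mismatch: the target's own choice becomes the correction.
            return GreedyResult(accepted, correction_token_id=expected, has_correction=True)
        accepted.append(match.token_id)
        prefix.append(match.token_id)
        candidates = match.children  # descend into the accepted branch
    # Every token on the chosen path was accepted.
    return GreedyResult(accepted)


tree = [TokenNode(10, [TokenNode(11), TokenNode(12, [TokenNode(13)])]), TokenNode(20)]
argmax_table = {(5,): 10, (5, 10): 12, (5, 10, 12): 99}  # toy target model
print(verify_greedy([5], tree, argmax_table.__getitem__))
# GreedyResult(accepted_token_ids=[10, 12], correction_token_id=99, has_correction=True)
```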

The CI workflow job `proto-check` compiles `spec_decoding.proto` and verifies that the generated stub files exist.
Unit tests validate key building blocks around token-tree transformations and verification math (see `tests/unit/`).
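
A local equivalent of the stub-existence check might look like the sketch below. It assumes Python stubs generated by `grpcio-tools` with `--python_out` and `--grpc_python_out`, which name their outputs `<stem>_pb2.py` and `<stem>_pb2_grpc.py`. The language, output directory and commands actually used by `proto-check` are not described here, so `missing_python_stubs` and its arguments are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def missing_python_stubs(out_dir: str | Path, proto_stem: str = "spec_decoding") -> list[str]:
    """Return the expected grpc_tools output files that are absent from out_dir."""
    expected = [f"{proto_stem}_pb2.py", f"{proto_stem}_pb2_grpc.py"]
    root = Path(out_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_python_stubs("specsplit/proto")
    if missing:
        raise SystemExit(f"missing generated stubs: {missing}")
    print("all stubs present")
```

A typical generation step, run from the repository root, would be `python -m grpc_tools.protoc -I . --python_out=. --grpc_python_out=. specsplit/proto/spec_decoding.proto`, after which `missing_python_stubs("specsplit/proto")` should return an empty list.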