# Testing Guide

SpecSplit uses pytest with a clear separation between fast unit tests and heavier integration tests that require model downloads or GPU access.
## Test Layout

```
tests/
├── conftest.py                   # Shared fixtures (configs, tmp dirs)
├── unit/                         # Fast tests — no models, no network
│   ├── test_serialization.py    # Tensor ↔ list round-trips
│   ├── test_telemetry.py        # Stopwatch + TelemetryLogger
│   ├── test_config.py           # Pydantic config validation
│   ├── test_draft_engine.py     # DraftEngine init + stub generation
│   ├── test_target_engine.py    # Session caching, rollback, verification
│   ├── test_verification.py     # Greedy + stochastic tree verification
│   └── test_pipeline.py         # _get_longest_path, pipeline helpers
└── integration/                  # Requires transformers + torch
    ├── test_grpc_roundtrip.py   # gRPC service binding smoke tests
    ├── test_exact_match.py      # Speculative vs standard generation
    └── test_e2e.py              # Full gRPC end-to-end exact-match validation
```
## Running Tests

Activate the project virtual environment before running tests (e.g. `source .venv/bin/activate`). Then:

### All Tests

```bash
make test       # runs unit tests only (tests/unit/): pytest tests/unit/ -v --tb=short
make test-all   # runs all tests including integration: pytest tests/ -v --tb=short
```

### Unit Tests Only (fast, no GPU)

```bash
pytest tests/unit/ -v
```

### Integration Tests Only

```bash
pytest tests/integration/ -v -s
```

Integration tests download models on first run (~1 GB for Qwen2.5-0.5B). Subsequent runs use the HuggingFace cache.

### Single Test File

```bash
pytest tests/unit/test_target_engine.py -v
```

### Single Test by Name

```bash
pytest -k "test_rollback_crops_tensors" -v
```
## Test Categories

### Unit Tests (`tests/unit/`)

Goal: Validate business logic in isolation. No real models, no network, no GPU. These must run in < 5 seconds total.

| File | What It Tests | Key Fixtures |
|---|---|---|
| `test_serialization.py` | `tensor_to_token_ids` / `token_ids_to_tensor` round-trips, `softmax_with_temperature` | — |
| `test_telemetry.py` | `Stopwatch` precision, `TelemetryLogger` span collection + JSON export | `tmp_path` |
| `test_config.py` | Pydantic defaults, env var override, field validation | `monkeypatch` |
| `test_draft_engine.py` | `DraftEngine` init, stub tree generation, `TokenNode.to_dict()` | `draft_config` |
| `test_target_engine.py` | Session create/reuse/evict, `rollback_cache` tensor cropping, verify with sessions | `target_engine`, `fake_kv_cache` |
| `test_verification.py` | `verify_greedy_tree` and `verify_stochastic_tree` (paths, branching, shape validation) | — |
| `test_pipeline.py` | `_get_longest_path` (single chain, branching, tie-break) | — |
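The round-trip tests in `test_serialization.py` follow a standard encode/decode pattern: serialize, deserialize, and assert equality with the original. A minimal sketch of that pattern, using simplified stand-in helpers (the project's real `tensor_to_token_ids` / `token_ids_to_tensor` operate on torch tensors and may have different signatures):

```python
# Simplified stand-ins for illustration only; the real SpecSplit helpers
# convert torch tensors, not plain Python lists.
def encode_token_ids(token_ids):
    """Serialize a list of int token ids into a wire-friendly string list."""
    return [str(t) for t in token_ids]

def decode_token_ids(payload):
    """Inverse of encode_token_ids: recover the int token ids."""
    return [int(s) for s in payload]

def test_round_trip_preserves_token_ids():
    """Encoding then decoding should return the original ids unchanged."""
    original = [101, 42, 7, 0]
    assert decode_token_ids(encode_token_ids(original)) == original
```

The same shape applies to any serializer pair: the test asserts only the composed identity, so it stays valid even if the wire format changes.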
### Integration Tests (`tests/integration/`)

Goal: End-to-end correctness with real model inference. Marked with `@pytest.mark.integration` so they can be selectively skipped in CI.

| File | What It Tests |
|---|---|
| `test_exact_match.py` | Loads Qwen2.5-0.5B as both draft and target. Asserts speculative decoding output is byte-identical to `model.generate()`. Tests multiple prompts, varying gamma (K=1,3,5,10), and edge cases. Uses mock gRPC stubs (no ports). |
| `test_grpc_roundtrip.py` | Smoke test for the gRPC service bindings (currently stubbed). |
| `test_e2e.py` | Full end-to-end gRPC validation. Spins up real Draft and Target gRPC servers on ephemeral ports, runs the orchestrator pipeline, and asserts output is byte-identical to `model.generate()`. |
## Writing New Tests

### Conventions

- **File naming:** `test_<module_under_test>.py`
- **Class naming:** `class Test<Feature>:` — groups related assertions.
- **Fixtures over setup:** Use `conftest.py` fixtures, not `setUp`/`tearDown`.
- **Docstrings on every test:** One line describing the assertion.
- **Determinism:** Use `torch.manual_seed()` and `do_sample=False` for reproducible model outputs.
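The determinism convention can be enforced once for the whole suite rather than repeated inline in every test. A sketch of an autouse seeding fixture that could live in `conftest.py` (the torch line is commented out so the snippet stands alone; the fixture name is hypothetical):

```python
import random

import pytest

@pytest.fixture(autouse=True)
def seed_rngs():
    """Seed every RNG before each test so model outputs are reproducible."""
    random.seed(0)
    # torch.manual_seed(0)  # enable where torch is installed
    yield
```

With `autouse=True`, pytest runs this before every test in scope, so individual tests never need to remember to seed.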
Adding a Unit Test
# tests/unit/test_my_module.py
from specsplit.core.my_module import my_function
class TestMyFunction:
def test_basic_case(self):
"""my_function should return 42 for input 'hello'."""
assert my_function("hello") == 42
def test_edge_case(self):
"""my_function should raise ValueError on empty input."""
with pytest.raises(ValueError):
my_function("")
Adding an Integration Test
# tests/integration/test_new_feature.py
import pytest
try:
from transformers import AutoModelForCausalLM
_SKIP = False
except ImportError:
_SKIP = True
pytestmark = [
pytest.mark.integration,
pytest.mark.skipif(_SKIP, reason="transformers not installed"),
]
@pytest.fixture(scope="module")
def model():
return AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B").eval()
class TestNewFeature:
def test_something(self, model):
"""Feature X should produce Y when given Z."""
...
## Shared Fixtures (conftest.py)

The root `conftest.py` provides pre-built config fixtures:

| Fixture | Type | Description |
|---|---|---|
| `draft_config` | `DraftWorkerConfig` | CPU-based draft config for testing |
| `target_config` | `TargetWorkerConfig` | CPU-based target config |
| `tmp_path` | `Path` | pytest built-in temp directory |
## Markers

| Marker | Purpose |
|---|---|
| `@pytest.mark.integration` | Requires model download / GPU |
| `@pytest.mark.slow` | Takes > 10 seconds |

Skip integration tests in CI without GPU:

```bash
pytest -m "not integration"
```
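For the `-m` filter to work cleanly, custom markers should be registered so pytest does not emit `PytestUnknownMarkWarning`. A sketch of registering them from `conftest.py` via the standard `pytest_configure` hook (descriptions taken from the table above; the project may register markers in `pyproject.toml` or `pytest.ini` instead):

```python
# Hypothetical conftest.py snippet: register the custom markers used
# in this suite so pytest recognizes them.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "integration: requires model download / GPU"
    )
    config.addinivalue_line("markers", "slow: takes > 10 seconds")
```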
## Coverage

Generate a coverage report:

```bash
pytest --cov=specsplit --cov-report=html tests/
open htmlcov/index.html
```
## CI Integration

The test suite is designed to run in two stages:

- **Fast gate** (`pytest tests/unit/ -x`) — runs on every push, < 10 s.
- **Full validation** (`pytest -v`) — runs on PR merge or nightly; includes model download + integration tests.

The Makefile target `make test` runs the fast gate; use `make test-all` for the full suite.