
Testing Guide

SpecSplit uses pytest with a clear separation between fast unit tests and heavier integration tests that require model downloads or GPU access.


Test Layout

tests/
├── conftest.py                     # Shared fixtures (configs, tmp dirs)
├── unit/                           # Fast tests — no models, no network
│   ├── test_serialization.py       #   Tensor ↔ list round-trips
│   ├── test_telemetry.py           #   Stopwatch + TelemetryLogger
│   ├── test_config.py              #   Pydantic config validation
│   ├── test_draft_engine.py        #   DraftEngine init + stub generation
│   ├── test_target_engine.py       #   Session caching, rollback, verification
│   ├── test_verification.py        #   Greedy + stochastic tree verification
│   └── test_pipeline.py            #   _get_longest_path, pipeline helpers
└── integration/                    # Requires transformers + torch
    ├── test_grpc_roundtrip.py      #   gRPC service binding smoke tests
    ├── test_exact_match.py         #   Speculative vs standard generation
    └── test_e2e.py                 #   Full gRPC end-to-end exact-match validation

Running Tests

Activate the project virtual environment before running tests (e.g. source .venv/bin/activate). Then:

Make Targets

make test
# runs unit tests only (tests/unit/):
pytest tests/unit/ -v --tb=short

make test-all
# runs all tests including integration:
pytest tests/ -v --tb=short

Unit Tests Only (fast, no GPU)

pytest tests/unit/ -v

Integration Tests Only

pytest tests/integration/ -v -s

Integration tests download models on first run (~1 GB for Qwen2.5-0.5B). Subsequent runs use the HuggingFace cache.
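To avoid repeated downloads, you can point the HuggingFace cache at a persistent location and, once the weights are cached, force offline mode so a misconfigured run fails fast instead of hitting the network. (`HF_HOME` and `HF_HUB_OFFLINE` are standard huggingface_hub environment variables; the cache path below is just an example.)

```shell
# Reuse the HuggingFace cache across runs.
export HF_HOME="$HOME/.cache/huggingface"
# After the first successful download, forbid network access entirely.
export HF_HUB_OFFLINE=1
```

With these set, `pytest tests/integration/ -v -s` will load the cached weights directly.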

Single Test File

pytest tests/unit/test_target_engine.py -v

Single Test by Name

pytest -k "test_rollback_crops_tensors" -v

Test Categories

Unit Tests (tests/unit/)

Goal: Validate business logic in isolation. No real models, no network, no GPU. These must run in < 5 seconds total.

| File | What It Tests | Key Fixtures |
|---|---|---|
| test_serialization.py | tensor_to_token_ids / token_ids_to_tensor round-trips, softmax_with_temperature | — |
| test_telemetry.py | Stopwatch precision, TelemetryLogger span collection + JSON export | tmp_path |
| test_config.py | Pydantic defaults, env var overrides, field validation | monkeypatch |
| test_draft_engine.py | DraftEngine init, stub tree generation, TokenNode.to_dict() | draft_config |
| test_target_engine.py | Session create/reuse/evict, rollback_cache tensor cropping, verify with sessions | target_engine, fake_kv_cache |
| test_verification.py | verify_greedy_tree and verify_stochastic_tree (paths, branching, shape validation) | — |
| test_pipeline.py | _get_longest_path (single chain, branching, tie-break) | — |
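To illustrate how these tests stay model-free, here is a hedged sketch of the rollback idea that test_target_engine.py exercises: rolling back N speculative tokens should crop every layer's cached key/value pair. The real implementation operates on torch tensors inside the target engine; plain lists stand in here, and the `rollback_cache` body below is an assumption, not SpecSplit's actual code.

```python
def rollback_cache(kv_cache, n_tokens):
    """Crop the last n_tokens entries from each layer's (key, value) pair.

    Stand-in for the real tensor-cropping logic: each layer's cache is a
    (keys, values) tuple, and rollback discards the trailing n_tokens slots.
    """
    return [(k[:-n_tokens], v[:-n_tokens]) for k, v in kv_cache]


class TestRollback:
    def test_rollback_crops_tensors(self):
        """Rolling back 2 tokens should shorten every layer's cache by 2."""
        cache = [([1, 2, 3, 4], [5, 6, 7, 8])]
        assert rollback_cache(cache, 2) == [([1, 2], [5, 6])]
```

Because no model or tensor library is involved, a test like this runs in microseconds, which is how the unit suite stays under its time budget.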

Integration Tests (tests/integration/)

Goal: End-to-end correctness with real model inference. Marked with @pytest.mark.integration so they can be selectively skipped in CI.

| File | What It Tests |
|---|---|
| test_exact_match.py | Loads Qwen2.5-0.5B as both draft and target. Asserts speculative decoding output is byte-identical to model.generate(). Tests multiple prompts, varying gamma (K = 1, 3, 5, 10), and edge cases. Uses mock gRPC stubs (no ports). |
| test_grpc_roundtrip.py | Smoke test for the gRPC service bindings (currently stubbed). |
| test_e2e.py | Full end-to-end gRPC validation. Spins up real Draft and Target gRPC servers on ephemeral ports, runs the orchestrator pipeline, and asserts output is byte-identical to model.generate(). |

Writing New Tests

Conventions

  1. File naming: test_<module_under_test>.py
  2. Class naming: class Test<Feature>: — groups related assertions.
  3. Fixtures over setup: Use conftest.py fixtures, not setUp/tearDown.
  4. Docstrings on every test: One line describing the assertion.
  5. Determinism: Use torch.manual_seed() and do_sample=False for reproducible model outputs.
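Convention 5 in miniature: seeding before generation makes two runs produce identical outputs. The sketch below uses the stdlib `random` module so it runs anywhere; in the real suite the same pattern is `torch.manual_seed()` plus `do_sample=False` on `model.generate()`.

```python
import random


def sample_tokens(n):
    """Draw n pseudo-token ids; deterministic for a fixed seed."""
    return [random.randrange(100) for _ in range(n)]


random.seed(0)
first = sample_tokens(5)
random.seed(0)
second = sample_tokens(5)
assert first == second  # reseeding reproduces the exact same sequence
```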

Adding a Unit Test

# tests/unit/test_my_module.py

import pytest

from specsplit.core.my_module import my_function

class TestMyFunction:
    def test_basic_case(self):
        """my_function should return 42 for input 'hello'."""
        assert my_function("hello") == 42

    def test_edge_case(self):
        """my_function should raise ValueError on empty input."""
        with pytest.raises(ValueError):
            my_function("")

Adding an Integration Test

# tests/integration/test_new_feature.py

import pytest

try:
    from transformers import AutoModelForCausalLM
    _SKIP = False
except ImportError:
    _SKIP = True

pytestmark = [
    pytest.mark.integration,
    pytest.mark.skipif(_SKIP, reason="transformers not installed"),
]

@pytest.fixture(scope="module")
def model():
    return AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B").eval()

class TestNewFeature:
    def test_something(self, model):
        """Feature X should produce Y when given Z."""
        ...

Shared Fixtures (conftest.py)

The root conftest.py provides pre-built config fixtures:

| Fixture | Type | Description |
|---|---|---|
| draft_config | DraftWorkerConfig | CPU-based draft config for testing |
| target_config | TargetWorkerConfig | CPU-based target config |
| tmp_path | Path | pytest built-in temp directory |
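The fixture pattern in conftest.py looks roughly like the sketch below. The real DraftWorkerConfig comes from SpecSplit's config module; the dataclass stand-in and its fields (`model_name`, `device`) are assumptions for illustration only.

```python
# tests/conftest.py (sketch)
from dataclasses import dataclass

import pytest


@dataclass
class DraftWorkerConfig:  # stand-in for SpecSplit's real config class
    model_name: str = "Qwen/Qwen2.5-0.5B"
    device: str = "cpu"


@pytest.fixture
def draft_config() -> DraftWorkerConfig:
    """CPU-based draft config so unit tests never need a GPU."""
    return DraftWorkerConfig(device="cpu")
```

Any test that declares a `draft_config` parameter receives a fresh config instance, with no setUp/tearDown boilerplate.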

Markers

| Marker | Purpose |
|---|---|
| @pytest.mark.integration | Requires model download / GPU |
| @pytest.mark.slow | Takes > 10 seconds |
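If markers are registered in pyproject.toml (an assumption; a pytest.ini section works the same way), the declaration would look like:

```
[tool.pytest.ini_options]
markers = [
    "integration: requires model download / GPU",
    "slow: takes > 10 seconds",
]
```

Registering markers keeps `pytest --strict-markers` from rejecting them as typos.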

Skip integration tests in CI without GPU:

pytest -m "not integration"

Coverage

Generate a coverage report:

pytest --cov=specsplit --cov-report=html tests/
open htmlcov/index.html

CI Integration

The test suite is designed to run in two stages:

  1. Fast gate (pytest tests/unit/ -x) — runs on every push, < 10s.
  2. Full validation (pytest -v) — runs on PR merge or nightly, includes model download + integration tests.

The Makefile target make test runs the fast gate; use make test-all for the full suite.
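The two stages map naturally onto separate CI jobs. A hypothetical GitHub Actions sketch (job names and the `pip install -e ".[dev]"` step are assumptions about this repo's setup):

```
jobs:
  fast-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -e ".[dev]"
      - run: pytest tests/unit/ -x

  full-validation:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -e ".[dev]"
      - run: pytest -v
```

Gating `full-validation` behind a schedule (or merge) event keeps the model download off the per-push critical path.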