
Testing Guide

SpecSplit uses pytest with a clear separation between fast unit tests and heavier integration tests that require model downloads or GPU access.


Test Layout

tests/
├── conftest.py                     # Shared fixtures (configs, tmp dirs)
├── unit/                           # Fast tests — no models, no network
│   ├── test_serialization.py       #   Tensor ↔ list round-trips
│   ├── test_telemetry.py           #   Stopwatch + TelemetryLogger
│   ├── test_config.py              #   Pydantic config validation
│   ├── test_draft_engine.py        #   DraftEngine init + stub generation
│   ├── test_target_engine.py       #   Session caching, rollback, verification
│   ├── test_verification.py        #   Greedy + stochastic tree verification
│   └── test_pipeline.py            #   _get_longest_path, pipeline helpers
└── integration/                    # Requires transformers + torch
    ├── test_grpc_roundtrip.py      #   gRPC service binding smoke tests
    ├── test_exact_match.py         #   Speculative vs standard generation
    └── test_e2e.py                 #   Full gRPC end-to-end exact-match validation

Running Tests

Activate the project virtual environment before running tests (e.g. source .venv/bin/activate). Then:

Make Targets

make test
# runs unit tests only (tests/unit/):
pytest tests/unit/ -v --tb=short

make test-all
# runs all tests including integration:
pytest tests/ -v --tb=short

Unit Tests Only (fast, no GPU)

pytest tests/unit/ -v

Integration Tests Only

pytest tests/integration/ -v -s

Integration tests download models on first run (~1 GB for Qwen2.5-0.5B). Subsequent runs use the HuggingFace cache.
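To avoid repeated downloads, you can point the HuggingFace cache at a persistent location and, once the weights are cached, force offline mode so a misconfigured run fails fast instead of hitting the network. (`HF_HOME` and `HF_HUB_OFFLINE` are standard huggingface_hub environment variables; the cache path below is just an example.)

```shell
# Reuse the HuggingFace cache across runs.
export HF_HOME="$HOME/.cache/huggingface"
# After the first successful download, forbid network access entirely.
export HF_HUB_OFFLINE=1
```

With these set, `pytest tests/integration/ -v -s` will load the cached weights directly.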

Single Test File

pytest tests/unit/test_target_engine.py -v

Single Test by Name

pytest -k "test_rollback_crops_tensors" -v

Test Categories

Unit Tests (tests/unit/)

Goal: Validate business logic in isolation. No real models, no network, no GPU. These must run in < 5 seconds total.

| File | What It Tests | Key Fixtures |
|---|---|---|
| test_serialization.py | tensor_to_token_ids / token_ids_to_tensor round-trips, softmax_with_temperature | — |
| test_telemetry.py | Stopwatch precision, TelemetryLogger span collection + JSON export | tmp_path |
| test_config.py | Pydantic defaults, env var overrides, field validation | monkeypatch |
| test_draft_engine.py | DraftEngine init, stub tree generation, TokenNode.to_dict() | draft_config |
| test_target_engine.py | Session create/reuse/evict, rollback_cache tensor cropping, verify with sessions | target_engine, fake_kv_cache |
| test_verification.py | verify_greedy_tree and verify_stochastic_tree (paths, branching, shape validation) | — |
| test_pipeline.py | _get_longest_path (single chain, branching, tie-break) | — |
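To illustrate how these tests stay model-free, here is a hedged sketch of the rollback idea that test_target_engine.py exercises: rolling back N speculative tokens should crop every layer's cached key/value pair. The real implementation operates on torch tensors inside the target engine; plain lists stand in here, and the `rollback_cache` body below is an assumption, not SpecSplit's actual code.

```python
def rollback_cache(kv_cache, n_tokens):
    """Crop the last n_tokens entries from each layer's (key, value) pair.

    Stand-in for the real tensor-cropping logic: each layer's cache is a
    (keys, values) tuple, and rollback discards the trailing n_tokens slots.
    """
    return [(k[:-n_tokens], v[:-n_tokens]) for k, v in kv_cache]


class TestRollback:
    def test_rollback_crops_tensors(self):
        """Rolling back 2 tokens should shorten every layer's cache by 2."""
        cache = [([1, 2, 3, 4], [5, 6, 7, 8])]
        assert rollback_cache(cache, 2) == [([1, 2], [5, 6])]
```

Because no model or tensor library is involved, a test like this runs in microseconds, which is how the unit suite stays under its time budget.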

Integration Tests (tests/integration/)

Goal: End-to-end correctness with real model inference. Marked with @pytest.mark.integration so they can be selectively skipped in CI.

| File | What It Tests |
|---|---|
| test_exact_match.py | Loads Qwen2.5-0.5B as both draft and target. Asserts speculative decoding output is byte-identical to model.generate(). Tests multiple prompts, varying gamma (K = 1, 3, 5, 10), and edge cases. Uses mock gRPC stubs (no ports). |
| test_grpc_roundtrip.py | Smoke test for the gRPC service bindings (currently stubbed). |
| test_e2e.py | Full end-to-end gRPC validation. Spins up real Draft and Target gRPC servers on ephemeral ports, runs the orchestrator pipeline, and asserts output is byte-identical to model.generate(). |

Writing New Tests

Conventions

  1. File naming: test_<module_under_test>.py
  2. Class naming: class Test<Feature>: — groups related assertions.
  3. Fixtures over setup: Use conftest.py fixtures, not setUp/tearDown.
  4. Docstrings on every test: One line describing the assertion.
  5. Determinism: Use torch.manual_seed() and do_sample=False for reproducible model outputs.
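Convention 5 in miniature: seeding before generation makes two runs produce identical outputs. The sketch below uses the stdlib `random` module so it runs anywhere; in the real suite the same pattern is `torch.manual_seed()` plus `do_sample=False` on `model.generate()`.

```python
import random


def sample_tokens(n):
    """Draw n pseudo-token ids; deterministic for a fixed seed."""
    return [random.randrange(100) for _ in range(n)]


random.seed(0)
first = sample_tokens(5)
random.seed(0)
second = sample_tokens(5)
assert first == second  # reseeding reproduces the exact same sequence
```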

Adding a Unit Test

# tests/unit/test_my_module.py

import pytest

from specsplit.core.my_module import my_function

class TestMyFunction:
    def test_basic_case(self):
        """my_function should return 42 for input 'hello'."""
        assert my_function("hello") == 42

    def test_edge_case(self):
        """my_function should raise ValueError on empty input."""
        with pytest.raises(ValueError):
            my_function("")

Adding an Integration Test

# tests/integration/test_new_feature.py

import pytest

try:
    from transformers import AutoModelForCausalLM
    _SKIP = False
except ImportError:
    _SKIP = True

pytestmark = [
    pytest.mark.integration,
    pytest.mark.skipif(_SKIP, reason="transformers not installed"),
]

@pytest.fixture(scope="module")
def model():
    return AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B").eval()

class TestNewFeature:
    def test_something(self, model):
        """Feature X should produce Y when given Z."""
        ...

Shared Fixtures (conftest.py)

The root conftest.py provides pre-built config fixtures:

| Fixture | Type | Description |
|---|---|---|
| draft_config | DraftWorkerConfig | CPU-based draft config for testing |
| target_config | TargetWorkerConfig | CPU-based target config |
| tmp_path | Path | pytest built-in temp directory |
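The fixture pattern in conftest.py looks roughly like the sketch below. The real DraftWorkerConfig comes from SpecSplit's config module; the dataclass stand-in and its fields (`model_name`, `device`) are assumptions for illustration only.

```python
# tests/conftest.py (sketch)
from dataclasses import dataclass

import pytest


@dataclass
class DraftWorkerConfig:  # stand-in for SpecSplit's real config class
    model_name: str = "Qwen/Qwen2.5-0.5B"
    device: str = "cpu"


@pytest.fixture
def draft_config() -> DraftWorkerConfig:
    """CPU-based draft config so unit tests never need a GPU."""
    return DraftWorkerConfig(device="cpu")
```

Any test that declares a `draft_config` parameter receives a fresh config instance, with no setUp/tearDown boilerplate.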

Markers

| Marker | Purpose |
|---|---|
| @pytest.mark.integration | Requires model download / GPU |
| @pytest.mark.slow | Takes > 10 seconds |
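If markers are registered in pyproject.toml (an assumption; a pytest.ini section works the same way), the declaration would look like:

```
[tool.pytest.ini_options]
markers = [
    "integration: requires model download / GPU",
    "slow: takes > 10 seconds",
]
```

Registering markers keeps `pytest --strict-markers` from rejecting them as typos.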

Skip integration tests in CI without GPU:

pytest -m "not integration"

Coverage

Generate a coverage report:

pytest --cov=specsplit --cov-report=html tests/
open htmlcov/index.html

CI Integration

The test suite is designed to run in two stages:

  1. Fast gate (pytest tests/unit/ -x) — runs on every push, < 10s.
  2. Full validation (pytest -v) — runs on PR merge or nightly, includes model download + integration tests.

The Makefile target make test runs the fast gate; use make test-all for the full suite.
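The two stages map naturally onto separate CI jobs. A hypothetical GitHub Actions sketch (job names and the `pip install -e ".[dev]"` step are assumptions about this repo's setup):

```
jobs:
  fast-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -e ".[dev]"
      - run: pytest tests/unit/ -x

  full-validation:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -e ".[dev]"
      - run: pytest -v
```

Gating `full-validation` behind a schedule (or merge) event keeps the model download off the per-push critical path.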