Testing
This page covers the practical aspects of testing PROTEUS: running tests, writing tests, checking coverage, and working with CI.
For the conceptual framework behind the testing strategy (tier hierarchy, physics invariants, validation certification), see Test framework.
Quick start
Install with pip install -e ".[develop]", then:
pytest -m "unit and not skip" # Fast unit tests (~2 min)
pytest -m "smoke and not skip" # Smoke tests with real binaries
pytest --cov=src --cov-report=html # Generate coverage report
open htmlcov/index.html # View coverage in browser
Before committing:
- pytest -m "unit and not skip" must pass
- ruff check src/ tests/ && ruff format src/ tests/ must pass
- bash tools/validate_test_structure.sh must pass
Test markers
Every test function carries a tier marker that controls when and where it runs:
| Marker | What it tests | Speed budget | CI surface |
|---|---|---|---|
| @pytest.mark.unit | Python logic, mocked physics | < 100 ms | Every PR |
| @pytest.mark.smoke | Real binaries, 1 timestep, low res | < 30 s | Every PR |
| @pytest.mark.integration | Multi-module coupling | Minutes | Nightly |
| @pytest.mark.slow | Full physics validation | Hours | Nightly |
| @pytest.mark.skip | Deliberately disabled | n/a | Never |
Every test file must have a module-level marker:
pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
Timeout ceilings: 30 s for unit, 60 s for smoke, 300 s for integration, 3600 s for slow.
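The ceilings above can be written down as a small lookup, which makes it easy to check a module's timeout value against its tier. This is an illustrative sketch only: TIMEOUT_CEILINGS_S and check_timeout are hypothetical names and are not part of PROTEUS or pytest.

```python
# Timeout ceilings per tier, in seconds, as listed in the text above.
TIMEOUT_CEILINGS_S = {
    "unit": 30,
    "smoke": 60,
    "integration": 300,
    "slow": 3600,
}


def check_timeout(tier: str, timeout_s: float) -> bool:
    """Return True if timeout_s is positive and within the ceiling for the tier."""
    if tier not in TIMEOUT_CEILINGS_S:
        raise ValueError(f"unknown tier: {tier!r}")
    return 0 < timeout_s <= TIMEOUT_CEILINGS_S[tier]


# A unit-test module declaring timeout(30) is within its ceiling;
# the same module declaring timeout(45) is not.
print(check_timeout("unit", 30))
print(check_timeout("unit", 45))
```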
Which marker to use
- Most tests: unit. Mock external dependencies, test one function.
- Testing real binaries: smoke. SOCRATES, AGNI, or SPIDER actually running, 1 timestep.
- Testing module coupling: integration. Aragog + AGNI working together, multiple timesteps.
- Full science validation: slow. Multi-hour simulations comparing against published results.
Writing tests
File layout
Tests mirror the source tree:
src/proteus/utils/helper.py → tests/utils/test_helper.py
src/proteus/escape/wrapper.py → tests/escape/test_wrapper.py
Validate with bash tools/validate_test_structure.sh.
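The mirroring rule can be sketched as a path mapping. The function below is a hypothetical helper written for illustration; it only reproduces the two examples above, and the rules actually enforced by tools/validate_test_structure.sh may differ (for example in how they treat __init__.py files).

```python
from pathlib import PurePosixPath


def expected_test_path(source_path: str) -> str:
    """Map src/proteus/<pkg>/<name>.py to tests/<pkg>/test_<name>.py."""
    path = PurePosixPath(source_path)
    # relative_to raises ValueError if the file is outside src/proteus.
    relative = path.relative_to("src/proteus")
    return str(PurePosixPath("tests") / relative.parent / f"test_{relative.name}")


print(expected_test_path("src/proteus/utils/helper.py"))
print(expected_test_path("src/proteus/escape/wrapper.py"))
```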
Basic test structure
import pytest

from proteus.utils.helper import my_function

pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]


def test_my_function_returns_expected_value():
    """Verify my_function computes the correct result for standard input."""
    result = my_function(input_value=10)
    assert result == pytest.approx(42.0, rel=1e-5)
    # Discrimination: a common off-by-one bug would give 41.0,
    # which is outside the tolerance
    assert result > 41.5
Requirements for every test
- Docstring: state the physical scenario or contract being verified
- At least two assertions: the second discriminates against the most plausible wrong answer
- At least one edge case: boundary value, empty input, or extreme parameter
- No bare float ==: use pytest.approx(val, rel=...) or np.testing.assert_allclose
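A test module that meets all four requirements might look like the sketch below. The function scale_height is a hypothetical stand-in defined inline so the example is self-contained; it is not a PROTEUS function, and the numbers are standard Earth-like values chosen for illustration.

```python
import pytest

pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]

R_GAS = 8.314462618  # universal gas constant, J mol^-1 K^-1


def scale_height(temperature, molar_mass, gravity):
    """Hypothetical helper: isothermal scale height H = R*T / (M*g), in metres."""
    if gravity <= 0 or molar_mass <= 0:
        raise ValueError("gravity and molar_mass must be positive")
    return R_GAS * temperature / (molar_mass * gravity)


def test_scale_height_earth_like():
    """Earth-like air (288 K, 0.029 kg/mol, 9.81 m/s^2) gives H of about 8.4 km."""
    result = scale_height(288.0, 0.029, 9.81)
    assert result == pytest.approx(8.4e3, rel=1e-2)
    # Discrimination: passing the molar mass in g/mol (29) would give about 8.4 m.
    assert result > 1.0e3


def test_scale_height_rejects_non_physical_input():
    """Edge case: zero gravity or zero molar mass has no finite scale height."""
    with pytest.raises(ValueError):
        scale_height(288.0, 0.029, 0.0)
    with pytest.raises(ValueError):
        scale_height(288.0, 0.0, 9.81)
```

The second assertion in the first test is what catches a unit mix-up: the tolerance check alone would also fail, but the explicit bound documents which wrong answer the test is guarding against.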
Mocking
Unit tests mock external calls (SOCRATES, AGNI, SPIDER, file I/O):
from unittest.mock import patch

import pytest

# run_atmosphere, config_with_agni, and hf_row are assumed to be imported or
# provided as fixtures elsewhere in the test module.


@pytest.mark.unit
def test_run_atmosphere_dispatches_to_agni():
    """Verify the atmosphere wrapper calls AGNI when module='agni'."""
    with patch('proteus.atmos_clim.wrapper.run_agni') as mock_agni:
        mock_agni.return_value = None
        run_atmosphere(config_with_agni, hf_row)
        mock_agni.assert_called_once()
Mock at the narrowest scope (a specific function, not a whole module). Mocked physics functions must return physically plausible values.
Fixtures
Shared fixtures live in tests/conftest.py:
- EarthLikeParams, UltraHotSuperEarthParams, IntermediateSuperEarthParams: pre-configured parameter sets
- config_earth, config_minimal, config_dummy: paths to test configs
- tmp_path: pytest built-in for temporary directories
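As a minimal sketch of fixture use, the test below requests tmp_path by naming it as an argument; pytest then passes in a pathlib.Path to a fresh temporary directory. The helper write_example_config and the file contents are hypothetical placeholders, not a valid PROTEUS config.

```python
from pathlib import Path


def write_example_config(directory: Path) -> Path:
    """Hypothetical helper: write a placeholder config file and return its path."""
    config_path = directory / "example.toml"
    config_path.write_text('version = "1"\n', encoding="utf-8")
    return config_path


def test_write_example_config_creates_file(tmp_path):
    """The config file is created inside the temporary directory."""
    config_path = write_example_config(tmp_path)
    assert config_path.exists()
    # Discrimination: the file must land in tmp_path, not the working directory.
    assert config_path.parent == tmp_path
```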
Optional dependencies
Tests importing optional packages must call pytest.importorskip:
pytest.importorskip('atmodeller')
pytest.importorskip('vulcan')
This prevents collection failures on CI runners without the optional package.
Coverage
Thresholds
| Gate | Tests | Target | Enforced |
|---|---|---|---|
| Fast (every PR) | unit + smoke | Ratcheting toward 90% | PR checks |
| Full (nightly) | unit + smoke + integration + slow | 90% | Nightly CI |
| Diff-cover (every PR) | Changed lines only | 80% | PR checks |
Thresholds auto-ratchet upward (never decrease) and are capped at 90%.
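The ratchet rule (never decrease, stop rising at the cap) can be expressed in one line. The function below is a hypothetical sketch of that rule, not the CI implementation; in particular, whether the real ratchet rounds the measured value or applies a safety margin is not stated here.

```python
def ratchet_threshold(current: float, measured: float, cap: float = 90.0) -> float:
    """Return the new threshold: never below current, raised at most to cap."""
    return max(current, min(measured, cap))


print(ratchet_threshold(70.0, 75.3))  # coverage improved: threshold follows it
print(ratchet_threshold(70.0, 68.0))  # coverage dropped: threshold stays put
print(ratchet_threshold(88.0, 95.0))  # coverage above the cap: threshold stops at cap
```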
Checking coverage locally
pytest --cov=src --cov-report=html
open htmlcov/index.html
Module-level analysis:
bash tools/coverage_analysis.sh
Test quality validation
python tools/check_test_quality.py --check
This AST-based linter flags:
- Single-assert test functions
- Weak standalone assertions (assert result is not None)
- Missing function-level docstrings
- Float == comparisons
- Missing module-level tier markers
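To show how checks of this kind can be done on the syntax tree, here is a minimal sketch using the standard-library ast module. It is not the code in tools/check_test_quality.py; find_quality_issues is a hypothetical function that covers only three of the checks above and counts plain assert statements only.

```python
import ast


def _is_float_literal(node: ast.AST) -> bool:
    """Return True for a literal such as 41.0 in the source."""
    return isinstance(node, ast.Constant) and isinstance(node.value, float)


def find_quality_issues(source: str) -> list[str]:
    """Flag test functions with no docstring, fewer than two asserts, or float ==."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef) or not node.name.startswith("test_"):
            continue
        if ast.get_docstring(node) is None:
            issues.append(f"{node.name}: missing docstring")
        children = list(ast.walk(node))
        if sum(isinstance(child, ast.Assert) for child in children) < 2:
            issues.append(f"{node.name}: fewer than two assert statements")
        for child in children:
            if (
                isinstance(child, ast.Compare)
                and any(isinstance(op, ast.Eq) for op in child.ops)
                and any(_is_float_literal(o) for o in [child.left, *child.comparators])
            ):
                issues.append(f"{node.name}: float compared with ==")
    return issues


example = '''
def test_weak():
    assert compute() == 41.0

def test_ok():
    """Documented contract."""
    result = compute()
    assert result == pytest.approx(42.0, rel=1e-5)
    assert result > 41.5
'''

for issue in find_quality_issues(example):
    print(issue)
```

Because the source is only parsed and never executed, names such as compute and pytest in the example string do not need to exist.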
CI/CD pipeline
Pull request checks
When you open a PR, CI runs:
- Structure validation: tests/ mirrors src/proteus/
- Unit tests (Linux + macOS): pytest -m "unit and not skip"
- Smoke tests: pytest -m "smoke and not skip"
- Diff-cover: 80% coverage on changed lines
- Lint: ruff check and ruff format
- Editable install: verifies the package installs correctly
Runtime: ~5-10 minutes.
Nightly validation
The nightly workflow runs all tiers:
- Unit + smoke on Linux and macOS
- Integration on Linux and macOS
- Slow across multiple shards (aragog, zalmoxis-coupled, agni, janus-inference, etc.)
- Coverage aggregate: combines all tiers and checks the 90% gate
Runtime: ~2-3 hours.
Pre-commit checklist
Before every commit:
pytest -m "unit and not skip" # Tests pass
ruff check --fix src/ tests/ # Lint
ruff format src/ tests/ # Format
bash tools/validate_test_structure.sh # Structure
The pre-commit hook runs ruff check and ruff format automatically.