Test Categorization and CI/CD
What This Document Is For
New to PROTEUS testing? This document explains how we organize tests into categories and how CI (Continuous Integration) automatically runs them when you submit code.
Key concept: Tests are labeled with markers that tell pytest what kind of test they are. Different markers run at different times: fast tests run on every pull request, slow tests run overnight.
For writing tests, see Test Building. For coverage analysis and troubleshooting, see Test Infrastructure.
Test Categories (Markers)
Add one of these markers above each test function:
| Marker | What It Tests | Speed | When It Runs |
|---|---|---|---|
| `@pytest.mark.unit` | Python logic with mocked physics | <100 ms | Every PR |
| `@pytest.mark.smoke` | Real binaries, 1 timestep | <30 s | Every PR |
| `@pytest.mark.integration` | Multiple modules working together | Minutes | Nightly only |
| `@pytest.mark.slow` | Full physics simulations | Hours | Nightly only |
| `@pytest.mark.skip` | Temporarily disabled | N/A | Never |
Which marker should I use?
- Most tests → `unit`: Testing a single function? Mock external dependencies, use `unit`.
- Testing real binaries → `smoke`: Need SOCRATES/AGNI/SPIDER actually running? Use `smoke`. Module-level smoke tests (e.g. in `tests/atmos_clim/`) validate a single binary with 1 timestep. Integration-level smoke tests (in `tests/integration/`) validate the coupling framework end-to-end with dummy modules.
- Testing module coupling → `integration`: ARAGOG + AGNI working together? Use `integration`.
- Full science runs → `slow`: Multi-hour simulations? Use `slow`.
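As a rough illustration of the `unit` case above, the sketch below marks a test and mocks its physics input. The function `equilibrium_temperature` and the `stellar_flux` method are hypothetical names invented for this example, not part of PROTEUS.

```python
from unittest.mock import Mock

import pytest


def equilibrium_temperature(flux_source, albedo=0.3):
    """Hypothetical helper: blackbody equilibrium temperature in K."""
    sigma = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
    flux = flux_source.stellar_flux()  # W m^-2, supplied by the (mocked) physics
    return (flux * (1.0 - albedo) / (4.0 * sigma)) ** 0.25


@pytest.mark.unit
def test_equilibrium_temperature_with_mocked_flux():
    """The physics source is mocked, so only the Python logic is exercised."""
    star = Mock()
    star.stellar_flux.return_value = 1361.0  # solar constant at 1 AU
    assert equilibrium_temperature(star) == pytest.approx(254.6, abs=0.5)
    star.stellar_flux.assert_called_once_with()
```

Because nothing external is invoked, a test shaped like this should comfortably stay inside the speed budget listed for `unit` in the table.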
CI/CD Pipeline
What Happens When You Open a PR
- Structure check: Validates `tests/` mirrors `src/proteus/`
- Unit tests (Linux): Runs `pytest -m "unit and not skip"` with coverage
- Diff-cover: Checks 80% coverage on your changed lines
- Smoke tests (Linux): Runs `pytest -m "smoke and not skip"`
- Unit tests (macOS): Runs unit tests on macOS (no compiled binaries)
- Lint: Checks code style with ruff
- Summary: Aggregates results from all platforms into a unified report
Runtime: ~5-10 minutes
What Happens Nightly
The nightly workflow (ci-nightly.yml) is primarily triggered by docker-build.yml after the 2am UTC image rebuild. A 3am UTC cron acts as a fallback if the docker build didn't run. A deduplication check prevents running twice.
- Runs ALL tests (unit → smoke → integration → slow)
- Updates coverage thresholds (ratcheting)
- Uploads aggregate coverage (unit + smoke + integration) to Codecov
- Sets `PROTEUS_CI_NIGHTLY=1` to enable additional smoke tests
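One way a test could be gated on the `PROTEUS_CI_NIGHTLY` flag is sketched below. How the PROTEUS test suite actually reads the variable is not described here, so the helper `nightly_enabled`, the `nightly_only` marker, and the placeholder test are assumptions made for illustration.

```python
import os

import pytest


def nightly_enabled(environ=None):
    """Return True when the nightly flag is set to "1"."""
    env = os.environ if environ is None else environ
    return env.get("PROTEUS_CI_NIGHTLY") == "1"


# Hypothetical reusable marker: skip unless running in the nightly workflow.
nightly_only = pytest.mark.skipif(
    not nightly_enabled(),
    reason="set PROTEUS_CI_NIGHTLY=1 to run the extended smoke tests",
)


@pytest.mark.smoke
@nightly_only
def test_extended_smoke_placeholder():
    """Placeholder body; a real test would drive a compiled binary."""
    assert nightly_enabled()
```

To try the gated path locally under this sketch, export `PROTEUS_CI_NIGHTLY=1` before invoking pytest.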
Coverage Rules
| Rule | Value | What It Means |
|---|---|---|
| Grace period | 0.3% | Small coverage drops allowed (warning posted) |
| Diff-cover | 80% | Your changed lines need 80% coverage |
| Staleness | 48h | PR fails if nightly data is too old |
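
The three rules in the table can be read as a small decision procedure. The sketch below is one possible reading, not the CI implementation: it assumes the grace period is measured in percentage points below the current threshold, and every function and field name is hypothetical.

```python
from dataclasses import dataclass, field

GRACE_PERIOD_PCT = 0.3     # tolerated drop below the threshold (assumed: percentage points)
DIFF_COVER_MIN_PCT = 80.0  # required coverage on changed lines
MAX_NIGHTLY_AGE_H = 48.0   # nightly data older than this fails the PR


@dataclass
class GateResult:
    passed: bool
    messages: list = field(default_factory=list)


def evaluate_coverage_rules(threshold, measured, changed_lines,
                            covered_changed_lines, nightly_age_h):
    """Apply the grace-period, diff-cover, and staleness rules to one PR."""
    result = GateResult(passed=True)

    drop = threshold - measured
    if drop > GRACE_PERIOD_PCT:
        result.passed = False
        result.messages.append(f"fail: coverage dropped {drop:.2f} points")
    elif drop > 0:
        result.messages.append(f"warning: coverage dropped {drop:.2f} points")

    if changed_lines > 0:
        diff_pct = 100.0 * covered_changed_lines / changed_lines
        if diff_pct < DIFF_COVER_MIN_PCT:
            result.passed = False
            result.messages.append(f"fail: diff coverage {diff_pct:.1f}%")

    if nightly_age_h > MAX_NIGHTLY_AGE_H:
        result.passed = False
        result.messages.append(f"fail: nightly data is {nightly_age_h:.0f}h old")

    return result
```

Under these assumptions, a PR that lowers coverage from 70.0 to 69.8 passes with a warning, while a drop to 69.5 fails.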
Test Layout
Tests mirror src/proteus/. Validation: bash tools/validate_test_structure.sh. Special dirs data, helpers, integration are handled in validation.
tests/
├── integration/ # test_smoke_*.py, test_integration_*.py, test_aragog_*, test_std_config, etc.
├── config/, utils/, plot/, star/, orbit/, interior/, escape/, outgas/, observe/, atmos_clim/, atmos_chem/
├── grid/, inference/, data/
├── test_cli.py, test_init.py
└── conftest.py # Shared fixtures (see Test Infrastructure)
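The real check is the shell script `tools/validate_test_structure.sh`; its internals are not shown here. Purely to illustrate the mirroring idea, the Python sketch below compares top-level directories and exempts the special directories named above. The function names are hypothetical.

```python
import tempfile
from pathlib import Path

# Directories under tests/ that validation treats specially (named in the text above).
SPECIAL_TEST_DIRS = {"data", "helpers", "integration"}


def _subdirs(root):
    """Names of immediate subdirectories, ignoring hidden and dunder dirs."""
    return {
        p.name
        for p in Path(root).iterdir()
        if p.is_dir() and not p.name.startswith(("_", "."))
    }


def find_mirror_mismatches(src_root, tests_root):
    """Return (missing_in_tests, extra_in_tests) as sorted lists of names."""
    src_dirs = _subdirs(src_root)
    test_dirs = _subdirs(tests_root) - SPECIAL_TEST_DIRS
    return sorted(src_dirs - test_dirs), sorted(test_dirs - src_dirs)


if __name__ == "__main__":
    # Demonstration on a throwaway tree rather than a real checkout.
    with tempfile.TemporaryDirectory() as tmp:
        src, tests = Path(tmp, "src", "proteus"), Path(tmp, "tests")
        for name in ("config", "star"):
            (src / name).mkdir(parents=True)
        for name in ("config", "integration"):
            (tests / name).mkdir(parents=True)
        print(find_mirror_mismatches(src, tests))
```

This sketch only looks one level deep; the actual script may apply stricter or different rules.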
Fixtures (tests/conftest.py)
- Parameter classes: `EarthLikeParams`, `UltraHotSuperEarthParams`, `IntermediateSuperEarthParams` (session-scoped).
- Config paths: `config_earth`, `config_minimal`, `config_dummy`, `proteus_root`.
- Fixtures: `earth_params`, `ultra_hot_params`, `intermediate_params` (instances of the above).
Integration-specific fixtures (e.g. multi-timestep runs, conservation checks) are in tests/integration/conftest.py. See Test Infrastructure for details.
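To show how a session-scoped parameter fixture is typically consumed, here is a minimal sketch. The fields on `EarthLikeParams` and the `surface_gravity` function are illustrative assumptions; the real class in `tests/conftest.py` may define different attributes.

```python
from dataclasses import dataclass

import pytest


@dataclass(frozen=True)
class EarthLikeParams:
    """Stand-in for the real class; these two fields are assumptions."""
    mass_kg: float = 5.972e24
    radius_m: float = 6.371e6


@pytest.fixture(scope="session")
def earth_params():
    """Built once per test session and shared by every test that requests it."""
    return EarthLikeParams()


def surface_gravity(params):
    """Hypothetical function under test: g = G * M / R^2 in m s^-2."""
    gravitational_constant = 6.674e-11  # m^3 kg^-1 s^-2
    return gravitational_constant * params.mass_kg / params.radius_m ** 2


@pytest.mark.unit
def test_surface_gravity_earth(earth_params):
    """pytest injects the fixture by matching the argument name."""
    assert surface_gravity(earth_params) == pytest.approx(9.82, abs=0.01)
```

Freezing the dataclass is a design choice for this sketch: a session-scoped object is shared across tests, so making it immutable guards against one test altering what another sees.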
Running Tests Locally
pytest -m "unit and not skip" # Unit only (matches PR)
pytest -m "smoke and not skip" # Smoke only
pytest -m "(unit or smoke) and not skip" # What PR runs
pytest -m integration # Integration
pytest -m slow # Slow
pytest -m "not slow" # All except slow
pytest --cov=src --cov-report=html # With coverage
For a fast gate check, run pytest -m "unit and not skip" --cov=src --cov-fail-under=<value>, taking <value> from pyproject.toml.
Adding New Tests
- Choose marker: `unit` / `smoke` / `integration` / `slow`.
- Create `tests/<module>/test_<filename>.py` if needed (mirror source).
- Use `@pytest.mark.<marker>` and docstrings; use `pytest.approx` for floats.
- Run `bash tools/validate_test_structure.sh`; run the relevant marker group; ensure coverage meets the fast gate for unit changes.
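The naming rule in the second step can be expressed as a small path mapping. This is a sketch of the convention only; `expected_test_path` is a hypothetical helper and `wrapper.py` in the usage note is a made-up file name.

```python
from pathlib import PurePosixPath


def expected_test_path(source_path):
    """Map src/proteus/<module>/<filename>.py to tests/<module>/test_<filename>.py."""
    relative = PurePosixPath(source_path).relative_to("src/proteus")
    return PurePosixPath("tests") / relative.parent / f"test_{relative.name}"
```

For example, `expected_test_path("src/proteus/escape/wrapper.py")` gives `tests/escape/test_wrapper.py`, and a top-level file such as `src/proteus/cli.py` maps to `tests/test_cli.py`. A path outside `src/proteus` raises `ValueError`.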
Coverage Requirements
Coverage Gates
| Gate | Tests Included | When Checked | Threshold Source |
|---|---|---|---|
| Fast gate | unit + smoke | Every PR | [tool.proteus.coverage_fast] fail_under |
| Estimated total | union of PR (unit+smoke) + nightly | Every PR | [tool.coverage.report] fail_under |
| Full gate | unit + smoke + integration + slow | Nightly | [tool.coverage.report] fail_under |
| Diff-cover | changed lines only | Every PR | Hard-coded 80% |
What Each Test Tier Contributes
- unit: Bulk of Python logic coverage (functions, branches, error paths). Fastest feedback loop.
- smoke: Covers binary wrapper code and real I/O paths that unit tests mock away.
- integration: Covers cross-module coupling paths (e.g., ARAGOG + JANUS handoff).
- slow: Full scientific validation. Contributes to coverage but primarily validates physics, not code paths.
All thresholds auto-increase ("ratchet") and never decrease. Check coverage locally with pytest --cov=src --cov-report=html.
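The ratchet behaviour amounts to taking a maximum. The sketch below shows only that core idea; any rounding or safety margin the nightly workflow applies is not described here, and `ratchet` is a hypothetical name.

```python
def ratchet(current_threshold, measured_coverage):
    """Return the new threshold: raised to the measured value, never lowered."""
    return max(current_threshold, measured_coverage)
```

With a current threshold of 65.0, a nightly measurement of 66.4 would raise it to 66.4, while a measurement of 63.0 would leave it at 65.0.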
For details on how coverage is collected across workflows and how the estimated total is computed, see Coverage Collection & Reporting.
References
- Test Infrastructure: Coverage workflows, reusable quality gate, troubleshooting
- Test Building: Prompts for unit/integration tests
- Docker CI Architecture: Docker image, CI pipelines
- .github/copilot-instructions.md: Commands and thresholds