Testing suite
What testing means here
Tests verify that the code does what was written. Physical correctness is judged separately: by data, by comparison against analytic solutions and benchmark codes (SPIDER, PALEOS), and against published references. The test suite is a regression net that locks the implementation in place once a behaviour has been validated. A passing suite confirms that no recent change has perturbed locked behaviour, not that the behaviour itself is physically correct.
Markers
The internal pytest marker scheme has four tiers, registered under [tool.pytest.ini_options].markers in pyproject.toml with --strict-markers enforced:
unit: Mocked or analytic-only physics; under 100 ms each, no real solver call. EOS lookups, mesh helpers, phase-evaluator branches, parser validation, JAX-vs-numpy parity on point inputs, energy-equation invariants.
smoke: Full EntropySolver.solve() at relaxed tolerance against representative configurations (closed mantle, gravitational separation, JAX RHS via CVODE).
integration: Real-physics integration against published references (PALEOS, SPIDER bit-parity); nightly picks up tests under this marker automatically once present.
slow: Anything taking more than one minute of wall time: multi-Myr coupled-style runs and convergence studies; manual only.
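As a rough sketch of how a test opts into one of these tiers, the marker is applied as a decorator on the test function. The helper conductive_profile and the test name are hypothetical stand-ins, not code from the repository; only the marker name unit comes from the scheme above.

```python
import math

import pytest


def conductive_profile(depth: float, t_top: float, t_bottom: float, thickness: float) -> float:
    """Hypothetical stand-in: steady-state linear conduction profile."""
    return t_top + (t_bottom - t_top) * depth / thickness


@pytest.mark.unit  # tier marker; it must be registered because --strict-markers is enforced
def test_conductive_profile_matches_analytic_midpoint():
    # Analytic check: the midpoint of a linear profile is the mean of the boundary values.
    mid = conductive_profile(depth=50.0, t_top=300.0, t_bottom=1500.0, thickness=100.0)
    assert math.isclose(mid, 900.0, rel_tol=1e-12)
```

A test written this way does no solver work and compares against a closed-form value, which is the kind of check the unit tier describes.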
Running tests by marker
pytest -m unit
pytest -m smoke
pytest -m integration
pytest -m slow
pytest -m "not skip" # full suite
pytest --collect-only -m <marker> reports the live count without running anything.
Public versus internal categories
Public-facing badges (README, website) collapse smoke + integration + slow into a single "Integration Tests" category, because a four-way taxonomy is confusing to non-developer readers. The four-marker internal scheme remains for CI infrastructure granularity and is what every pytest invocation, workflow, and CI tier sees directly.
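The collapse from internal markers to public categories can be sketched as a lookup table. This is an illustration only: the label "Unit Tests" and the function public_counts are assumptions, and the counts in the usage note are made up.

```python
# Assumed mapping, inferred from the text above: unit stays separate and the
# other three internal markers fold into the single "Integration Tests" category.
PUBLIC_CATEGORY = {
    "unit": "Unit Tests",  # label is an assumption
    "smoke": "Integration Tests",
    "integration": "Integration Tests",
    "slow": "Integration Tests",
}


def public_counts(internal_counts: dict[str, int]) -> dict[str, int]:
    """Fold per-marker test counts into the public badge categories."""
    totals: dict[str, int] = {}
    for marker, count in internal_counts.items():
        category = PUBLIC_CATEGORY[marker]
        totals[category] = totals.get(category, 0) + count
    return totals
```

For example, public_counts({"unit": 10, "smoke": 2, "integration": 3, "slow": 1}) returns {"Unit Tests": 10, "Integration Tests": 6}. A test carrying more than one marker would be counted once per marker in this sketch.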
Badge system
The badge data are JSON files at .github/badges/tests-{total,unit,integration}.json. The Refresh test count badges workflow regenerates them on every push to main, and shields.io fetches them live.
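For orientation, a badge file of this kind might look like the payload below. The sketch assumes the files follow the shields.io endpoint-badge schema (schemaVersion, label, message, color); the actual contents written by the workflow are not shown in this document, and badge_payload, the label, and the count are hypothetical.

```python
import json


def badge_payload(label: str, count: int, color: str = "blue") -> dict:
    """Build a dict in the shields.io endpoint-badge schema (assumed format)."""
    return {
        "schemaVersion": 1,  # fixed value required by the endpoint schema
        "label": label,
        "message": str(count),  # shields.io expects the message as a string
        "color": color,
    }


# Illustrative values only; real counts come from the workflow.
print(json.dumps(badge_payload("unit tests", 412), indent=2))
```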
Canonical specification
The full PROTEUS Ecosystem Testing Standard is documented at https://proteus-framework.org/PROTEUS/Explanations/ecosystem_testing_standard/.
Prerequisites
Install the test extras:
pip install -e ".[test]"
The JAX-path tests additionally need the jax extra (pip install -e ".[jax]"); without it the JAX parity tests are skipped, not failed. SPIDER-format EOS tables are also required for a subset of tests; if FWL_DATA is unset or the expected files are missing, those tests skip cleanly. For guidance on writing new tests see How to build tests.
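One common way to get skip-not-fail behaviour for an optional dependency is a skipif marker keyed on whether the package can be imported. The sketch below is an assumption about how such a guard could look, not the project's actual mechanism; requires_jax and the test name are hypothetical.

```python
import importlib.util

import pytest

# True only when the optional jax extra is importable in this environment.
HAS_JAX = importlib.util.find_spec("jax") is not None

# Reusable marker: tests decorated with it are reported as skipped, not failed.
requires_jax = pytest.mark.skipif(not HAS_JAX, reason="jax extra not installed")


@requires_jax
def test_jax_numpy_parity_on_point_input():
    # Imports stay inside the test so collection succeeds without jax.
    import jax.numpy as jnp
    import numpy as np

    x = 0.5
    # Loose tolerance because jax defaults to float32.
    assert np.isclose(float(jnp.exp(x)), np.exp(x), rtol=1e-6)
```

Running pytest -ra lists the skip reason in the summary, which makes a missing extra easy to spot.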
Other run patterns
Single test
pytest tests/test_entropy_pytest.py::TestEnergyBalanceCoreBC::test_energy_balance_rhs_bit_parity_prescribed_inputs
Parallel runs
pyproject.toml does not set a default addopts; pytest runs serial unless -n auto (or another xdist option) is supplied explicitly. CI invokes pytest -m "unit and not slow" -n auto (ci_tests.yml) and pytest -m "unit or smoke or integration or slow" -n auto (nightly.yml); reproduce locally by adding the same flag:
pytest -m unit -n auto # parallel unit run
pytest -m "unit or smoke" -n auto -ra -v # parallel + summary + verbose
Drop -n auto for serial execution when debugging a flaky test or attaching a debugger.
Sandbox-friendly invocation
Some environments forbid signal-based timeouts. Use a thread-based timeout instead:
pytest -p no:faulthandler --timeout=60 --timeout-method=thread tests/
CI tiers
| Trigger | Markers | Budget | Coverage |
|---|---|---|---|
| Push / PR (ci_tests.yml) | unit and not slow | < 5 min | Yes (unit tier); uploaded to Codecov under flag ci from ubuntu-latest + py3.12 only |
| Nightly cron + push to main (nightly.yml, 02:30 UTC) | unit or smoke or integration or slow | < 90 min | Yes (full suite); uploaded to Codecov under flag nightly |
| Manual workflow_dispatch | as above | < 90 min | Yes |
Push CI runs the unit tier only because each smoke test executes a full EntropySolver call (5 to 15 min on a 2-vCPU runner under coverage instrumentation). The nightly tier carries the canonical 90% coverage floor; the per-push upload is a fast-feedback companion view of the unit subset.
Fixtures
Shared fixtures live in tests/conftest.py. The most load-bearing one is shared_eos:
shared_eos (session)
A session-scoped EOS loader that opens the SPIDER-format pressure-entropy tables once per test session and hands the EntropyEOS instance to every test that needs it. Without this fixture, the integration tests would each rebuild the lookup tables (~ 12 to 15 s per test); with it, the cost amortises to a single load (~ 3.5 s) across the whole nightly run.
If FWL_DATA is unset or the expected files are missing, shared_eos skips the dependent tests rather than failing.
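The pattern described here, a session-scoped fixture that loads once and skips when data is absent, can be sketched as follows. This is not the real fixture: find_eos_dir, the eos_tables subdirectory, and the returned placeholder dict are all hypothetical, and the actual fixture returns an EntropyEOS instance.

```python
import os
from pathlib import Path
from typing import Optional

import pytest


def find_eos_dir(subdir: str = "eos_tables") -> Optional[Path]:
    """Return the table directory under FWL_DATA, or None if it is unavailable.

    The subdirectory name is a placeholder; the real data layout is not shown here.
    """
    root = os.environ.get("FWL_DATA")
    if not root:
        return None
    candidate = Path(root) / subdir
    return candidate if candidate.is_dir() else None


@pytest.fixture(scope="session")  # body runs once per test session
def shared_eos_sketch():
    eos_dir = find_eos_dir()
    if eos_dir is None:
        # Skipping inside a fixture skips every test that requests it.
        pytest.skip("FWL_DATA unset or EOS tables missing")
    # Placeholder for the expensive one-time table load.
    return {"table_dir": eos_dir}
```

Because the scope is session, the load cost is paid once per pytest process; under pytest-xdist each worker is its own session and would load its own copy.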
Parallelization
Tests are written to be order-independent and run cleanly under pytest-xdist. Pass -n auto to use all available cores; CI does this on both the unit and nightly tiers. If you observe flakiness only under xdist, that is a bug in the test (not in xdist).
Coverage
pytest --cov=aragog --cov-report=html -m "unit or smoke"
Open htmlcov/index.html to inspect line-by-line coverage. Both push CI
(unit tier) and the nightly (full suite) emit --cov-report=xml and upload
to Codecov under separate flags (ci and nightly). The project floor is
90%, enforced via [tool.coverage.report].fail_under in pyproject.toml.
Linting
Before committing, format and check all files:
ruff check --fix src/ tests/ tools/
ruff format src/ tests/ tools/
The local ruff (often 0.12.x) and the CI ruff (0.15.x) sometimes disagree on
formatting; CI is canonical. Run both ruff check and ruff format before
pushing: ruff format does not catch lint rules such as E402 (misplaced
imports).