Testing suite

What testing means here

Tests verify that the code does what it was written to do. Physical correctness is judged separately, by comparison against data, analytic solutions, benchmark codes (SPIDER, PALEOS), and published references; the test suite is a regression net that locks the implementation in place once a behaviour has been validated. A passing suite confirms that no recent change has perturbed locked behaviour, not that the behaviour itself is physically correct.

Markers

The internal pytest marker scheme has four tiers, registered under [tool.pytest.ini_options].markers in pyproject.toml with --strict-markers enforced:

unit: Mocked or analytic-only physics; under 100 ms each, no real solver call. EOS lookups, mesh helpers, phase-evaluator branches, parser validation, JAX-vs-numpy parity on point inputs, energy-equation invariants.
smoke: Full EntropySolver.solve() at relaxed tolerance against representative configurations (closed mantle, gravitational separation, JAX RHS via CVODE).
integration: Real-physics integration against published references (PALEOS, SPIDER bit-parity); nightly picks up tests under this marker automatically once present.
slow: Anything taking more than one minute of wall-clock time: multi-Myr coupled-style runs and convergence studies. Excluded from push CI; run manually or through the nightly marker expression (see CI tiers).
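As a rough illustration of how the tiers are applied, the sketch below tags one analytic check as unit and leaves a placeholder for a smoke test. The helper shell_volume and both test names are hypothetical and do not come from the code base; only the marker names are taken from the list above.

```python
import math

import pytest


def shell_volume(r_inner, r_outer):
    """Hypothetical mesh helper: volume of a spherical shell."""
    return 4.0 / 3.0 * math.pi * (r_outer**3 - r_inner**3)


@pytest.mark.unit
def test_shell_volume_matches_full_sphere():
    # Analytic check with no solver call: a shell with zero inner
    # radius must have the volume of a full sphere.
    expected = 4.0 / 3.0 * math.pi * 2.0**3
    assert shell_volume(0.0, 2.0) == pytest.approx(expected)


@pytest.mark.smoke
def test_solver_runs_at_relaxed_tolerance():
    # Placeholder only: a real smoke test would build a configuration
    # and call EntropySolver.solve() at relaxed tolerance.
    pytest.skip("illustrative placeholder, no solver call in this sketch")
```

With --strict-markers in force, a typo such as @pytest.mark.unti is reported as an error at collection time rather than silently creating a new marker.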

Running tests by marker

pytest -m unit
pytest -m smoke
pytest -m integration
pytest -m slow
pytest -m "not skip"  # full suite

pytest --collect-only -m <marker> reports the live count without running anything.
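If the count is needed programmatically, one option is to parse the summary line that pytest prints in quiet collect-only mode. The sketch below assumes the summary format of recent pytest releases ("5 tests collected" or "2/5 tests collected (3 deselected)"); other versions may print it differently. The function names are illustrative.

```python
import re
import subprocess
import sys

# Matches "5 tests collected" and "2/5 tests collected (3 deselected)".
SUMMARY = re.compile(r"(?:(\d+)/)?(\d+) tests? collected")


def parse_collected(output):
    """Return the number of selected tests from collect-only output."""
    for line in reversed(output.splitlines()):
        match = SUMMARY.search(line)
        if match:
            selected, total = match.groups()
            # With deselection the first number is the selected count.
            return int(selected if selected is not None else total)
    return 0  # covers "no tests collected"


def count_marker(marker, test_dir="tests"):
    """Run pytest in collect-only mode and return the count for one marker."""
    cmd = [sys.executable, "-m", "pytest", "--collect-only", "-q",
           "-m", marker, test_dir]
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return parse_collected(result.stdout)
```

Calling count_marker("unit") from the repository root would then return the live unit count without running any test.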

Public versus internal categories

Public-facing badges (README, website) collapse smoke + integration + slow into a single "Integration Tests" category, because a four-way taxonomy is confusing to non-developer readers. The four-marker internal scheme remains for CI infrastructure granularity and is what every pytest invocation, workflow, and CI tier sees directly.
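The collapse from four internal markers to two public categories can be sketched as a small mapping. The counts in the usage note are made up, and the sketch assumes each test carries exactly one tier marker; a test with two markers would be counted twice.

```python
# Internal marker -> public badge category, as described above.
PUBLIC_CATEGORY = {
    "unit": "unit",
    "smoke": "integration",
    "integration": "integration",
    "slow": "integration",
}


def collapse_counts(internal):
    """Fold per-marker test counts into the public badge categories."""
    public = {"unit": 0, "integration": 0}
    for marker, count in internal.items():
        public[PUBLIC_CATEGORY[marker]] += count
    public["total"] = public["unit"] + public["integration"]
    return public
```

For example, collapse_counts({"unit": 40, "smoke": 5, "integration": 3, "slow": 2}) gives 40 unit tests, 10 integration tests, and a total of 50.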

Badge system

The Refresh test count badges workflow regenerates tests-{total,unit,integration}.json on every push to main and publishes them to the root of the dedicated badges branch, from which shields.io fetches them live. The badge JSON lives only on that branch, not on main.
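For orientation, a badge file of this kind could look like the output of the sketch below. It uses the shields.io endpoint-badge schema (schemaVersion, label, message, color); whether the workflow emits exactly these labels and colours is an assumption, and the function names are illustrative.

```python
import json
from pathlib import Path


def badge_payload(label, count, color="blue"):
    """Build a shields.io endpoint-badge document for one test count."""
    return {
        "schemaVersion": 1,      # required by the endpoint schema
        "label": label,          # left-hand badge text
        "message": str(count),   # right-hand badge text, must be a string
        "color": color,
    }


def write_badges(counts, out_dir):
    """Write tests-<category>.json for each category and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for category, count in counts.items():
        path = out / f"tests-{category}.json"
        payload = badge_payload(f"{category} tests", count)
        path.write_text(json.dumps(payload) + "\n")
        written.append(path)
    return written
```

Calling write_badges with the keys total, unit, and integration produces the three file names listed above.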

Canonical specification

The full PROTEUS Ecosystem Testing Standard is documented at https://proteus-framework.org/PROTEUS/Explanations/ecosystem_testing_standard/.

Prerequisites

Install the test extras:

pip install -e ".[test]"

The JAX-path tests additionally need the jax extra (pip install -e ".[jax]"); without it the JAX parity tests are skipped, not failed. SPIDER-format EOS tables are also required for a subset of tests; if FWL_DATA is unset or the expected files are missing, those tests skip cleanly. For guidance on writing new tests see How to build tests.

Other run patterns

Single test

pytest tests/test_entropy_pytest.py::TestEnergyBalanceCoreBC::test_energy_balance_rhs_bit_parity_prescribed_inputs

Parallel runs

pyproject.toml does not set a default addopts; pytest runs serial unless -n auto (or another xdist option) is supplied explicitly. CI invokes pytest -m "unit and not slow" -n auto (ci_tests.yml) and pytest -m "unit or smoke or integration or slow" -n auto (nightly.yml); reproduce locally by adding the same flag:

pytest -m unit -n auto                       # parallel unit run
pytest -m "unit or smoke" -n auto -ra -v     # parallel + summary + verbose

Drop -n auto for serial execution when debugging a flaky test or attaching a debugger.

Sandbox-friendly invocation

Some sandboxed environments forbid signal-based timeouts, which pytest-timeout uses by default where SIGALRM is available. Use a thread-based timeout instead:

pytest -p no:faulthandler --timeout=60 --timeout-method=thread tests/

CI tiers

Trigger	Markers	Budget	Coverage
Push / PR (`ci_tests.yml`)	`unit and not slow`	< 5 min	Yes (unit tier); uploaded to Codecov under flag `ci` from `ubuntu-latest` + py3.12 only
Nightly cron + push to main (`nightly.yml`, 02:30 UTC)	`unit or smoke or integration or slow`	< 90 min	Yes (full suite); uploaded to Codecov under flag `nightly`
Manual `workflow_dispatch`	as above	< 90 min	Yes

Push CI runs the unit tier only because each smoke test executes a full EntropySolver call (5 to 15 min on a 2-vCPU runner under coverage instrumentation). The nightly tier carries the canonical 90% coverage floor; the per-push upload is a fast-feedback companion view of the unit subset.
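To see which tier picks up a given test, the two marker expressions from the table can be evaluated against a test's marker set. The sketch below is a stand-in that reuses Python's own and/or/not operators; it is not pytest's expression parser and only handles plain marker names.

```python
import re

# Marker expressions copied from the CI tier table.
TIER_EXPRESSIONS = {
    "push": "unit and not slow",
    "nightly": "unit or smoke or integration or slow",
}


def is_selected(expression, markers):
    """Return True if a test carrying `markers` matches `expression`."""
    names = set(re.findall(r"[A-Za-z_]\w*", expression)) - {"and", "or", "not"}
    # Each marker name becomes a boolean: is it present on the test?
    env = {name: name in markers for name in names}
    # Builtins are removed so only the marker booleans are visible.
    return bool(eval(expression, {"__builtins__": {}}, env))


def tiers_for(markers):
    """List the tiers whose expression selects a test with these markers."""
    return [tier for tier, expr in TIER_EXPRESSIONS.items()
            if is_selected(expr, markers)]
```

Under these expressions a test marked only unit runs on both tiers, while a test marked both unit and slow is dropped from push CI and runs on the nightly tier alone.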

Fixtures

Shared fixtures live in tests/conftest.py. The most load-bearing one is shared_eos:

`shared_eos` (session)

A session-scoped EOS loader that opens the SPIDER-format pressure-entropy tables once per test session and hands the same EntropyEOS instance to every test that needs it. Without this fixture, each integration test would rebuild the lookup tables (roughly 12 to 15 s per test); with it, the cost amortises to a single load (roughly 3.5 s) across the whole nightly run.

If FWL_DATA is unset or the expected files are missing, shared_eos skips the dependent tests rather than failing.
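The load-once, skip-if-missing pattern can be sketched as follows. This is not the fixture from tests/conftest.py: the table file names, the helper missing_eos_reason, and the placeholder return value are all hypothetical, and the real fixture returns an EntropyEOS instance.

```python
import os
from pathlib import Path

import pytest

# Hypothetical file names; the real table names are not listed here.
REQUIRED_TABLES = ("density_melt.dat", "temperature_melt.dat")


def missing_eos_reason(data_root, required=REQUIRED_TABLES):
    """Return a skip reason, or None when every required table exists."""
    if not data_root:
        return "FWL_DATA is unset"
    missing = [name for name in required
               if not (Path(data_root) / name).is_file()]
    if missing:
        return f"missing EOS tables under {data_root}: {', '.join(missing)}"
    return None


@pytest.fixture(scope="session")
def shared_eos_sketch():
    """Load the tables once per session, or skip every dependent test."""
    reason = missing_eos_reason(os.environ.get("FWL_DATA"))
    if reason is not None:
        pytest.skip(reason)
    # Placeholder for the expensive load; runs once because of scope="session".
    return {"root": os.environ["FWL_DATA"]}
```

Because pytest.skip is raised inside the fixture, every test that requests it is reported as skipped with the same reason instead of failing.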

Parallelization

Tests are written to be order-independent and run cleanly under pytest-xdist. Pass -n auto to use all available cores; CI does this on both the unit and nightly tiers. If a test is flaky only under xdist, treat that as a bug in the test (typically shared state or a hidden ordering dependency), not in xdist.

Coverage

pytest --cov=aragog --cov-report=html -m "unit or smoke"

Open htmlcov/index.html to inspect line-by-line coverage. Both push CI (unit tier) and the nightly (full suite) emit --cov-report=xml and upload to Codecov under separate flags (ci and nightly). The project floor is 90%, enforced via [tool.coverage.report].fail_under in pyproject.toml.
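The floor itself is enforced by coverage through fail_under, but the same number can be read back from the XML report for a quick local check. The sketch below assumes the Cobertura-style coverage.xml written by --cov-report=xml, whose root element carries a line-rate attribute; the function names are illustrative, and the figure may differ from coverage's own total if branch coverage is enabled.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(xml_text):
    """Return line coverage in percent from a coverage.xml document."""
    root = ET.fromstring(xml_text)
    # line-rate is a fraction between 0 and 1 on the root <coverage> element.
    return 100.0 * float(root.attrib["line-rate"])


def meets_floor(xml_text, floor=90.0):
    """Return True when line coverage is at or above the floor."""
    return line_coverage_percent(xml_text) >= floor
```

A typical use would be meets_floor(open("coverage.xml").read()) after a run with --cov-report=xml.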

Linting

Before committing, format and check all files:

ruff check --fix src/ tests/ tools/
ruff format src/ tests/ tools/

The local ruff (often 0.12.x) and the CI ruff (0.15.x) sometimes disagree on formatting; CI is canonical. Run both ruff check and ruff format before pushing: ruff format does not catch lint rules such as E402 (module-level import not at top of file).