Testing suite
Tests verify that the code does what was written; physical correctness is judged by data, not by tests. The suite catches regressions in equilibrium chemistry, solubility laws, oxygen-fugacity buffers, and the hybrid solver, but it cannot certify that those formulae match nature. That judgement belongs to the validation runs and the published comparisons against measured magma-ocean outgassing data.
Internal 4-marker scheme
Every test in the suite carries exactly one of four markers, applied either at module level (pytestmark = pytest.mark.X) or per class (@pytest.mark.X).
unit (96 tests): in-process tests of individual chemistry helpers, equilibrium-constant fits, solubility laws, and structure formulae. No real equilibrium_atmosphere call. Sub-second per test.
smoke (25 tests): real equilibrium_atmosphere invocations on minimal configurations (single composition, default species set, one solve). Sub-30 s per test, exercises the hybrid solver code paths.
integration (5 tests): full multi-species CHNS solves with mass-conservation invariants, all eleven species active.
slow: long parameter sweeps and convergence studies. Currently empty.
Local commands
pytest -m unit # 96 fast unit tests
pytest -m smoke # 25 minimal-config solver tests
pytest -m integration # 5 full CHNS solves
pytest -m slow # empty in CALLIOPE
pytest -m "(unit or smoke) and not skip" # PR-gate selection
pytest -m "not skip" # everything that should ever run
Coverage:
pytest --cov=src/calliope --cov-report=term -m "not skip"
pytest --cov=src/calliope --cov-report=html -m "not skip" # htmlcov/
Public-facing badges versus internal taxonomy
Public-facing badges (README, project website) collapse smoke + integration + slow into a single Integration Tests category, because a 4-way taxonomy is confusing to non-developer readers.
The 4-marker internal scheme remains for CI infrastructure granularity: the PR gate runs (unit or smoke), the nightly runs everything, and the test-count badge fetches the JSON files written by the publish-test-badges workflow.
Badge system
Three JSON files at .github/badges/tests-{total,unit,integration}.json are rewritten by .github/workflows/publish-test-badges.yml on every push to main (paths-filtered to source, tests, tools, and pyproject.toml).
Shields.io fetches them live via the endpoint URL embedded in the test-count badge.
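As a rough illustration of what the three files could contain, the sketch below collapses the four internal tiers into the public unit / integration split and builds payloads in the Shields.io endpoint format (`schemaVersion`, `label`, `message`, `color`). The tier counts come from the marker list above. The label text, the colour, the message format, and the `badge_payloads` helper are assumptions, not the workflow's actual output.

```python
import json

# Tier counts as listed in the internal 4-marker scheme.
TIER_COUNTS = {"unit": 96, "smoke": 25, "integration": 5, "slow": 0}


def badge_payloads(counts):
    """Build one Shields.io endpoint payload per badge file."""
    unit = counts["unit"]
    # Public "Integration Tests" category = smoke + integration + slow.
    integration = counts["smoke"] + counts["integration"] + counts["slow"]
    total = unit + integration

    def payload(label, n):
        # schemaVersion must be 1; label, colour, and message text are assumed.
        return {"schemaVersion": 1, "label": label, "message": str(n), "color": "blue"}

    return {
        "tests-total.json": payload("tests", total),
        "tests-unit.json": payload("unit tests", unit),
        "tests-integration.json": payload("integration tests", integration),
    }


if __name__ == "__main__":
    for name, body in badge_payloads(TIER_COUNTS).items():
        print(name, json.dumps(body))
```

With the current counts this yields 126 total, 96 unit, and 30 integration tests.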
The publish workflow auto-commits the badges with [skip ci] and retries the push up to three times to absorb concurrent main-branch updates.
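The retry behaviour can be sketched as a bounded loop. The real workflow is a GitHub Actions job and most likely does this in shell; the Python below only shows the control flow. The `push` and `resync` callables are hypothetical stand-ins, and resynchronising with main between attempts is an assumption about how a concurrent update would be absorbed.

```python
import time


def push_with_retry(push, resync, attempts=3, delay=0.0):
    """Try push() up to `attempts` times; return the attempt that succeeded.

    push   -- hypothetical callable returning True when the push is accepted
    resync -- hypothetical callable that updates the local branch from main
    """
    for attempt in range(1, attempts + 1):
        if push():
            return attempt
        if attempt < attempts:
            resync()           # pick up the concurrent main-branch update
            time.sleep(delay)  # optional pause before the next attempt
    raise RuntimeError(f"push rejected after {attempts} attempts")
```

A push that is rejected twice and accepted on the third try returns 3; a push that is rejected three times raises.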
Coverage gate
[tool.coverage.report] fail_under in pyproject.toml sets the minimum combined line + branch coverage for the nightly run.
The PR gate has a pre-flight step that fetches the base branch's pyproject.toml and refuses any PR that drops fail_under below the value on main; both states tolerate a base branch with no fail_under declared.
The 90 % ceiling caps the ratchet so that defensive paths do not become unfailable.
Coverage union estimation
The PR gate downloads the most recent nightly-coverage artifact from main (coverage.xml + coverage.json + nightly-timestamp.txt, 14 d retention) and line-ORs the unit-tier coverage with the nightly's full coverage to produce a union estimate.
The result is written to $GITHUB_STEP_SUMMARY as informational output; it does not gate the PR.
A staleness threshold of 48 h and a grace band of 0.3 % apply to the warn / fail / ok decision.
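The line-OR and the thresholds can be sketched as below. Coverage is modelled as a mapping from file name to the set of covered line numbers, which is a simplification of what coverage.xml and coverage.json hold. The `classify` function is one plausible reading of how the 48 h staleness threshold and the 0.3 % grace band might map onto warn / fail / ok; the actual decision rule is not given here, so treat it as an assumption.

```python
from datetime import datetime, timedelta, timezone

STALENESS = timedelta(hours=48)
GRACE_BAND = 0.3  # percentage points


def line_or(unit, nightly):
    """Union of covered lines per file across the two coverage reports."""
    files = set(unit) | set(nightly)
    return {f: unit.get(f, set()) | nightly.get(f, set()) for f in files}


def percent(covered, executable):
    """Percentage of executable lines that appear in `covered`."""
    total = sum(len(lines) for lines in executable.values())
    if total == 0:
        return 100.0
    hit = sum(len(covered.get(f, set()) & lines) for f, lines in executable.items())
    return 100.0 * hit / total


def classify(union_pct, fail_under, nightly_time, now):
    """Assumed mapping of the thresholds onto warn / fail / ok."""
    if now - nightly_time > STALENESS:
        return "warn"  # nightly artifact too old for the estimate to be trusted
    if union_pct + GRACE_BAND < fail_under:
        return "fail"  # below the threshold by more than the grace band
    if union_pct < fail_under:
        return "warn"  # below the threshold but inside the grace band
    return "ok"
```

Because the result is informational, a "fail" here would only be reported in the step summary and would not block the PR.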
Canonical specification
The repository-wide rules that every PROTEUS-ecosystem submodule follows are at proteus-framework.org/PROTEUS/Explanations/ecosystem_testing_standard/.