How to build tests
This page is about writing a new test, by hand or with an LLM. For running the existing suite see Testing suite.
Decision tree: which marker?
The marker is the load-bearing decision. CI runs -m unit on every push and
-m "(unit or smoke or integration) and not slow" nightly; an unmarked test
is invisible to CI. Pick the strictest marker the test fits.
| Use ... | When ... |
|---|---|
@pytest.mark.unit |
< 100 ms, no real solver call. EOS helpers, config validators, mixing math, melting-curve evaluation, JAX-vs-numpy parity on a single point, isolated branch coverage with mocked density. |
@pytest.mark.smoke |
One full-solver call at 1 \(M_\oplus\) that finishes in seconds. Verifies the whole code path runs end to end at relaxed tolerance. |
@pytest.mark.integration |
Full-solver call against a published-reference comparison (Zeng+2019 mass-radius, Seager+2007 density profile, PALEOS rocky 1 + 5 \(M_\oplus\)). |
@pytest.mark.slow |
Composition grids, tolerance-convergence studies, anything that takes minutes per test. Manual only. |
Tests covering more than one tier (e.g. an integration test that also exercises a unit-tier helper) carry a single marker matching the dominant runtime.
Choosing a file
Naming convention: test_<module>_<aspect>.py, lower-snake, one module per
file where possible.
| Situation | Where the test goes |
|---|---|
| New unit test for an existing module | test_<module>.py if it exists, else create it. |
| Branch-coverage test for a hard-to-reach path | test_<module>_branches.py (matches test_eos_dispatch_branches.py, test_paleos_unified_branches.py, ...). |
| Failure-mode test (raises, validation errors) | test_<module>_failures.py (matches test_eos_dispatch_failures.py). |
| Smoke / integration solver test | test_<scenario>.py (matches test_MR_rocky.py, test_Seager_water.py). |
| Regression for a fixed bug | Add to the closest existing file; do not create one regression file per bug. |
Float comparisons
Always pytest.approx, never ==.
assert result == pytest.approx(expected, rel=1e-6)
Choose the tolerance to match the physics, not the implementation: a well-converged structure ODE matches Zeng+2019 to ~3 % at 1 \(M_\oplus\), not 1e-9. State the chosen tolerance with a one-line comment naming the source or the limiting factor.
Fixtures
| Fixture | Use when ... |
|---|---|
zalmoxis_root |
The test reads or writes a path under the repo root; skip-if-unset behaviour is automatic. |
cached_solver |
The test triggers a full rocky or water solver run; identical parameter combinations within a file share one cache hit. |
Adding a new fixture: put it in tests/conftest.py if it is shared across
files; keep it module-local otherwise. Session-scope only when the fixture
is genuinely expensive to build (table loaders, solver calls).
Reference values
Every reference value asserted by a test must come from a primary source cited inline. Examples:
# Zeng+2019 mass-radius for 1 M_E rocky at CMF=0.325 (their Eq. 2 fit).
assert R_calc == pytest.approx(R_zeng, rel=3e-2)
# Stacey 2008 Earth CMB temperature: 4500-6000 K at 136 GPa.
assert 4500.0 <= T_cmb <= 6000.0
Do not "guess" a tolerance from a previous run. If you do not have a reference, the test belongs in the unit tier with a synthetic input whose expected output you can derive analytically.
Mocking EOS
Mocking is appropriate when:
- The test isolates a non-EOS component (the ODE integrator, the mass-radius solver, a config validator) and EOS evaluation is a confounder.
- The point of the test is the analytic limit (constant density, polytropic).
Mocking is not appropriate when the test is meant to verify
EOS-table-dependent physics (mushy-zone width, phase routing, binodal
suppression). For those, exercise the real PALEOS or Seager paths through
the cached_solver or a direct call.
The _paleos_mock.py helper provides a lightweight stand-in for the
upstream PALEOS Python package on CI runners that do not vendor it; use it
only when the upstream package is genuinely unavailable, not as a default.
Comment hygiene
Inline comments and docstrings should explain why the test exists, never when it was added or what it used to do. Acceptable:
# Stixrude+14 cryoscopic depression sets mushy zone factor to 0.79.
# Below 0.7 the solidus crosses the core-mantle boundary at 5 M_E,
# which is unphysical. Validate the rejection.
Not acceptable:
# Added in T2.3 to cover the basin attractor we found on 2026-04-27.
# Previously this assertion was rel=1e-3; loosened to 3e-2 in commit abc1234
# after the BLAS-noise rerun.
History belongs in the commit message and the PR description, not in the test source.
A worked example
Adding a unit test for a hypothetical helper clamp_to_grid(P, T, table):
# tests/test_eos_clamp_branches.py
from __future__ import annotations
import numpy as np
import pytest
from zalmoxis.eos.interpolation import clamp_to_grid
@pytest.mark.unit
def test_clamp_below_min_p_pins_to_grid_edge():
"""When P is below the table minimum, clamp_to_grid returns P_min."""
table_P = np.array([1e5, 1e6, 1e7])
table_T = np.array([300.0, 1000.0, 3000.0])
P_clamped, T_clamped = clamp_to_grid(0.5e4, 1500.0, table_P, table_T)
assert P_clamped == pytest.approx(1e5)
assert T_clamped == pytest.approx(1500.0)
@pytest.mark.unit
def test_clamp_within_grid_is_identity():
"""Inside-grid inputs must pass through unchanged (no rounding)."""
table_P = np.array([1e5, 1e6, 1e7])
table_T = np.array([300.0, 1000.0, 3000.0])
P_clamped, T_clamped = clamp_to_grid(5e6, 1500.0, table_P, table_T)
assert P_clamped == pytest.approx(5e6)
assert T_clamped == pytest.approx(1500.0)
Two unit tests, one branch each, both fast, both with a one-line docstring stating the rationale.
Coverage gate and pragma usage
Zalmoxis enforces a 90% coverage gate in the nightly CI (see
Testing suite > Coverage gate). Most defensive
code is already excluded automatically through the exclude_lines list in
[tool.coverage.report] (def __repr__, if TYPE_CHECKING:,
@abstractmethod, raise NotImplementedError, etc.) — you do not need
inline pragmas for those.
For inline # pragma: no cover, use it only on genuinely defensive
branches that no realistic test can reach without contrived inputs:
LinAlgError from np.linalg.lstsq, RuntimeError recovery on a
non-finite mass evaluation, dev-gated if _PROFILE: instrumentation,
heavy-data-dependent fallback paths. Always pair the pragma with a
one-line reason:
except (ValueError, RuntimeError) as exc: # pragma: no cover - stale-cache sign mismatch; defensive
...
Do not pragma normal-execution paths. The reviewer will push back; "I couldn't be bothered to write a test" is not a defensive branch.
Anti-patterns
- Forgetting the marker. Tests without
@pytest.mark.unit(or another marker) are invisible to CI. Runpytest --collect-only -m unit | tailafter adding a test to confirm pickup. With--strict-markers, a typo'd marker now fails the run. - Hardcoding paths. Use
zalmoxis_rootorpathlib.Path(__file__).parent, never absolute paths. - Test ordering dependence. Each test must pass in isolation. xdist reorders aggressively; relying on side-effects from a previous test is a bug, not a shortcut.
- Asserting on log output. Logs change for cosmetic reasons; assert on
return values or state. Use
caplogonly when the log line is itself the contract. - Sleeping or polling. If a test needs a wait, the code under test has a race condition; fix that first.
Suggested LLM prompt
When asking an LLM (Claude, Cursor, Copilot) to add or modify tests, paste the prompt below at the start of the request along with the relevant source file. The prompt encodes the PROTEUS and Zalmoxis testing principles so the generated test passes review without an iteration round.
You are writing a pytest test for the Zalmoxis interior-structure solver
(part of the PROTEUS ecosystem). Follow these rules strictly.
MARKERS (mandatory; CI-invisible without one):
- @pytest.mark.unit: < 100 ms, no real solver call. Use for EOS helpers,
config validators, mixing math, JAX-vs-numpy parity, branch coverage.
- @pytest.mark.smoke: one full-solver call at 1 M_E that finishes in
seconds. Use for end-to-end smoke checks.
- @pytest.mark.integration: full-solver call against a published reference
(Zeng+2019, Seager+2007, PALEOS rocky 1 + 5 M_E).
- @pytest.mark.slow: composition grids or tolerance studies that take
minutes per test. Manual only.
Pick the strictest marker the test fits. Do not double-mark.
FLOAT COMPARISONS:
- Always pytest.approx, never ==.
- Choose the tolerance to match the physics, not the implementation.
- State the source or limiting factor in a one-line comment.
REFERENCE VALUES:
- Cite the primary source inline. If no source is available, the test must
use an analytically derivable synthetic input.
- Do not infer tolerances from a previous run.
FIXTURES:
- Use `zalmoxis_root` for paths.
- Use `cached_solver` for full rocky / water solver runs.
- Add new fixtures to tests/conftest.py only if shared across files.
MOCKING:
- Mock EOS only when the test isolates a non-EOS component or exercises an
analytic limit. For EOS-table-dependent physics, use the real PALEOS or
Seager path via `cached_solver` or a direct call.
COVERAGE EXCLUSIONS:
- The 90% gate runs in nightly CI. Do NOT add `# pragma: no cover` just to
pass the gate. The exclude_lines list in pyproject.toml already covers
def __repr__, raise NotImplementedError, if TYPE_CHECKING:, @abstractmethod.
- Inline pragma is acceptable ONLY for: numerical pathology recovery
(LinAlgError, RuntimeError on non-finite values, brentq same-sign
bracket), dev-gated diagnostics behind env vars, heavy-data-dependent
fallback paths.
- Every inline pragma must carry a one-line "why this branch is
unreachable in unit tests" justification.
NAMING:
- Files: test_<module>.py, test_<module>_branches.py for hard-to-reach
paths, test_<module>_failures.py for failure-mode tests.
- Function names: snake_case, descriptive, no test_1 / test_2.
STYLE:
- Single quotes (ruff).
- `from __future__ import annotations` at the top of every file.
- One-line docstring stating the rationale for the test.
- Comments explain WHY, never WHEN added or what the code used to do.
- No project-tracking labels (T1.x, Stage X, ISO dates, commit SHAs).
ANTI-PATTERNS:
- No bare assertions on float equality.
- No hardcoded absolute paths.
- No reliance on test execution order.
- No assertions on log content unless the log line is itself the contract.
- No sleep / poll. If you need to wait, the code under test has a race.
OUTPUT:
- Produce only the test source. Do not modify the module under test.
- Place the new test in the appropriate existing file, or name a new file
per the naming convention.
- After the test, list (a) which marker you chose and why, (b) the
reference source for any literature value, (c) the tolerance and its
justification.