Testing Infrastructure
What This Document Is For
New to PROTEUS? This document explains how our testing system works: how to run tests, check code coverage, and troubleshoot common issues.
Key concepts:
- Coverage measures what percentage of your code is tested. Higher is better.
- CI (Continuous Integration) automatically runs tests when you push code.
- Thresholds are minimum coverage percentages that must be met.
For test markers and CI pipelines, see Test Categorization. For writing tests, see Test Building.
Table of Contents
- Quick Start
- CI/CD Status
- Developer Workflow
- Best Practices
- Coverage Analysis
- Coverage Collection & Reporting
- Pre-commit Checklist
- Troubleshooting
- Reusable Quality Gate
- References
Quick Start
First time? Install with pip install -e ".[develop]", then run tests:
pytest -m "unit and not skip" # Fast unit tests (~2 min)
pytest -m "smoke and not skip" # Smoke tests with real binaries
pytest --cov=src --cov-report=html # Generate coverage report
open htmlcov/index.html # View coverage in browser
Before committing:
1. Run pytest -m "unit and not skip" — must pass
2. Run ruff check src/ tests/ && ruff format src/ tests/ — must pass
3. Run bash tools/validate_test_structure.sh — must pass
CI/CD Status
How CI Works
When you open a pull request, CI automatically:
- Validates test file structure
- Runs unit tests and checks coverage (must meet threshold)
- Runs smoke tests
- Checks code style with ruff
Coverage thresholds are defined in pyproject.toml and enforced by two gates:
- Fast gate: unit + smoke, checked on PRs
- Full gate: all tests, checked nightly
Workflows
| Workflow | Runs When | What It Does |
|---|---|---|
| `ci-pr-checks.yml` | Every PR | Unit + smoke tests (Linux), unit tests (macOS), lint, ~5-10 min |
| `docker-build.yml` | Daily 2am UTC / dependency changes | Rebuilds Docker image, then triggers nightly |
| `ci-nightly.yml` | Triggered by docker-build (fallback: 3am cron) | All tests including slow, updates thresholds, uploads coverage to Codecov |
Key features:
- Grace period: PRs can merge with ≤0.3% coverage drop (warning posted)
- Diff-cover: 80% coverage required on changed lines
- Auto-ratcheting: Thresholds only increase, never decrease
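The auto-ratcheting rule can be pictured as a max-and-floor update; this is a minimal sketch of the idea, not the actual nightly script:

```python
def ratchet_threshold(current_threshold: float, measured_coverage: float) -> float:
    """Return the new fail_under value: it only ever moves up.

    Illustrative only; the real logic lives in the nightly workflow.
    """
    # Floor the measured value to 0.1% so normal run-to-run noise
    # cannot push the threshold above what tests reliably achieve.
    candidate = int(measured_coverage * 10) / 10
    # Never decrease: keep the old threshold if coverage dipped.
    return max(current_threshold, candidate)
```

For example, a nightly run measuring 73.84% against a 72.5% threshold ratchets the gate up to 73.8%, while a run measuring 71.0% leaves it at 72.5%.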
Developer Workflow
- Write or modify code
- Write or update tests (`tests/<module>/test_<filename>.py` mirrors `src/proteus/<module>/<filename>.py`)
- Run tests: `pytest tests/<module>/` or `pytest -m unit`
- Check coverage: `pytest --cov=src` (CI uses `--cov=src` for unit tests)
- Validate structure: `bash tools/validate_test_structure.sh`
- Lint: `ruff check src/ tests/` and `ruff format src/ tests/`
- Commit and push; CI runs automatically
Adding new code: Create src/proteus/<module>/<file>.py and tests/<module>/test_<filename>.py. Use fixtures from tests/conftest.py and tests/integration/conftest.py. See Test Building for prompts and Test Categorization for fixtures list.
Basic test structure: Use @pytest.mark.unit (or other marker), docstrings, pytest.approx() for floats, and unittest.mock for external/heavy code in unit tests. Example:
@pytest.mark.unit
def test_function_basic():
"""Test basic functionality."""
result = function_to_test(input_value)
assert result == pytest.approx(expected, rel=1e-5)
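When a unit test needs to isolate heavy physics or I/O, the same structure extends with `unittest.mock`. A self-contained sketch with hypothetical function names (real tests would import these from `src/proteus/`):

```python
from unittest.mock import patch

import pytest


# Hypothetical stand-ins, defined locally so the example is runnable;
# in real tests these would live in src/proteus/.
def expensive_physics(x):
    raise RuntimeError("heavy computation, never run in unit tests")


def scale_flux(x):
    return 2.0 * expensive_physics(x)


@pytest.mark.unit
def test_scale_flux_mocks_heavy_physics():
    """Only the scaling logic is tested; the heavy call is mocked out."""
    # Patch the name where it is looked up (this module), not where defined.
    with patch(f"{__name__}.expensive_physics", return_value=21.0) as mock_phys:
        assert scale_flux(3.0) == pytest.approx(42.0, rel=1e-5)
        mock_phys.assert_called_once_with(3.0)
```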
Best Practices
| Practice | Guideline |
|---|---|
| Structure | Mirror src/proteus/ in tests/; one test file per source file |
| Markers | Use unit / smoke / integration / slow consistently |
| Floats | Use pytest.approx(val, rel=1e-5) or np.testing.assert_allclose; never == |
| Mocking | Unit tests mock I/O and heavy physics; integration tests use real modules |
| Docstrings | Explain physical scenario being tested |
| Determinism | Set seeds (np.random.seed(42)); use tmp_path for temp files |
| Coverage | Focus on critical paths; use bash tools/coverage_analysis.sh |
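The determinism and float guidelines combine naturally in a single test; a self-contained sketch (hypothetical file name; numpy and pytest assumed installed):

```python
import numpy as np
import pytest


@pytest.mark.unit
def test_flux_table_roundtrip(tmp_path):
    """Deterministic round-trip through a temp file (illustrative only)."""
    # Seeding makes the random data reproducible across runs.
    np.random.seed(42)
    data = np.random.rand(5)
    # tmp_path keeps the filesystem clean; pytest removes it afterwards.
    out = tmp_path / "flux.npy"
    np.save(out, data)
    loaded = np.load(out)
    # Never compare floats with ==; use assert_allclose or pytest.approx.
    np.testing.assert_allclose(loaded, data, rtol=1e-12)
```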
See Test Categorization for marker details and Test Building for prompts.
Coverage Analysis Workflow
Generate reports:
pytest --cov=src --cov-report=term-missing # Terminal report with missing lines
pytest --cov=src --cov-report=html # HTML report
open htmlcov/index.html # View in browser
bash tools/coverage_analysis.sh # Coverage by module
coverage report --show-missing --skip-covered # Only uncovered files
Thresholds (in pyproject.toml):
- `[tool.proteus.coverage_fast] fail_under` — Fast gate (PRs)
- `[tool.coverage.report] fail_under` — Full gate (nightly)
Thresholds auto-ratchet upward; never decrease manually.
Coverage Collection & Reporting
Two-Workflow Architecture
Coverage data flows through two CI workflows that serve different purposes:
| Workflow | Tests Run | Codecov Flag | Artifact Produced |
|---|---|---|---|
| `ci-pr-checks.yml` | unit + smoke | `unit-tests` | `coverage-unit.json` (this PR only) |
| `ci-nightly.yml` | unit + smoke + integration (+ slow) | `nightly` | `coverage-integration-only.json` (combined) |
The PR workflow runs fast tests on every pull request. The nightly workflow runs the full suite and saves its combined coverage as an artifact that subsequent PR runs download.
Estimated Total Coverage (Union-of-Lines)
On each PR, the workflow estimates what total coverage would be if integration tests were also run, without actually running them. It does this by combining the PR's own coverage with the latest nightly artifact:
ci-pr-checks.yml ci-nightly.yml (last successful)
┌──────────────────┐ ┌──────────────────┐
│ Run unit+smoke │ │ coverage- │
│ on PR code │ │ integration- │
│ ────────────── │ │ only.json │
│ coverage- │ │ (unit+smoke+ │
│ unit.json │ │ integration) │
└────────┬─────────┘ └────────┬──────────┘
│ │
└──────────────┬─────────────────────────┘
│
┌────────▼────────┐
│ Union-of-Lines │
│ Algorithm │
└────────┬────────┘
│
┌────────▼────────┐
│ Estimated │
│ Total Coverage │
│ (compared to │
│ full threshold)│
└─────────────────┘
Algorithm (4 steps):
1. Parse both JSON files (`coverage-unit.json` from this PR, `coverage-integration-only.json` from nightly)
2. Normalize file paths (strip container prefixes like `/opt/proteus/`) so lines match across environments
3. Compute the union of covered lines and the union of executable lines across both datasets
4. Compare `100 * len(covered_union) / len(executable_union)` against the full threshold from `pyproject.toml`
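The four steps above can be sketched as follows, assuming coverage.py's JSON report layout (a `files` mapping with `executed_lines` and `missing_lines` per file); the real CI script may differ in detail:

```python
def _normalize(path: str, prefixes=("/opt/proteus/",)) -> str:
    """Strip container prefixes so paths match across environments."""
    for prefix in prefixes:
        if path.startswith(prefix):
            return path[len(prefix):]
    return path


def union_coverage(pr_report: dict, nightly_report: dict) -> float:
    """Estimate total coverage from two coverage.py JSON reports."""
    covered, executable = set(), set()
    for report in (pr_report, nightly_report):
        for path, data in report["files"].items():
            path = _normalize(path)
            # Executable lines = lines that were hit plus lines that were missed.
            covered |= {(path, n) for n in data["executed_lines"]}
            executable |= {(path, n)
                           for n in data["executed_lines"] + data["missing_lines"]}
    return 100 * len(covered) / len(executable)
```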
Stale nightly lines
The nightly artifact (coverage-integration-only.json) contains combined unit + smoke + integration coverage, not just integration. If unit/smoke coverage drops on a PR, stale lines from the nightly artifact can mask the regression until the next nightly run updates the baseline. A 48-hour staleness check mitigates this but does not eliminate it.
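The staleness check itself reduces to a simple age comparison; an illustrative sketch, not the actual workflow code:

```python
from datetime import datetime, timedelta, timezone


def nightly_is_stale(artifact_created_at: datetime, max_age_hours: int = 48) -> bool:
    """True if the nightly baseline is older than the allowed window."""
    age = datetime.now(timezone.utc) - artifact_created_at
    return age > timedelta(hours=max_age_hours)
```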
Codecov Integration
Two upload flags partition coverage data on codecov.io:
| Flag | Uploaded By | Contains |
|---|---|---|
| `unit-tests` | `ci-pr-checks.yml` | Unit + smoke coverage from this PR |
| `nightly` | `ci-nightly.yml` | Unit + smoke + integration coverage from nightly |
Configuration in codecov.yml:
- Project target: `auto` — Codecov tracks the project coverage trend automatically
- Patch target: `80%` — new/changed lines must have 80% coverage
- `carryforward: true` on both flags — if a flag isn't uploaded in a commit (e.g., nightly doesn't run on PRs), Codecov carries forward the last known value instead of reporting a drop
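Under those settings, the relevant portion of codecov.yml would look roughly like this (a sketch reconstructed from the options described above, not the file verbatim):

```yaml
coverage:
  status:
    project:
      default:
        target: auto      # track the trend, no hard-coded number
    patch:
      default:
        target: 80%       # new/changed lines need 80% coverage

flags:
  unit-tests:
    carryforward: true    # reuse last value when this flag isn't uploaded
  nightly:
    carryforward: true
```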
README Coverage Badge
The Codecov badge in README.md must include /branch/main/ in its URL to display the correct coverage value. Without this path segment, Codecov returns "unknown".
Correct format: https://codecov.io/gh/FormingWorlds/PROTEUS/branch/main/graph/badge.svg
Badge Validation Tests
The file tests/test_readme_badges.py contains unit tests that guard against badge URL regressions:
- All badge image URLs use HTTPS and point to allowed domains
- All expected badges (Unit Tests, Integration Tests, docs, License, Codecov, DOI) are present
- The Codecov badge URL includes `/branch/main/`
- Workflow files referenced by badge URLs exist on disk
These tests run on every PR as part of the unit test suite.
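The `/branch/main/` check can be expressed as a small predicate; a hypothetical, simplified version of what tests/test_readme_badges.py verifies:

```python
import re

# The badge must carry /branch/main/ or Codecov renders "unknown".
CODECOV_BADGE_RE = re.compile(
    r"https://codecov\.io/gh/FormingWorlds/PROTEUS/branch/main/graph/badge\.svg"
)


def codecov_badge_ok(readme_text: str) -> bool:
    """Return True if the text contains a correctly formed Codecov badge URL."""
    return bool(CODECOV_BADGE_RE.search(readme_text))
```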
Pre-commit Checklist
Run these before every commit:
# 1. Tests pass
pytest -m "unit and not skip"
# 2. Lint passes
ruff check src/ tests/ && ruff format src/ tests/
# 3. Structure valid
bash tools/validate_test_structure.sh
For new code:
- [ ] Added tests with correct markers (see Test Categorization)
- [ ] Coverage meets fast gate threshold
- [ ] Docstrings explain physical scenarios
Troubleshooting
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| `pytest: unrecognized arguments: --cov` | pytest-cov not installed | `pip install -e ".[develop]"` |
| Coverage below threshold | Coverage dropped | Add tests; see `coverage report --show-missing`; do not lower threshold |
| Tests not found | Discovery / naming | `pytest --collect-only`; ensure `test_*.py` files and `test_*` functions |
| Import errors | Package not installed | `pip install -e ".[develop]"`; check `src/` layout |
| CI fails, local passes | Environment / deps | Match Python version and `pyproject.toml`; check CI logs |
| Ruff fails | Style violations | `ruff check --fix src/ tests/` and `ruff format src/ tests/` |
Debugging Tests
pytest -v --showlocals -x tests/module/test_file.py::test_function
pytest --pdb # Debugger on failure
Stale Nightly Baseline
PR checks compare coverage against the last successful nightly run. If the nightly workflow fails (e.g. data download timeout, CI infrastructure issues, or transient test failures), the baseline becomes stale (>48 hours old) and PRs will fail validation. To fix this, trigger the nightly workflow manually and wait for it to complete.
Docker CI
- Build fails: Run `docker build -t proteus-test .` locally; check the Dockerfile and dependencies.
- Image pull fails: Verify `ghcr.io/formingworlds/proteus:latest` is public.
- Tests fail in container: Start `docker run -it ghcr.io/formingworlds/proteus:latest bash` and run `pytest -m unit -v` inside.
Reusable Quality Gate for Ecosystem Modules
PROTEUS provides a reusable workflow for ecosystem modules (CALLIOPE, JANUS, MORS, etc.) to adopt consistent testing standards.
Location: .github/workflows/proteus_test_quality_gate.yml
Purpose: Standardized test quality gate that ecosystem modules can call from their own CI workflows.
Inputs:
| Input | Default | Description |
|---|---|---|
| `python-version` | `3.12` | Python version for testing |
| `coverage-threshold` | `30` | Minimum coverage percentage |
| `grace-period` | `0.3` | Allowed coverage drop (percentage points) |
| `working-directory` | `.` | Project subdirectory |
| `pytest-args` | `''` | Additional pytest arguments |
Why use it:
- Consistency: Same testing standards across all ecosystem modules
- Maintenance: Updates to quality gate propagate to all modules
- Best practices: Includes coverage reporting, artifact upload, proper caching
Implementation Guide for Ecosystem Modules
Step 1: Ensure your module has the required structure
your-module/
├── src/
│ └── your_module/
│ └── __init__.py
├── tests/
│ └── test_*.py
├── pyproject.toml # Must have [project.optional-dependencies] develop = [...]
└── .github/
└── workflows/
└── tests.yml # Create this file
Step 2: Add [develop] dependencies to pyproject.toml
[project.optional-dependencies]
develop = [
"pytest>=7.0",
"pytest-cov>=4.0",
"ruff>=0.1.0",
]
Step 3: Create .github/workflows/tests.yml
Basic example:
name: Tests
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
test:
uses: FormingWorlds/PROTEUS/.github/workflows/proteus_test_quality_gate.yml@main
with:
coverage-threshold: 30
Step 4: Configure for your module's needs
Example configurations for different scenarios:
# CALLIOPE - Higher coverage, specific markers
jobs:
test:
uses: FormingWorlds/PROTEUS/.github/workflows/proteus_test_quality_gate.yml@main
with:
coverage-threshold: 50
pytest-args: '-m "unit and not skip" -v'
# JANUS - Lower initial threshold, exclude slow tests
jobs:
test:
uses: FormingWorlds/PROTEUS/.github/workflows/proteus_test_quality_gate.yml@main
with:
coverage-threshold: 25
pytest-args: '-m "not slow"'
# MORS - Monorepo with subdirectory
jobs:
test:
uses: FormingWorlds/PROTEUS/.github/workflows/proteus_test_quality_gate.yml@main
with:
working-directory: 'python'
coverage-threshold: 40
Codecov Integration
To enable Codecov uploads, add the CODECOV_TOKEN secret to your repository:
- Go to codecov.io and connect your repository
- Copy the upload token
- In GitHub: Settings → Secrets → Actions → New repository secret
- Name: `CODECOV_TOKEN`, Value: your token
The workflow automatically uploads coverage if the secret exists.
Transitioning from Custom Workflows
If your module has an existing test workflow:
- Keep your workflow temporarily: Rename it to `tests-old.yml`
- Create the new workflow: Add `tests.yml` using the quality gate
- Compare results: Ensure both pass on a few PRs
- Remove the old workflow: Delete `tests-old.yml` once confident
Troubleshooting
| Issue | Solution |
|---|---|
| `pip install` fails | Ensure `pyproject.toml` has a valid `[project.optional-dependencies]` section |
| Coverage too low | Lower `coverage-threshold` initially, ratchet up over time |
| Tests not found | Check that `tests/` exists and contains `test_*.py` files |
| Codecov upload fails | Add the `CODECOV_TOKEN` secret or ignore (non-fatal) |
References
PROTEUS Documentation
- Test Categorization — Markers, CI pipeline, fixtures
- Test Building — Prompts for unit/integration tests
- Docker CI Architecture — Docker image, CI pipelines
- AI-Assisted Development — Using AI for tests and code review
- tests/conftest.py — Shared fixtures
- .github/copilot-instructions.md — Commands and thresholds