Docker-Based CI/CD Architecture for PROTEUS

What This Document Is For

New to PROTEUS CI? This document explains how our Docker-based testing infrastructure works. Docker containers provide a consistent environment with pre-compiled physics modules, making CI runs fast and reproducible.

Key concept: Instead of compiling SOCRATES, AGNI, PETSc, and SPIDER on every CI run (~60 min), we use a pre-built Docker image (~5 min startup).

For test markers and categories, see Test Categorization. For coverage workflows, see Test Infrastructure. For writing tests, see Test Building.

Overview

This architecture avoids slow compilation by using a pre-built Docker image that contains the full PROTEUS environment with compiled physics modules. The image is rebuilt nightly and whenever its dependencies change, and it is used by both the PR and nightly CI workflows.

Architecture Components

1. Dockerfile

Location: /Dockerfile

Purpose: Define the pre-built environment with all dependencies and compiled physics modules.

Key Features:
- Base: Python 3.12 on Debian Bookworm (slim)
- System dependencies: gfortran, make, cmake, git, NetCDF libraries
- Julia installation via official installer
- Compiles all physics modules:
  - SOCRATES (radiative transfer)
  - PETSc (numerical computing)
  - SPIDER (interior evolution)
  - AGNI (radiative-convective atmosphere)
- Installs Python packages from pyproject.toml
- Optimized for size with cache cleanup

Environment Variables:

FWL_DATA=/opt/proteus/fwl_data
RAD_DIR=/opt/proteus/socrates
AGNI_DIR=/opt/proteus/AGNI
PETSC_DIR=/opt/proteus/petsc
PETSC_ARCH=arch-linux-c-opt
PROTEUS_DIR=/opt/proteus

2. docker-build.yml (The Updater)

Location: .github/workflows/docker-build.yml

Purpose: Build and push the Docker image to GitHub Container Registry.

Triggers:
- Schedule: Nightly at 02:00 UTC
- Push to main when dependencies change:
  - pyproject.toml
  - environment.yml
  - Dockerfile
  - tools/get_*.sh scripts

Output: ghcr.io/formingworlds/proteus:latest

Tags:
- latest (on main branch)
- <branch>-<sha> (commit-specific)
- nightly-YYYYMMDD (daily builds)
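The tagging scheme above can be sketched as a small function. This is only an illustration, not the workflow's actual logic: the function name `image_tags`, the 7-character short SHA, the `scheduled` flag, and the replacement of `/` in branch names (Docker tags cannot contain slashes) are all assumptions.

```python
from datetime import date


def image_tags(branch: str, sha: str, build_date: date, scheduled: bool = False) -> list[str]:
    """Return the image tags for one build (hypothetical helper).

    Assumptions: the short SHA is 7 characters, and the nightly tag is
    only added to scheduled builds.
    """
    # Docker tags may not contain "/", so feature/foo becomes feature-foo.
    safe_branch = branch.replace("/", "-")
    tags = [f"{safe_branch}-{sha[:7]}"]  # <branch>-<sha>
    if branch == "main":
        tags.append("latest")
    if scheduled:
        tags.append(f"nightly-{build_date:%Y%m%d}")  # nightly-YYYYMMDD
    return tags


if __name__ == "__main__":
    print(image_tags("main", "abcdef1234", date(2025, 1, 2), scheduled=True))
```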

Optimization:
- BuildKit cache for faster rebuilds
- Layer caching from previous builds
- Multi-stage builds as a potential further optimization

3. ci-pr-checks.yml (Fast Feedback)

Location: .github/workflows/ci-pr-checks.yml

Purpose: Fast PR validation using pre-built Docker image.

Triggers:
- Pull requests to main
- Push to main
- Manual dispatch

Strategy:
1. Container: Runs inside ghcr.io/formingworlds/proteus:latest (or branch-specific tag)
2. Threshold check: Prevents coverage decreases vs main
3. Code Overlay: Overlays PR code onto container (excludes compiled modules)
4. Structure validation: tools/validate_test_structure.sh
5. Sequential testing: Unit → Smart rebuild → Smoke
6. Coverage coordination: Downloads nightly artifact for estimated total

Jobs:

unit-tests (Linux, Docker): Unit + smoke tests with coverage, coverage validation
macos-unit-tests (macOS): Unit tests only (no compiled binaries available)
lint: Code style checks with ruff
summary: Aggregates results from all jobs into a unified report

Linux job steps (in order):

Prevent threshold decreases — Fails if fail_under decreased vs main
Overlay PR code — rsync excludes SPIDER, SOCRATES, PETSc, AGNI
Validate test structure — Ensures tests/ mirrors src/proteus/
Run unit tests — pytest -m "unit and not skip" with coverage
Smart rebuild — Recompile SOCRATES/AGNI only if sources changed
Run smoke tests — pytest -m "smoke and not skip" (appends coverage)
Download nightly coverage — For estimated total calculation
Check staleness — Fails if nightly artifact >48h old
Validate coverage — Grace period of 0.3% for drops
Diff-cover — 80% coverage required on changed lines
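The staleness check in the list above could look roughly like the following. This is a sketch under assumptions: it supposes the nightly timestamp file holds a single ISO 8601 timestamp and that a missing timezone means UTC. The function name `nightly_is_stale` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=48)  # staleness limit stated in this document


def nightly_is_stale(timestamp_text: str, now: datetime) -> bool:
    """Return True if the nightly artifact is older than 48 hours.

    Assumption: the file contains one ISO 8601 timestamp such as
    2025-01-02T03:00:00+00:00; naive timestamps are treated as UTC.
    """
    produced = datetime.fromisoformat(timestamp_text.strip())
    if produced.tzinfo is None:
        produced = produced.replace(tzinfo=timezone.utc)
    return now - produced > MAX_AGE


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(nightly_is_stale("2025-01-02T03:00:00+00:00", now))
```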

Coverage coordination:
- Fast gate threshold from [tool.proteus.coverage_fast] fail_under (see pyproject.toml)
- Estimated total = union of PR lines + nightly integration lines
- Grace period allows ≤0.3% drop with warning
- Diff-cover enforces 80% on changed lines
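The grace-period rule can be expressed as a three-way decision. The sketch below is illustrative only: it assumes the drop is measured in percentage points against the threshold, and the function name and return values are hypothetical.

```python
def validate_coverage(estimated_total: float, threshold: float, grace: float = 0.3) -> str:
    """Classify an estimated total coverage against a threshold.

    Returns "pass" at or above the threshold, "warn" for a shortfall of at
    most `grace` percentage points, and "fail" otherwise.
    """
    # Round to suppress floating-point noise such as 70.0 - 69.7 != 0.3.
    shortfall = round(threshold - estimated_total, 6)
    if shortfall <= 0:
        return "pass"
    if shortfall <= grace:
        return "warn"
    return "fail"


if __name__ == "__main__":
    for total in (70.5, 69.8, 69.0):
        print(total, validate_coverage(total, threshold=70.0))
```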

See Test Categorization for marker details and Test Infrastructure for coverage thresholds.

Key Innovation - Smart Rebuild:

- name: Smart rebuild of physics modules
  run: |
    # Only rebuild if source files changed
    cd SPIDER
    make -q || make -j$(nproc)  # -q checks if build is up-to-date

Since the container already has compiled binaries:
- If a PR changes only Python files: no recompilation is needed (~instant)
- If a PR changes Fortran/C files: only the changed files recompile (~seconds to minutes)
- Full compilation is avoided (~30-60 minutes saved)
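For readers who want to reproduce the smart rebuild outside the workflow, here is a hedged Python equivalent of the `make -q` pattern. GNU make's question mode exits with 0 when targets are up to date, 1 when a rebuild is needed, and 2 on error. The helper names are hypothetical, and the module directory is whatever checkout contains the Makefile.

```python
import os
import subprocess


def classify_make_query(returncode: int) -> str:
    """Map the exit status of `make -q` to a decision."""
    if returncode == 0:
        return "up-to-date"
    if returncode == 1:
        return "rebuild"
    return "error"  # GNU make uses 2 for errors


def smart_rebuild(module_dir: str) -> str:
    """Rebuild a module only when `make -q` reports stale targets."""
    query = subprocess.run(["make", "-q"], cwd=module_dir)
    decision = classify_make_query(query.returncode)
    if decision == "rebuild":
        jobs = os.cpu_count() or 1
        subprocess.run(["make", f"-j{jobs}"], cwd=module_dir, check=True)
    elif decision == "error":
        raise RuntimeError(f"make -q failed in {module_dir}")
    return decision
```

Unlike the shell one-liner, which also runs make when the query itself fails, this sketch treats a query error as a hard failure. Either choice is defensible; the one-liner is simpler, while the explicit form surfaces a broken Makefile earlier.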

4. ci-nightly.yml (Deep Validation)

Location: .github/workflows/ci-nightly.yml

Purpose: Comprehensive scientific validation and coverage baseline.

Triggers:
- Primary: Dispatched by docker-build.yml after the 02:00 UTC image rebuild
- Fallback: Cron at 03:00 UTC (skips if a dispatch run already happened in the last 4 hours)
- Manual dispatch

Deduplication: A check-already-triggered guard job queries the GitHub API for recent workflow_dispatch runs. If the docker-build workflow already triggered the nightly, the 3am cron skips. On API failure, the cron proceeds as a safe default.
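The guard's decision can be sketched as a pure function over the list of recent runs. This is an illustration, not the job's actual code: it assumes each run is a dict with the `event` and `created_at` fields found in GitHub's workflow-run objects, and it models an API failure as `None`. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def cron_should_skip(runs, now: datetime, window: timedelta = timedelta(hours=4)) -> bool:
    """Return True if a workflow_dispatch run started within the window.

    `runs` is a list of dicts with "event" and "created_at" keys, or None
    when the API call failed. On failure the cron proceeds (returns False).
    """
    if runs is None:
        return False  # safe default: run the nightly anyway
    for run in runs:
        if run.get("event") != "workflow_dispatch":
            continue
        # GitHub timestamps end in "Z"; normalize for fromisoformat.
        created = datetime.fromisoformat(run["created_at"].replace("Z", "+00:00"))
        if timedelta(0) <= now - created <= window:
            return True
    return False


if __name__ == "__main__":
    now = datetime(2025, 1, 2, 3, 0, tzinfo=timezone.utc)
    recent = [{"event": "workflow_dispatch", "created_at": "2025-01-02T02:20:00Z"}]
    print(cron_should_skip(recent, now))
```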

Environment:
- Sets PROTEUS_CI_NIGHTLY=1, which enables additional smoke tests
- Timeout: 240 minutes (4 hours)
- Downloads ~200MB minimal data for smoke tests

Strategy:
1. Check if already triggered by docker-build (skip if so)
2. Use branch-specific Docker image
3. Overlay code (excludes compiled modules)
4. Download minimal data (spectral files, stellar spectra, lookup tables)
5. Configure Julia environment for Python integration
6. Run all test tiers sequentially
7. Upload aggregate coverage to Codecov
8. Generate coverage artifacts for PR coordination
9. Ratchet coverage threshold on success

Test sequence:
1. Unit tests: pytest -m "unit and not skip" with coverage
2. Smoke tests: pytest -m "smoke and not skip" (coverage appended)
3. Integration tests: pytest -m "integration and not slow" (coverage appended)
4. Slow tests: pytest -m slow (if time permits)

Codecov upload:
- Uploads combined coverage.xml (unit + smoke + integration) under the nightly flag
- Configured in codecov.yml with carryforward: true so data persists across PR evaluations
- This is what the Codecov coverage badge in the README reflects

Artifacts uploaded:
- nightly-coverage/coverage-integration-only.json: For PR estimated total
- nightly-coverage/nightly-timestamp.txt: For staleness detection
- nightly-coverage/coverage-by-type.json: Breakdown by test type

Coverage ratcheting:
- Full threshold from [tool.coverage.report] fail_under (see pyproject.toml)
- Auto-commits threshold increase on successful main runs
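A ratchet only ever moves the threshold upward. The sketch below shows one way to compute the new value. It is an assumption-laden illustration: the function name is hypothetical, and flooring the measured coverage to two decimals is a guess at a sensible rounding rule rather than the workflow's documented behavior.

```python
from decimal import ROUND_FLOOR, Decimal


def ratcheted_threshold(current: float, measured: float, branch: str, succeeded: bool) -> float:
    """Return the fail_under value to commit after a nightly run.

    The threshold is raised only on successful runs on main, and never
    lowered. Assumption: measured coverage is floored to two decimals so
    the new threshold cannot exceed what was actually achieved.
    """
    if branch != "main" or not succeeded:
        return current
    floored = Decimal(str(measured)).quantize(Decimal("0.01"), rounding=ROUND_FLOOR)
    return max(current, float(floored))


if __name__ == "__main__":
    print(ratcheted_threshold(70.0, 71.239, "main", True))
```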

See Test Infrastructure for coverage coordination details.

Test Markers

Tests are categorized using pytest markers defined in pyproject.toml:

import pytest

# Unit test (fast, mocked physics)
@pytest.mark.unit
def test_config_parsing():
    # Test Python logic without heavy dependencies
    pass

# Smoke test (quick real binary check)
@pytest.mark.smoke
def test_spider_single_timestep():
    # Run SPIDER for 1 timestep at low resolution
    # Ensures binary actually works
    pass

# Integration test (multi-module)
@pytest.mark.integration
def test_atmosphere_interior_coupling():
    # Test interaction between JANUS and SPIDER
    pass

# Slow test (full scientific validation)
@pytest.mark.slow
def test_earth_evolution_1gyr():
    # Run full 1 Gyr simulation
    # Validate against known results
    pass

Workflow Sequence

Nightly (Main Branch)

02:00 UTC: docker-build.yml
  ↓
  Rebuild Docker image
  ↓
  Trigger ci-nightly.yml via workflow_dispatch
  ↓
03:00 UTC: ci-nightly.yml cron (fallback)
  ↓
  check-already-triggered job
  ↓  (skips if dispatch run found in last 4h)
  Pull Docker image
  ↓
  Overlay code, download data (~200MB)
  ↓
  Run unit tests with coverage
  ↓
  Run smoke tests (PROTEUS_CI_NIGHTLY=1 enables extras)
  ↓
  Run integration tests
  ↓
  Run slow tests (if time permits)
  ↓
  Upload aggregate coverage to Codecov (nightly flag)
  ↓
  Upload nightly-coverage artifact
  ↓
  Ratchet threshold if coverage increased

Pull Request

PR opened/updated
  ↓
ci-pr-checks.yml (3 parallel jobs + summary)
  ↓
┌──────────────────────┬──────────────────┬────────────┐
│ unit-tests (Linux)   │ macos-unit-tests │ lint       │
│ Pull Docker image    │ Setup Python     │ ruff check │
│ Overlay PR code      │ pip install      │ ruff format│
│ Validate structure   │ Run unit tests   │            │
│ Unit tests + cov     │                  │            │
│ Smart rebuild        │                  │            │
│ Smoke tests          │                  │            │
│ Coverage validation  │                  │            │
│ Diff-cover (80%)     │                  │            │
└──────────┬───────────┴────────┬─────────┴──────┬─────┘
           └────────────────────┴────────────────┘
                          ↓
                    summary job
                    (unified report)
                          ↓
                  Fast feedback (~10-15 min)

Benefits

Speed Improvements

Before: Every PR compiles SOCRATES, PETSc, SPIDER, AGNI (~60 minutes)
After: Use pre-built image, smart rebuild only (~5-10 minutes for Python-only changes)
Savings: ~50 minutes per PR iteration

Resource Efficiency

Docker layer caching reduces rebuild time
Smart recompilation only builds changed files
Parallel job execution where possible

Scientific Rigor

Nightly comprehensive validation ensures correctness
PR checks provide fast feedback without compromising quality
Separation of fast unit tests from slow integration tests

Developer Experience

Fast PR checks (~10-15 min) enable rapid iteration
Clear test markers guide test writing
Comprehensive nightly validation catches regressions

Image Maintenance

When Docker Image Rebuilds

Nightly at 02:00 UTC (scheduled)
Changes to pyproject.toml (dependency updates)
Changes to environment.yml (conda dependencies)
Changes to Dockerfile (build process)
Changes to tools/get_*.sh (compilation scripts)

Image Size Management

Cleanup layers remove apt cache, Python cache
Multi-stage builds are a potential further optimization
Current estimated size: ~2-3 GB (with compiled modules)

Cache Strategy

BuildKit cache stored in registry
Layer caching from previous builds
Fast incremental builds

Coverage Coordination

The two-tier coverage system coordinates between nightly and PR workflows:

Feature	Value	Description
Fast gate	[tool.proteus.coverage_fast] fail_under (pyproject.toml)	PR threshold (unit + smoke)
Full gate	[tool.coverage.report] fail_under (pyproject.toml)	Nightly threshold (all tests)
Grace period	0.3%	PRs can merge with small drops (with a warning)
Staleness	48h	PR fails if the nightly artifact is older
Diff-cover	80%	Required on changed lines

How estimated total works:
1. PR runs unit + smoke → coverage-unit.json
2. Download nightly's coverage-integration-only.json
3. Compute union of covered lines
4. Compare against full threshold
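The union in step 3 can be sketched as follows, assuming both files follow the coverage.py JSON report layout (a top-level `files` mapping whose entries list `executed_lines` and `missing_lines`). The function name is hypothetical, and the sketch ignores the fact that line numbers may shift in files the PR modifies.

```python
def estimated_total(pr_report: dict, nightly_report: dict) -> float:
    """Estimate total coverage (%) from the union of two coverage reports.

    A line counts as covered if either report executed it. Statement
    totals are the union of executed and missing lines per file.
    """
    pr_files = pr_report.get("files", {})
    nightly_files = nightly_report.get("files", {})
    covered = 0
    total = 0
    for path in set(pr_files) | set(nightly_files):
        pr = pr_files.get(path, {})
        nightly = nightly_files.get(path, {})
        executed = set(pr.get("executed_lines", [])) | set(nightly.get("executed_lines", []))
        statements = executed | set(pr.get("missing_lines", [])) | set(nightly.get("missing_lines", []))
        covered += len(executed)
        total += len(statements)
    return 100.0 * covered / total if total else 0.0


if __name__ == "__main__":
    pr = {"files": {"a.py": {"executed_lines": [1, 2], "missing_lines": [3, 4]}}}
    nightly = {"files": {"a.py": {"executed_lines": [3], "missing_lines": [1, 2, 4]}}}
    print(estimated_total(pr, nightly))
```

The result would then be compared against the full threshold, with the grace period applied to small shortfalls.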

See Test Infrastructure for threshold details.

Troubleshooting

Image Build Fails

Check GitHub Actions logs in docker-build.yml
Verify compilation scripts work locally
Test Dockerfile locally: docker build -t proteus-test .

Smart Rebuild Not Working

Verify make is installed in container
Check if Makefiles are copied correctly
Manual rebuild: Remove binaries and rebuild

Tests Fail in Container

Test locally with: docker run -it ghcr.io/formingworlds/proteus:latest bash
Verify environment variables are set
Check file permissions

Image Too Large

Review cleanup steps in Dockerfile
Consider multi-stage builds
Analyze layers: docker history ghcr.io/formingworlds/proteus:latest

Future Enhancements

Multi-architecture Support: Build for ARM64 (Apple Silicon)
Version Tagging: Semantic versioning for stable releases
Matrix Testing: Multiple Python versions (3.11, 3.12, 3.13)
Performance Profiling: Benchmark tests across versions
Artifact Caching: Cache FWL_DATA between runs

References

PROTEUS Documentation

Test Infrastructure — Coverage workflows, thresholds, troubleshooting
Test Categorization — Test markers, CI pipelines, fixtures
Test Building — Writing tests, prompts, best practices
AI-Assisted Development — Using AI for tests and code review