Docker-Based CI/CD Architecture for PROTEUS
What This Document Is For
New to PROTEUS CI? This document explains how our Docker-based testing infrastructure works. Docker containers provide a consistent environment with pre-compiled physics modules, making CI runs fast and reproducible.
Key concept: Instead of compiling SOCRATES, AGNI, PETSc, and SPIDER on every CI run (~60 min), we use a pre-built Docker image (~5 min startup).
For test markers and categories, see Test Categorization. For coverage workflows, see Test Infrastructure. For writing tests, see Test Building.
Overview
This architecture avoids slow compilation by using a pre-built Docker image that contains the full PROTEUS environment with compiled physics modules. The image is rebuilt nightly and whenever dependencies change, and it is used by all CI/CD workflows.
Architecture Components
1. Dockerfile
Location: /Dockerfile
Purpose: Define the pre-built environment with all dependencies and compiled physics modules.
Key Features:
- Base: Python 3.12 on Debian Bookworm (slim)
- System dependencies: gfortran, make, cmake, git, NetCDF libraries
- Julia installation via official installer
- Compiles all physics modules:
- SOCRATES (radiative transfer)
- PETSc (numerical solver library, required by SPIDER)
- SPIDER (interior evolution)
- AGNI (radiative-convective atmosphere)
- Installs Python packages from pyproject.toml
- Optimized for size with cache cleanup
Environment Variables:
FWL_DATA=/opt/proteus/fwl_data
RAD_DIR=/opt/proteus/socrates
AGNI_DIR=/opt/proteus/AGNI
PETSC_DIR=/opt/proteus/petsc
PETSC_ARCH=arch-linux-c-opt
PROTEUS_DIR=/opt/proteus
2. docker-build.yml (The Updater)
Location: .github/workflows/docker-build.yml
Purpose: Build and push the Docker image to GitHub Container Registry.
Triggers:
- Schedule: Nightly at 02:00 UTC
- Push to main when dependencies change:
- pyproject.toml
- environment.yml
- Dockerfile
- tools/get_*.sh scripts
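The path-based trigger amounts to "rebuild if any changed file matches one of these patterns". The following is an illustrative sketch of that rule, not the workflow's actual paths filter. One caveat: Python's fnmatch lets * match "/", whereas GitHub's path globs do not, so the two can disagree for nested paths.

```python
from fnmatch import fnmatchcase

# Patterns from the trigger list above. Assumption: this mirrors the
# workflow's paths filter; the real filter may list additional entries.
REBUILD_PATTERNS = (
    "pyproject.toml",
    "environment.yml",
    "Dockerfile",
    "tools/get_*.sh",
)


def should_rebuild_image(changed_files: list[str]) -> bool:
    """Return True if any changed path matches a rebuild trigger pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in REBUILD_PATTERNS
    )
```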
Output: ghcr.io/formingworlds/proteus:latest
Tags:
- latest (on main branch)
- <branch>-<sha> (commit-specific)
- nightly-YYYYMMDD (daily builds)
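The tagging scheme can be expressed as a small function. This is a sketch under stated assumptions: image_tags is hypothetical, the commit SHA is assumed to be shortened to 7 characters, and "/" in branch names is assumed to be replaced with "-" because Docker tags may not contain "/".

```python
from datetime import date


def image_tags(branch: str, sha: str, build_date: date, nightly: bool) -> list[str]:
    """Compute image tags following the latest / <branch>-<sha> / nightly-YYYYMMDD scheme."""
    safe_branch = branch.replace("/", "-")  # Docker tags cannot contain '/'
    tags = [f"{safe_branch}-{sha[:7]}"]     # assumed short-SHA length of 7
    if branch == "main":
        tags.append("latest")
    if nightly:
        tags.append(f"nightly-{build_date:%Y%m%d}")
    return tags
```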
Optimization:
- BuildKit cache for faster rebuilds
- Layer caching from previous builds
- Multi-stage builds as a possible future optimization
3. ci-pr-checks.yml (Fast Feedback)
Location: .github/workflows/ci-pr-checks.yml
Purpose: Fast PR validation using pre-built Docker image.
Triggers:
- Pull requests to main
- Push to main
- Manual dispatch
Strategy:
1. Container: Runs inside ghcr.io/formingworlds/proteus:latest (or branch-specific tag)
2. Threshold check: Fails if the fast-gate fail_under threshold is lowered relative to main
3. Code Overlay: Overlays PR code onto container (excludes compiled modules)
4. Structure validation: tools/validate_test_structure.sh
5. Sequential testing: Unit → Smart rebuild → Smoke
6. Coverage coordination: Downloads nightly artifact for estimated total
Jobs:
- unit-tests (Linux, Docker): Unit + smoke tests with coverage, coverage validation
- macos-unit-tests (macOS): Unit tests only (no compiled binaries available)
- lint: Code style checks with ruff
- summary: Aggregates results from all jobs into a unified report
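A summary job of this kind typically reads each upstream job's result and renders a table. The sketch below is hypothetical (summarize is not the workflow's code). It uses the result values GitHub Actions exposes via needs.<job_id>.result (success, failure, cancelled, skipped), and it assumes that skipped jobs do not fail the summary.

```python
def summarize(results: dict[str, str]) -> tuple[bool, str]:
    """Return (overall_ok, markdown_table) for a mapping of job name to result."""
    lines = ["| Job | Result |", "|---|---|"]
    for job, result in results.items():
        lines.append(f"| {job} | {result} |")
    # Assumption: 'skipped' is acceptable; 'failure' or 'cancelled' fails the summary.
    overall_ok = all(r in ("success", "skipped") for r in results.values())
    return overall_ok, "\n".join(lines)
```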
Linux job steps (in order):
- Prevent threshold decreases: Fails if fail_under decreased vs main
- Overlay PR code: rsync excludes SPIDER, SOCRATES, PETSc, AGNI
- Validate test structure: Ensures tests/ mirrors src/proteus/
- Run unit tests: pytest -m "unit and not skip" with coverage
- Smart rebuild: Recompile SOCRATES/AGNI only if sources changed
- Run smoke tests: pytest -m "smoke and not skip" (appends coverage)
- Download nightly coverage: For estimated total calculation
- Check staleness: Fails if nightly artifact >48h old
- Validate coverage: Grace period of 0.3% for drops
- Diff-cover: 80% coverage required on changed lines
Coverage coordination:
- Fast gate threshold from [tool.proteus.coverage_fast] fail_under (see pyproject.toml)
- Estimated total = union of PR lines + nightly integration lines
- Grace period allows a drop of up to 0.3% with a warning
- Diff-cover enforces 80% on changed lines
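The gate logic above can be sketched as a few comparisons. This is an illustration, not the PROTEUS implementation. The function names are hypothetical, and the grace period is assumed to be measured in percentage points against a baseline total (for example, main's estimated total). Diff-cover is reduced here to "fraction of changed executable lines that are covered", whereas the real tool derives those lines from git diff and a coverage report.

```python
from dataclasses import dataclass

GRACE_PCT = 0.3         # allowed drop, assumed to be in percentage points
DIFF_COVER_MIN = 80.0   # required coverage on changed lines, in percent


@dataclass
class Verdict:
    status: str  # "pass", "warn" or "fail"
    reason: str


def check_coverage(fast_pct: float, fast_threshold: float,
                   estimated_total: float, baseline_total: float) -> Verdict:
    """Apply the fast gate, then the grace-period rule to the estimated total."""
    if fast_pct < fast_threshold:
        return Verdict("fail", f"fast gate: {fast_pct:.2f}% < {fast_threshold:.2f}%")
    drop = baseline_total - estimated_total
    if drop <= 0:
        return Verdict("pass", "no coverage drop")
    if drop <= GRACE_PCT:
        return Verdict("warn", f"drop of {drop:.2f} points is within the grace period")
    return Verdict("fail", f"drop of {drop:.2f} points exceeds the {GRACE_PCT} grace")


def diff_cover_ok(changed_lines: set[int], covered_lines: set[int]) -> bool:
    """True if at least 80% of changed executable lines are covered."""
    if not changed_lines:
        return True  # nothing to measure
    pct = 100.0 * len(changed_lines & covered_lines) / len(changed_lines)
    return pct >= DIFF_COVER_MIN
```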
See Test Categorization for marker details and Test Infrastructure for coverage thresholds.
Key Innovation - Smart Rebuild:
- name: Smart rebuild of physics modules
run: |
# Only rebuild if source files changed
cd SPIDER
make -q || make -j$(nproc) # -q checks if build is up-to-date
Since the container already has compiled binaries:
- If a PR changes only Python files: no recompilation is needed (~instant)
- If a PR changes Fortran/C files: only the affected files recompile (~seconds to minutes)
- Full compilation is avoided (~30-60 minutes saved)
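The reason make -q can short-circuit the build is make's timestamp rule: a target is out of date if it is missing or older than any of its prerequisites. The sketch below mimics that basic rule in Python for illustration only. It ignores pattern rules and transitive dependencies, and needs_rebuild is a hypothetical name.

```python
from pathlib import Path


def needs_rebuild(target: Path, sources: list[Path]) -> bool:
    """Mimic make's basic rule: rebuild if the target is missing or
    older than any of its sources (make -q signals this with a non-zero exit)."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(src.stat().st_mtime > target_mtime for src in sources)
```

Because the decision is purely timestamp-based, anything that updates source modification times without changing content (for example, a copy that does not preserve mtimes) will also trigger a rebuild.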
4. ci-nightly.yml (Deep Validation)
Location: .github/workflows/ci-nightly.yml
Purpose: Comprehensive scientific validation and coverage baseline.
Triggers:
- Primary: Dispatched by docker-build.yml after the 2am UTC image rebuild
- Fallback: Cron at 03:00 UTC (skips if a dispatch run already happened in the last 4 hours)
- Manual dispatch
Deduplication: A check-already-triggered guard job queries the GitHub API for recent workflow_dispatch runs. If the docker-build workflow already triggered the nightly, the 3am cron skips. On API failure, the cron proceeds as a safe default.
Environment:
- Sets PROTEUS_CI_NIGHTLY=1 → enables additional smoke tests
- Timeout: 240 minutes (4 hours)
- Downloads ~200MB minimal data for smoke tests
Strategy:
1. Check whether docker-build already triggered the run (skip if so)
2. Use the branch-specific Docker image
3. Overlay code (excludes compiled modules)
4. Download minimal data (spectral files, stellar spectra, lookup tables)
5. Configure the Julia environment for Python integration
6. Run all test tiers sequentially
7. Upload aggregate coverage to Codecov
8. Generate coverage artifacts for PR coordination
9. Ratchet the coverage threshold on success
Test sequence:
1. Unit tests: pytest -m "unit and not skip" with coverage
2. Smoke tests: pytest -m "smoke and not skip" (coverage appended)
3. Integration tests: pytest -m "integration and not slow" (coverage appended)
4. Slow tests: pytest -m slow (if time permits)
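"Coverage appended" maps naturally onto pytest-cov's --cov-append flag, which adds to existing coverage data instead of erasing it. The sketch below builds one command per tier. The --cov=proteus target is an assumption based on the src/proteus/ layout, and the workflow may pass different options.

```python
import sys

# Marker expressions from the test sequence above, in execution order.
TIERS = [
    "unit and not skip",
    "smoke and not skip",
    "integration and not slow",
    "slow",
]


def pytest_commands(cov_package: str = "proteus") -> list[list[str]]:
    """Build one pytest invocation per tier; every tier after the first
    appends to the existing coverage data instead of overwriting it."""
    commands = []
    for i, expr in enumerate(TIERS):
        cmd = [sys.executable, "-m", "pytest", "-m", expr, f"--cov={cov_package}"]
        if i > 0:
            cmd.append("--cov-append")
        commands.append(cmd)
    return commands
```

Each command can then be run in order with subprocess.run(cmd, check=True).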
Codecov upload:
- Uploads combined coverage.xml (unit + smoke + integration) under the nightly flag
- Configured in codecov.yml with carryforward: true so data persists across PR evaluations
- This is what the Codecov coverage badge in the README reflects
Artifacts uploaded:
- nightly-coverage/coverage-integration-only.json: For PR estimated total
- nightly-coverage/nightly-timestamp.txt: For staleness detection
- nightly-coverage/coverage-by-type.json: Breakdown by test type
Coverage ratcheting:
- Full threshold from [tool.coverage.report] fail_under (see pyproject.toml)
- Auto-commits threshold increase on successful main runs
See Test Infrastructure for coverage coordination details.
Test Markers
Tests are categorized using pytest markers defined in pyproject.toml:
import pytest

# Unit test (fast, mocked physics)
@pytest.mark.unit
def test_config_parsing():
# Test Python logic without heavy dependencies
pass
# Smoke test (quick real binary check)
@pytest.mark.smoke
def test_spider_single_timestep():
# Run SPIDER for 1 timestep at low resolution
# Ensures binary actually works
pass
# Integration test (multi-module)
@pytest.mark.integration
def test_atmosphere_interior_coupling():
# Test interaction between JANUS and SPIDER
pass
# Slow test (full scientific validation)
@pytest.mark.slow
def test_earth_evolution_1gyr():
# Run full 1 Gyr simulation
# Validate against known results
pass
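To see which CI stage picks up a given test, the -m expressions used in this document can be mirrored as plain predicates over a test's marker names. This is an illustrative sketch, not pytest's own expression parser. The PR workflow runs the first two expressions, and the nightly workflow runs all four.

```python
# Each key is a pytest -m expression used in CI; each value is an equivalent
# predicate over the set of marker names attached to a single test.
EXPRESSIONS = {
    "unit and not skip": lambda m: "unit" in m and "skip" not in m,
    "smoke and not skip": lambda m: "smoke" in m and "skip" not in m,
    "integration and not slow": lambda m: "integration" in m and "slow" not in m,
    "slow": lambda m: "slow" in m,
}

PR_EXPRESSIONS = ("unit and not skip", "smoke and not skip")


def selected_by(markers: set[str]) -> list[str]:
    """Return the CI marker expressions that would select a test with these markers."""
    return [expr for expr, pred in EXPRESSIONS.items() if pred(markers)]


def runs_on_pr(markers: set[str]) -> bool:
    """True if the PR workflow would run a test with these markers."""
    return any(expr in PR_EXPRESSIONS for expr in selected_by(markers))
```

One consequence of these expressions: a test marked both integration and slow is excluded from the integration tier and runs only in the slow tier.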
Workflow Sequence
Nightly (Main Branch)
02:00 UTC: docker-build.yml
↓
Rebuild Docker image
↓
Trigger ci-nightly.yml via workflow_dispatch
↓
03:00 UTC: ci-nightly.yml cron (fallback)
↓
check-already-triggered job
↓ (skips if dispatch run found in last 4h)
Pull Docker image
↓
Overlay code, download data (~200MB)
↓
Run unit tests with coverage
↓
Run smoke tests (PROTEUS_CI_NIGHTLY=1 enables extras)
↓
Run integration tests
↓
Run slow tests (if time permits)
↓
Upload aggregate coverage to Codecov (nightly flag)
↓
Upload nightly-coverage artifact
↓
Ratchet threshold if coverage increased
Pull Request
PR opened/updated
↓
ci-pr-checks.yml (3 parallel jobs + summary)
↓
┌──────────────────────┬──────────────────┬────────────┐
│ unit-tests (Linux)   │ macos-unit-tests │ lint       │
│ Pull Docker image    │ Setup Python     │ ruff check │
│ Overlay PR code      │ pip install      │ ruff format│
│ Validate structure   │ Run unit tests   │            │
│ Unit tests + cov     │                  │            │
│ Smart rebuild        │                  │            │
│ Smoke tests          │                  │            │
│ Coverage validation  │                  │            │
│ Diff-cover (80%)     │                  │            │
└──────────┬───────────┴────────┬─────────┴──────┬─────┘
           └────────────────────┴────────────────┘
                                ↓
                           summary job
                        (unified report)
                                ↓
Fast feedback (~10-15 min)
Benefits
Speed Improvements
- Before: Every PR compiles SOCRATES, PETSc, SPIDER, AGNI (~60 minutes)
- After: Use pre-built image, smart rebuild only (~5-10 minutes for Python-only changes)
- Savings: ~50 minutes per PR iteration
Resource Efficiency
- Docker layer caching reduces rebuild time
- Smart recompilation only builds changed files
- Parallel job execution where possible
Scientific Rigor
- Nightly comprehensive validation ensures correctness
- PR checks provide fast feedback without compromising quality
- Separation of fast unit tests from slow integration tests
Developer Experience
- Fast PR checks (~10-15 min) enable rapid iteration
- Clear test markers guide test writing
- Comprehensive nightly validation catches regressions
Image Maintenance
When Docker Image Rebuilds
- Nightly at 02:00 UTC (scheduled)
- Changes to pyproject.toml (dependency updates)
- Changes to environment.yml (conda dependencies)
- Changes to Dockerfile (build process)
- Changes to tools/get_*.sh (compilation scripts)
Image Size Management
- Cleanup layers remove apt cache, Python cache
- Multi-stage builds could reduce the size further
- Current estimated size: ~2-3 GB (with compiled modules)
Cache Strategy
- BuildKit cache stored in registry
- Layer caching from previous builds
- Fast incremental builds
Coverage Coordination
The two-tier coverage system coordinates between nightly and PR workflows:
| Feature | Value | Description |
|---|---|---|
| Fast gate | [tool.proteus.coverage_fast] fail_under in pyproject.toml | PR threshold (unit + smoke) |
| Full gate | [tool.coverage.report] fail_under in pyproject.toml | Nightly threshold (all tests) |
| Grace period | 0.3% | PRs can merge with small drops |
| Staleness | 48h | PR fails if the nightly artifact is older than this |
| Diff-cover | 80% | Required on changed lines |
How estimated total works:
1. PR runs unit + smoke → coverage-unit.json
2. Download nightly's coverage-integration-only.json
3. Compute union of covered lines
4. Compare against full threshold
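The union step can be sketched against the layout of coverage.py's JSON report, where each file entry lists executed_lines and missing_lines. This is an illustration under that assumption, not the PROTEUS script. It counts line coverage only (no branches), treats a file's statements as the union of executed and missing lines across both reports, and the name estimated_total is hypothetical.

```python
def estimated_total(pr_report: dict, nightly_report: dict) -> float:
    """Percent of statements covered by the PR run or the nightly run.

    Both arguments follow coverage.py's JSON report layout:
    {"files": {path: {"executed_lines": [...], "missing_lines": [...]}}}.
    """
    covered = 0
    statements = 0
    paths = set(pr_report["files"]) | set(nightly_report["files"])
    for path in paths:
        pr = pr_report["files"].get(path, {})
        nightly = nightly_report["files"].get(path, {})
        # A file's statements are its executed plus missing lines, in either report.
        stmts: set[int] = set()
        for rep in (pr, nightly):
            stmts |= set(rep.get("executed_lines", [])) | set(rep.get("missing_lines", []))
        # A line counts as covered if either run executed it.
        hit = set(pr.get("executed_lines", [])) | set(nightly.get("executed_lines", []))
        statements += len(stmts)
        covered += len(hit & stmts)
    return 100.0 * covered / statements if statements else 100.0
```

The result is then compared against the full threshold from [tool.coverage.report] fail_under.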
See Test Infrastructure for threshold details.
Troubleshooting
Image Build Fails
- Check the GitHub Actions logs for docker-build.yml
- Verify compilation scripts work locally
- Test the Dockerfile locally: docker build -t proteus-test .
Smart Rebuild Not Working
- Verify make is installed in container
- Check if Makefiles are copied correctly
- Manual rebuild: Remove binaries and rebuild
Tests Fail in Container
- Test locally with: docker run -it ghcr.io/formingworlds/proteus:latest bash
- Verify environment variables are set
- Check file permissions
Image Too Large
- Review cleanup steps in Dockerfile
- Consider multi-stage builds
- Analyze layers: docker history ghcr.io/formingworlds/proteus:latest
Future Enhancements
- Multi-architecture Support: Build for ARM64 (Apple Silicon)
- Version Tagging: Semantic versioning for stable releases
- Matrix Testing: Multiple Python versions (3.11, 3.12, 3.13)
- Performance Profiling: Benchmark tests across versions
- Artifact Caching: Cache FWL_DATA between runs
References
PROTEUS Documentation
- Test Infrastructure: Coverage workflows, thresholds, troubleshooting
- Test Categorization: Test markers, CI pipelines, fixtures
- Test Building: Writing tests, prompts, best practices
- AI-Assisted Development: Using AI for tests and code review