Grading Checklist¶

Quick reference: what we built and where to find it. Each section maps to 20% of the grade.

Architecture (20%)¶

Clean FTI split, feature store, model registry, containerized deployment.

What	Where
Feature / Training / Inference split	`src/foehncast/feature_pipeline/`, `training_pipeline/`, `inference_pipeline/`
Airflow DAGs with asset triggers (local)	`dags/feature_dag.py`, `training_dag.py`, `inference_dag.py`
Cloud Run jobs + Workflows (cloud)	`terraform/main.tf` — FTI cascade (6 h), drift detection (12 h)
Feature store (Feast)	`feature_repo/`
Model registry (MLflow)	Champion/candidate aliases in `training_pipeline/register.py`
6 container services	`containers/` (Airflow, MLflow, app, UI, monitoring, dev)
Compose with overlay pattern	`docker-compose.yml` + `objectstore.yml` / `gcp.yml`
Local + cloud deployment	`scripts/bootstrap-local.sh`, `terraform/main.tf`
Storage abstraction	`feature_pipeline/store.py` switches between S3 and BigQuery

Docs: Architecture

Everything runs without manual steps after bootstrap.

What	Where
CI (7 jobs: shell, lint, terraform, dvc, compose, test, docs)	`.github/workflows/ci.yml`
Auto image publishing	Cloud Build triggers (GCP-native, path-filtered)
Infrastructure-as-code	`terraform/main.tf` + `terraform.yml` workflow
Asset-triggered training (local)	Feature DAG → training-request asset → Training DAG
Cloud Workflows cascade (cloud)	Cloud Scheduler → feature → training → inference (scale-to-zero)
Asset-triggered inference (local)	Model registered → Inference DAG runs batch predictions
Pre-commit hooks (8)	`.pre-commit-config.yaml` (ruff, whitespace, YAML, etc.)
Bootstrap scripts	`scripts/bootstrap-local.sh`, `scripts/bootstrap-gcp.sh`
Runtime release + rollback	Cloud Build triggers + Cloud Run probes (automatic rollback)

Docs: Delivery Workflow

Same results on any machine.

What	Where
DVC pipeline (curate + train)	`dvc.yaml`, `dvc.lock`
Tracked outputs	`data/`, `reports/train_metrics.json`, `reports/feature_importance.png`
Locked dependencies	`pyproject.toml` + `uv.lock`
Pinned containers	`python:3.12-slim`, multi-stage, `uv sync --frozen`
Config-driven (no magic numbers)	`config.yaml` has spots, model params, thresholds
Data lineage in MLflow	SHA-256 hash + git commit logged per run
Local smoke test = CI smoke test	`make smoke-local-evaluator`

Docs: Feature Pipeline, Training Pipeline

Static analysis, tests, and consistent structure.

What	Where
Linting + formatting	`ruff` via pre-commit and `make lint`
Unit tests across the pipeline modules	`tests/`
CI enforces everything	Lint, test, shell checks, terraform validate on every PR
Type annotations	`from __future__ import annotations` throughout
Clean package structure	Domain subpackages + shared utilities (`_bigquery.py`, etc.)
Shell validation	ShellCheck-style checks in CI

Docs: Repository

Prometheus metrics, drift detection, alerting, and visualization.

What	Where
Custom Prometheus exporters	`monitoring/pipeline_prometheus.py`, `prediction_prometheus.py`
Combined `/metrics` endpoint	Feature + training + prediction + drift metrics
Drift detection (Evidently)	`monitoring/drift.py` — statistical tests per column
Drift Cloud Run job (12 h schedule)	`terraform/main.tf` — runs `drift.py` on Cloud Run
Hindcast validation	`monitoring/hindcast.py` — predicted vs. observed
Streamlit charts (Altair + PromQL)	`ui/app.py` — system health, drift, pipeline panels
9 alert rules	`prometheus_config/alerting_rules.yml`
Prediction event log	`.state/monitoring/prediction-events.jsonl` (local), BigQuery (cloud)
Scrape config in version control	`prometheus_config/prometheus.yml`

Docs: Monitoring