# Configuration and Contracts
FoehnCast keeps workload configuration, runtime wiring, infrastructure inputs, and generated runtime state separate on purpose. This page describes the current contract so readers do not have to reconstruct it from `config.yaml`, `src/foehncast/config.py`, the repository notes, and the operator docs.
The goal is not to document every environment variable in isolation. The goal is to make the ownership boundary explicit: what the package owns, what the runtime injects, and what infrastructure or operator tooling should keep outside the package config.
**Scope.** This page describes the current validated configuration boundary. It is not a proposal for a future settings system. New settings should follow these ownership rules unless the architecture changes first.
## Contract In One View
The important rule is that the package should own workload semantics, while runtime and infrastructure layers own deployment-specific wiring.
### Ownership Boundary
| Surface | Owns | Current examples | Must not become |
|---|---|---|---|
| `config.yaml` | workload defaults and app-facing contracts | rider profile, spot list, API source settings, validation rules, model features, labeling bands, MLflow names, inference weights, monitoring thresholds | a dump of project IDs, service names, bind hosts, or deployment topology |
| `.env` and environment variables | concrete runtime wiring for one local or hosted instance | `STORAGE_BACKEND`, `STORAGE_S3_BUCKET`, `STORAGE_BIGQUERY_*`, `MLFLOW_TRACKING_URI`, bind hosts, Feast source selection | the source of truth for rider, spot, or model semantics |
| `terraform/terraform.tfvars` and GitHub delivery variables | infrastructure desired state and hosted rollout inputs | regions, buckets, service names, machine shape, OIDC, remote Terraform state, Cloud Run and hosted-host toggles | the place where model features, ranking weights, or validation rules live |
| `feature_repo/feature_store*.yaml` | checked-in Feast reference configuration | local reference config and cloud example config | a replacement for the base application config |
| `.state/feast/feature_store.runtime.yaml` | rendered runtime Feast binding | active repo path, registry path, online-store binding, offline source target | a hand-maintained checked-in config file |
| `.state/monitoring/*.jsonl` and `airflow/reports/*.json` | retained local monitoring and rendered operator evidence | prediction-event history and latest pipeline summary contracts | app-facing workload configuration |
This split keeps the workload code smaller and clearer. The package does not need to own deployment metadata that it only consumes after runtime wiring resolves it.
## What Stays In `config.yaml`
The checked-in YAML owns the stable workload and product contract.
| Section | What it controls today |
|---|---|
| `rider` | baseline rider profile and home location used by ranking |
| `api` | upstream weather and routing source settings |
| `spots` | the fixed spot list and shore metadata |
| `storage` | the default curated-storage mode, currently `s3` or `bigquery` |
| `warehouse` | retained warehouse contracts for curated features and prediction events |
| `validation` | required columns, completeness rules, and accepted numeric ranges |
| `model` | algorithm choice, feature sets, target field, split ratio, and seed |
| `labeling` | the synthetic quality-band rules and danger thresholds |
| `mlflow` | experiment name, model name, and alias naming |
| `inference` | live horizon and ranking weights |
| `monitoring` | drift threshold, evaluation window, and local retention knobs |
This is why the current training and inference pages can point back to `config.yaml` for feature lists, ranking weights, and label semantics without treating those values as runtime secrets.
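As a quick orientation, any workload consumer reads these sections from the checked-in YAML rather than from the environment. A minimal sketch, assuming PyYAML and an illustrative key layout inside each section:

```python
# Minimal sketch: workload semantics come from the checked-in YAML.
# Section names match the table above; the keys inside each section
# are illustrative, not the exact schema of config.yaml.
import yaml

with open("config.yaml") as fh:
    cfg = yaml.safe_load(fh)

model_features = cfg["model"]["features"]               # assumed key layout
ranking_weights = cfg["inference"]["weights"]           # assumed key layout
drift_threshold = cfg["monitoring"]["drift_threshold"]  # assumed key layout
```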
## What Runtime Wiring Resolves
`src/foehncast/config.py` keeps the YAML and the runtime wiring separate instead of mutating one into the other.
The current resolution rules are:
- `FOEHNCAST_CONFIG_PATH` can point the loader at a different YAML file when needed
- the loader caches the checked-in YAML, but runtime helpers resolve environment overrides when the caller asks for storage or MLflow settings
- storage wiring resolves from the environment first, then from legacy YAML runtime fields if they still exist, then from local-safe defaults (see the sketch after this list)
- MLflow experiment and model naming stay in YAML, while `MLFLOW_TRACKING_URI` remains runtime wiring
- the resolved storage config also exposes explicit warehouse contracts for curated features and prediction events
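A minimal sketch of that resolution order; the helper name and the legacy `runtime` section key below are assumptions, not the actual names in `src/foehncast/config.py`:

```python
# Sketch of the env -> legacy YAML -> default resolution order described
# above. resolve_storage_backend() and the "runtime" section name are
# assumptions, not the actual names in src/foehncast/config.py.
import os

def resolve_storage_backend(cfg: dict) -> str:
    env_value = os.environ.get("STORAGE_BACKEND")
    if env_value:                      # 1. environment wins at call time
        return env_value
    legacy = cfg.get("runtime", {}).get("storage_backend")
    if legacy:                         # 2. legacy YAML runtime field, if present
        return legacy
    return "s3"                        # 3. local-safe default
```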
The test contract in `tests/test_config.py` already checks the most important behaviors:
- environment overrides apply after the initial YAML load
- runtime resolution does not mutate the cached YAML values
- legacy runtime fields still resolve when present
- default and custom warehouse contracts stay explicit instead of being inferred indirectly
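A hypothetical shape for the first two of those checks; the import path and helper names are assumptions, and the real assertions live in `tests/test_config.py`:

```python
# Hypothetical test shape; the import path and helper names are assumptions
# standing in for the real helpers in src/foehncast/config.py.
from foehncast.config import load_config, resolve_storage_settings  # assumed API

def test_env_override_without_yaml_mutation(monkeypatch):
    cached = load_config()                     # initial cached YAML load
    monkeypatch.setenv("STORAGE_BACKEND", "bigquery")
    resolved = resolve_storage_settings()      # runtime resolution after the load
    assert resolved.backend == "bigquery"      # env override applies
    assert load_config() is cached             # cached YAML was not mutated
```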
That means the package does not need a second handwritten runtime config file just to switch from local to hosted wiring.
## Current Runtime Wiring Examples
The checked-in `.env.example` shows the kind of values that belong in runtime wiring.
| Runtime surface | Current examples |
|---|---|
| Curated storage binding | `STORAGE_BACKEND`, `STORAGE_S3_BUCKET`, `STORAGE_S3_ENDPOINT`, `STORAGE_BIGQUERY_PROJECT_ID`, `STORAGE_BIGQUERY_DATASET`, `STORAGE_BIGQUERY_TABLE` |
| MLflow connection | `MLFLOW_TRACKING_URI`, `MLFLOW_ARTIFACT_DESTINATION` |
| Local service exposure | `APP_BIND_HOST`, `AIRFLOW_BIND_HOST`, `PROMETHEUS_PORT`, `GRAFANA_PORT` |
| Monitoring history path | `FOEHNCAST_PREDICTION_EVENT_LOG_PATH` |
| Feast runtime binding | `FOEHNCAST_FEAST_SOURCE`, `FOEHNCAST_FEAST_REPO_PATH`, `FOEHNCAST_FEAST_CONFIG_PATH`, `FOEHNCAST_FEAST_BIGQUERY_*`, `FOEHNCAST_FEAST_DATASTORE_*` |
These values describe one concrete runtime instance. They should stay overridable because the local evaluator, hosted full-stack target, and hosted inference target do not all bind to the same services.
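For a single local instance, those values can be read with local-safe fallbacks. A sketch, where the variable names match the table above but the default values are assumptions:

```python
# Illustrative local fallbacks for runtime wiring; the variable names match
# the table above, the default values are assumptions for a local instance.
import os

storage_backend = os.environ.get("STORAGE_BACKEND", "s3")
tracking_uri = os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000")
app_bind_host = os.environ.get("APP_BIND_HOST", "127.0.0.1")
event_log_path = os.environ.get(
    "FOEHNCAST_PREDICTION_EVENT_LOG_PATH",
    ".state/monitoring/prediction-events.jsonl",
)
```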
## Storage And Warehouse Contract
The storage boundary is intentionally narrow.
The current curated-storage contract is:
- `s3` is the local MinIO-backed baseline
- `bigquery` is the hosted analytical baseline
- the older file-backed curated-store compatibility path is no longer part of the runtime contract
The runtime layer also keeps the warehouse contracts explicit instead of burying them in ad hoc SQL or monitoring code:
- curated features default to the `foehncast.forecast_features` table contract with day partitioning on `forecast_time`
- prediction events default to the `foehncast_monitoring.prediction_events` table contract with day partitioning on `prediction_timestamp`
- both contracts keep explicit clustering and retention settings
This matters because retained monitoring facts belong in retained event history and warehouse tables, not in restart-sensitive request counters or implicit table names.
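One way to picture that explicitness is a small value object per contract. In the sketch below, the table names and partition fields are the documented defaults, while the class shape and the clustering and retention fields are assumptions:

```python
# Sketch of an explicit warehouse contract; the table names and partition
# fields are the documented defaults, the class shape itself is an assumption.
from dataclasses import dataclass

@dataclass(frozen=True)
class WarehouseContract:
    table: str
    partition_field: str                    # day partitioning on this column
    clustering_fields: tuple[str, ...] = ()
    retention_days: int | None = None

CURATED_FEATURES = WarehouseContract(
    table="foehncast.forecast_features",
    partition_field="forecast_time",
)
PREDICTION_EVENTS = WarehouseContract(
    table="foehncast_monitoring.prediction_events",
    partition_field="prediction_timestamp",
)
```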
## Feast And Monitoring Runtime State
Two generated state areas are part of the current contract even though they are not workload config.
| Generated surface | Why it exists |
|---|---|
| `.state/feast/feature_store.runtime.yaml` | binds the running environment to the checked-in Feast repo without forcing operators to hand-edit a second runtime YAML |
| `.state/monitoring/prediction-log.jsonl` | bounded local working set for request-side drift checks |
| `.state/monitoring/prediction-events.jsonl` | retained local prediction-event history contract that can be redirected to shared storage |
| `airflow/reports/feature-pipeline-*-latest.json` | persisted pipeline summary that the app republishes through `/metrics` for Prometheus and Grafana |
These files are runtime artifacts. They are inspectable and useful, but they are not part of the checked-in workload contract and should not be promoted into `config.yaml`.
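For orientation, a retained prediction event is just an appended JSON line. In the sketch below the path matches the documented default, while the record's field names are illustrative:

```python
# Illustrative append to the retained prediction-event history; the record's
# field names are assumptions, only the path matches the documented default.
import datetime
import json
import os

os.makedirs(".state/monitoring", exist_ok=True)  # ensure the runtime dir exists
event = {
    "prediction_timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "spot": "example-spot",
    "predicted_band": "good",
}
with open(".state/monitoring/prediction-events.jsonl", "a") as fh:
    fh.write(json.dumps(event) + "\n")
```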
## What Stays Out Of Package Config
The package config should not absorb deployment topology or operator rollout state.
Keep these outside `config.yaml`:
- concrete GCP project IDs, bucket names, Cloud Run service names, and hosted machine shapes
- GitHub OIDC and remote Terraform state configuration
- public-port exposure decisions for hosted admin surfaces
- credentials and key-file paths
- per-environment bind hosts, ports, and service URLs
Those values belong in runtime env, Terraform inputs, or GitHub delivery variables because they describe where the system runs, not what the workload means.
## Why This Split Works
- training and inference can share one workload contract without hard-coding deployment details
- local and hosted runtimes can switch storage and service bindings without forking the package config
- Feast stays downstream from the curated feature contract instead of becoming a shadow settings system
- operator state stays inspectable under `.state/` and `airflow/reports/` without pretending to be workload source data
See Repository, Feature Pipeline, Inference Pipeline, Local Evaluator, and Cloud Mapping for the surrounding runtime and deployment boundaries.