Training Pipeline¶
FoehnCast keeps training downstream from the curated feature store. The training path reads stored curated rows, rebuilds schema-derived features when needed, generates synthetic rideability labels, trains the configured regressor, writes evaluation evidence, and registers a versioned model in MLflow. Training concerns never flow back into the feature pipeline.
This page records the current training design that is validated in the local stack and in regression tests. It focuses on what each stage owns today and what remains an explicit operator control rather than an automatic side effect.
Scope
This page describes the current validated training-path contract. It is not a roadmap. Future changes should be documented after they are chosen and implemented.
Pipeline Shape¶
The key point is that the training path stays explicit:
- curated features arrive from the feature pipeline instead of being rebuilt from raw ingest
- labeling owns the synthetic target definition
- training owns model fitting and metric logging
- evaluation owns reviewable reporting
- registration owns versioning and alias assignment in MLflow
- promotion and rollback stay separate operator controls even though they reuse the same registry aliases
Stage Responsibilities¶
| Stage | Main responsibility | Must not become |
|---|---|---|
| Label | turn curated feature rows into a synthetic `quality_index` target | a replacement for feature engineering or a hidden scoring service |
| Train | fit the configured model and log reproducible run metrics | a place where registry promotion or serving logic leaks in |
| Evaluate | write reviewable metrics and report artifacts for one run | a second training stage or a silent model selector |
| Register | create a versioned MLflow model and assign the requested alias | a broad deployment control plane |
| Promote and rollback | move explicit aliases between validated model versions | retraining, relabeling, or feature regeneration |
Label Boundary¶
Labeling is synthetic and physics-driven, not human-curated. The current label contract depends on curated wind and shoreline features that already exist in the stored dataset:
- `wind_speed_10m`
- `wind_gusts_10m`
- `wind_steadiness`
- `gust_factor`
- `shore_alignment`
The label rules combine rider profile settings with configured wind bands from config.yaml.
The important design choices are:
- dangerous wind and gust conditions are forced to the non-rideable bucket
- the minimum rideable wind threshold depends on rider weight
- high-quality windows depend on both wind range and quality constraints such as gust factor, shoreline fit, and steadiness
- the output stays a stable `0` to `5` `quality_index` target that downstream training and evaluation can compare directly
This means labeling is part of the training contract, not part of the inference service. The app serves predictions from a trained model; it does not recompute the synthetic label rules on demand.
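The rules above can be sketched in a few lines. This is an illustrative sketch only: every threshold, the rider-weight adjustment, and the function name are hypothetical placeholders, not the configured wind bands in config.yaml.

```python
# Illustrative sketch of the synthetic label rules.
# All thresholds and the weight adjustment below are hypothetical
# placeholders, not the values configured in config.yaml.

def quality_index(wind_speed: float, gusts: float, gust_factor: float,
                  shore_alignment: float, steadiness: float,
                  rider_weight_kg: float) -> int:
    """Map curated wind features to a synthetic 0-5 quality_index."""
    # Rule 1: dangerous wind or gust conditions force the non-rideable bucket.
    if wind_speed > 38.0 or gusts > 45.0:
        return 0
    # Rule 2: the minimum rideable threshold scales with rider weight.
    min_rideable = 10.0 + 0.05 * (rider_weight_kg - 75.0)
    if wind_speed < min_rideable:
        return 0
    # Rule 3: high scores need both a good wind range and quality
    # constraints on gustiness, shoreline fit, and steadiness.
    score = 2
    if 14.0 <= wind_speed <= 30.0:
        score += 1
    if gust_factor < 1.4:
        score += 1
    if shore_alignment > 0.6 and steadiness > 0.7:
        score += 1
    return min(score, 5)
```

The point of the sketch is the precedence, not the numbers: safety rules short-circuit to the non-rideable bucket before any quality scoring runs.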
Training Boundary¶
Training reads stored feature rows by spot and dataset, then rebuilds schema-derived time, direction, and gust features before labeling. That compatibility step matters because the training path is expected to keep working even when stored datasets predate a later feature-schema expansion.
The current training contract is:
- the input comes from stored curated features, not raw forecast payloads
- derived feature rebuilding is a compatibility guard, not a substitute for the feature pipeline
- the configured feature list and target column remain explicit in `config.yaml`
- the current model family is tree-based, with `random_forest` and `gradient_boosting` as the supported algorithms
- one MLflow run records parameters, regression metrics, class-bucket accuracy metrics, row counts, feature counts, and a feature-importance plot when the estimator exposes importances
Training should stay narrow. It fits a model and records one reproducible run. It should not decide traffic rollout or silently rewrite registry aliases beyond the requested registration stage.
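The compatibility rebuild can be pictured as a guard that only fills in missing derived columns. The column names and formulas below are assumptions for illustration; the authoritative schema belongs to the feature pipeline.

```python
# Illustrative sketch of the derived-feature compatibility guard.
# Column names and formulas are assumed for illustration; the real
# schema-derived features are defined by the feature pipeline.
import math
from datetime import datetime

def rebuild_derived(row: dict) -> dict:
    """Add schema-derived time, direction, and gust features only when
    they are missing from a stored curated row."""
    out = dict(row)
    # Gust feature: ratio of gusts to mean wind speed.
    if "gust_factor" not in out and out.get("wind_speed_10m"):
        out["gust_factor"] = out["wind_gusts_10m"] / out["wind_speed_10m"]
    # Time features: cyclical encoding of the forecast hour.
    if "hour_sin" not in out:
        hour = datetime.fromisoformat(out["time"]).hour
        out["hour_sin"] = math.sin(2 * math.pi * hour / 24)
        out["hour_cos"] = math.cos(2 * math.pi * hour / 24)
    # Direction features: cyclical encoding of wind direction degrees.
    if "dir_sin" not in out and "wind_direction_10m" in out:
        rad = math.radians(out["wind_direction_10m"])
        out["dir_sin"] = math.sin(rad)
        out["dir_cos"] = math.cos(rad)
    return out
```

Because every branch checks for an existing column first, rows written after a schema expansion pass through untouched, while older rows are upgraded in place.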
Evaluation Boundary¶
Evaluation resumes the same MLflow run after training and turns its metrics into a reviewable artifact.
The current evaluation contract is:
- regression metrics include `mae`, `rmse`, and `r2`
- rounded class-bucket accuracy is logged alongside the regression metrics
- the markdown evaluation report is written under `airflow/reports/` as `evaluation-<run_id>.md`
- the same report is logged back into MLflow as an artifact
This keeps evaluation visible outside the notebook path. Reviewers and operators can inspect a persisted markdown report instead of depending on an ad hoc interactive session.
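A minimal sketch of how run metrics could become that markdown artifact, assuming a plain metrics dict and a hypothetical helper name; the real report layout is richer, and only the `evaluation-<run_id>.md` path pattern follows the documented contract.

```python
# Minimal sketch of rendering one run's metrics as a markdown report.
# The helper name and layout are assumptions; only the
# evaluation-<run_id>.md naming follows the documented contract.
from pathlib import Path

def write_evaluation_report(run_id: str, metrics: dict,
                            reports_dir: str = "airflow/reports") -> Path:
    """Persist one run's metrics as evaluation-<run_id>.md."""
    lines = [f"# Evaluation report for run {run_id}", ""]
    lines += [f"- **{name}**: {value:.4f}" for name, value in metrics.items()]
    path = Path(reports_dir) / f"evaluation-{run_id}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n")
    return path
```

Writing the file before logging it back into MLflow is what keeps the report inspectable even when the tracking server is unavailable.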
Registration Boundary¶
Registration converts one logged MLflow run into a named registry version and assigns the requested alias.
The current registry contract is:
- the registered model name is `foehncast-quality`
- the validated pre-live alias is `candidate`
- the live-serving alias is `champion`
- training summaries persist run-level metrics, row counts, report paths, stage durations, and the registered version so the monitoring surface can expose the latest training state
Manual training runs default to the Candidate stage. Asset-triggered runs from the feature pipeline can request Production, which lets the asset flow produce a candidate-ready or live-ready registration path without changing the training code itself.
Airflow Hand-Off¶
The training DAG is scheduled from the feature pipeline's published training-request asset instead of a direct DAG-to-DAG trigger.
That hand-off matters because it keeps the orchestration boundary visible:
- the feature DAG publishes curated-feature and training-request assets after persistence succeeds
- the training DAG consumes the curated feature store and the training request
- the training DAG emits MLflow training-run, evaluation-report, and model-registry assets
- dataset and stage can still be overridden through DAG config when needed
This makes the Airflow Assets view reflect the real dependency graph between curated feature persistence, training, evaluation, and registration.
Alias Controls Outside The DAG¶
Promotion and rollback are explicit controls layered on top of the registry aliases, not hidden inside normal training.
The current operator controls are:
- `foehncast.training_pipeline.promote` can move an explicit version or the current `candidate` alias to the production stage
- `foehncast.training_pipeline.rollback` can restore the `champion` alias to an explicit previous version
- the same alias contract is reused by the shared cloud operator workflows and by the serving path that loads the current live alias
That separation is deliberate. Training can succeed without immediately changing the live serving version, and rollback can happen without retraining.
Why This Structure Works¶
- it keeps training downstream from the curated feature contract instead of duplicating feature logic
- it preserves reviewable evidence through MLflow runs and markdown evaluation reports
- it keeps candidate and champion semantics explicit in the registry rather than buried in deployment scripts
- it makes automatic retraining visible in Airflow through asset hand-offs instead of opaque trigger chains
See Architecture, Feature Pipeline, and Monitoring for the surrounding system boundaries.