Training Pipeline

FoehnCast keeps training downstream from the curated feature store. The training path reads stored curated rows, rebuilds schema-derived features when needed, generates synthetic rideability labels, trains the configured regressor, writes evaluation evidence, and registers a versioned model in MLflow, all without pushing training concerns back into the feature pipeline.

This page records the current training design that is validated in the local stack and in regression tests. It focuses on what each stage owns today and what remains an explicit operator control rather than an automatic side effect.

Scope

This page describes the current validated training-path contract. It is not a roadmap. Future changes should be documented after they are chosen and implemented.

Pipeline Shape

```mermaid
flowchart LR
    CUR[Curated feature store asset] --> LAB[Label curated rows]
    REQ[Training-request asset] --> LAB
    LAB --> TRN[Train model]
    TRN --> EVA[Generate evaluation report]
    EVA --> REG[Register model version]
    REG --> ALS[Assign requested alias]
    OPS[Promote and rollback controls] --> ALS
```

The key point is that the training path stays explicit:

  • curated features arrive from the feature pipeline instead of being rebuilt from raw ingest
  • labeling owns the synthetic target definition
  • training owns model fitting and metric logging
  • evaluation owns reviewable reporting
  • registration owns versioning and alias assignment in MLflow
  • promotion and rollback stay separate operator controls even though they reuse the same registry aliases

Stage Responsibilities

| Stage | Main responsibility | Must not become |
| --- | --- | --- |
| Label | turn curated feature rows into a synthetic quality_index target | a replacement for feature engineering or a hidden scoring service |
| Train | fit the configured model and log reproducible run metrics | a place where registry promotion or serving logic leaks in |
| Evaluate | write reviewable metrics and report artifacts for one run | a second training stage or a silent model selector |
| Register | create a versioned MLflow model and assign the requested alias | a broad deployment control plane |
| Promote and rollback | move explicit aliases between validated model versions | retraining, relabeling, or feature regeneration |

Label Boundary

Labeling is synthetic and physics-driven, not human-curated. The current label contract depends on curated wind and shoreline features that already exist in the stored dataset:

  • wind_speed_10m
  • wind_gusts_10m
  • wind_steadiness
  • gust_factor
  • shore_alignment

The label rules combine rider profile settings with configured wind bands from config.yaml.

The important design choices are:

  • dangerous wind and gust conditions are forced to the non-rideable bucket
  • the minimum rideable wind threshold depends on rider weight
  • high-quality windows depend on both wind range and quality constraints such as gust factor, shoreline fit, and steadiness
  • the output stays a stable 0 to 5 quality_index target that downstream training and evaluation can compare directly

This means labeling is part of the training contract, not part of the inference service. The app serves predictions from a trained model; it does not recompute the synthetic label rules on demand.
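
As a concrete illustration of those choices, the sketch below shows how such a rule set could be expressed over the curated columns. The thresholds, the rider-weight scaling, and the helper name are illustrative assumptions, not the project's actual bands, which live in config.yaml and the rider profile settings.

```python
# Minimal sketch of a synthetic rideability label over the curated columns.
# All numeric thresholds here are illustrative placeholders, not the real bands.
import pandas as pd

def label_quality_index(row: pd.Series, rider_weight_kg: float = 80.0) -> int:
    """Map one curated feature row to a 0-5 quality_index (hypothetical rules)."""
    # Hypothetical safety cut-off: dangerous wind or gusts force the 0 bucket.
    if row["wind_speed_10m"] > 38 or row["wind_gusts_10m"] > 45:
        return 0
    # Hypothetical weight-dependent minimum: heavier riders need more wind.
    min_rideable = 10 + (rider_weight_kg - 70) * 0.1
    if row["wind_speed_10m"] < min_rideable:
        return 0
    score = 3
    # Quality constraints: steady wind, low gust factor, good shoreline fit.
    if row["gust_factor"] < 1.3 and row["wind_steadiness"] > 0.8:
        score += 1
    if row["shore_alignment"] > 0.7:
        score += 1
    return min(score, 5)

# Usage on a curated feature frame:
# features["quality_index"] = features.apply(label_quality_index, axis=1)
```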

Training Boundary

Training reads stored feature rows by spot and dataset, then rebuilds schema-derived time, direction, and gust features before labeling. That compatibility step matters because the training path is expected to keep working even when stored datasets predate a later feature-schema expansion.
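
A minimal sketch of what that compatibility rebuild could look like, assuming the stored rows carry a forecast timestamp and raw wind columns; the column names and derivations are illustrative, not the project's exact schema.

```python
# Sketch of the compatibility rebuild for schema-derived features.
# Column names (forecast_time, wind_direction_10m, ...) are assumptions.
import numpy as np
import pandas as pd

def rebuild_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    ts = pd.to_datetime(df["forecast_time"])
    # Cyclical time features that older stored datasets may lack.
    df["hour_sin"] = np.sin(2 * np.pi * ts.dt.hour / 24)
    df["hour_cos"] = np.cos(2 * np.pi * ts.dt.hour / 24)
    # Direction features from the raw wind direction in degrees.
    rad = np.deg2rad(df["wind_direction_10m"])
    df["wind_dir_sin"] = np.sin(rad)
    df["wind_dir_cos"] = np.cos(rad)
    # Gust feature derived from speed and gusts, guarding against division by zero.
    df["gust_factor"] = df["wind_gusts_10m"] / df["wind_speed_10m"].clip(lower=0.1)
    return df
```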

The current training contract is:

  • the input comes from stored curated features, not raw forecast payloads
  • derived feature rebuilding is a compatibility guard, not a substitute for the feature pipeline
  • the configured feature list and target column remain explicit in config.yaml
  • the current model family is tree-based, with random_forest and gradient_boosting as the supported algorithms
  • one MLflow run records parameters, regression metrics, class-bucket accuracy metrics, row counts, feature counts, and a feature-importance plot when the estimator exposes importances

Training should stay narrow. It fits a model and records one reproducible run. It should not decide traffic rollout or silently rewrite registry aliases beyond the requested registration stage.
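
The sketch below illustrates that narrow contract: fit one configured tree-based model and log one reproducible MLflow run. The feature handling, hyperparameters, and helper name are illustrative assumptions rather than the project's actual training code.

```python
# Sketch of the training stage: one fit, one MLflow run, explicit metrics.
# Hyperparameters and split strategy are illustrative placeholders.
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def train_model(features, feature_cols, target_col="quality_index") -> str:
    X, y = features[feature_cols], features[target_col]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    with mlflow.start_run() as run:
        model = RandomForestRegressor(n_estimators=200, random_state=42)
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        mlflow.log_params({"algorithm": "random_forest", "n_estimators": 200})
        mlflow.log_metrics({
            "mae": mean_absolute_error(y_test, preds),
            "rmse": mean_squared_error(y_test, preds) ** 0.5,
            "r2": r2_score(y_test, preds),
            # Rounded class-bucket accuracy against the 0-5 target.
            "bucket_accuracy": float((preds.round().clip(0, 5) == y_test).mean()),
            "row_count": float(len(features)),
            "feature_count": float(len(feature_cols)),
        })
        mlflow.sklearn.log_model(model, "model")
        return run.info.run_id
```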

Evaluation Boundary

Evaluation resumes the same MLflow run after training and turns its metrics into a reviewable artifact.

The current evaluation contract is:

  • regression metrics include mae, rmse, and r2
  • rounded class-bucket accuracy is logged alongside the regression metrics
  • the markdown evaluation report is written under airflow/reports/ as evaluation-<run_id>.md
  • the same report is logged back into MLflow as an artifact

This keeps evaluation visible outside the notebook path. Reviewers and operators can inspect a persisted markdown report instead of depending on an ad hoc interactive session.
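
A sketch of how that contract could be satisfied, assuming the MLflow client API for reading back run metrics; the report layout and helper name are illustrative.

```python
# Sketch of the evaluation stage: resume the training run, turn its metrics
# into a persisted markdown report, and log it back as a run artifact.
from pathlib import Path
import mlflow
from mlflow.tracking import MlflowClient

def write_evaluation_report(run_id: str, reports_dir: str = "airflow/reports") -> Path:
    metrics = MlflowClient().get_run(run_id).data.metrics
    report_path = Path(reports_dir) / f"evaluation-{run_id}.md"
    report_path.parent.mkdir(parents=True, exist_ok=True)
    lines = ["# Evaluation report", ""]
    for name in ("mae", "rmse", "r2", "bucket_accuracy"):
        if name in metrics:
            lines.append(f"- {name}: {metrics[name]:.4f}")
    report_path.write_text("\n".join(lines) + "\n")
    # Resume the same run so MLflow keeps a copy of the report.
    with mlflow.start_run(run_id=run_id):
        mlflow.log_artifact(str(report_path))
    return report_path
```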

Registration Boundary

Registration converts one logged MLflow run into a named registry version and assigns the requested alias.

The current registry contract is:

  • the registered model name is foehncast-quality
  • the validated pre-live alias is candidate
  • the live-serving alias is champion
  • training summaries persist run-level metrics, row counts, report paths, stage durations, and the registered version so the monitoring surface can expose the latest training state

Manual training runs default to the Candidate stage. Asset-triggered runs from the feature pipeline can request Production, which lets the asset flow produce a candidate-ready or live-ready registration path without changing the training code itself.
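
A minimal sketch of the registration step against the MLflow registry, using the model name and aliases from the contract above; the helper itself is an illustrative assumption.

```python
# Sketch of registration: create a registry version from the run's logged
# model and point the requested alias (candidate or champion) at it.
import mlflow
from mlflow.tracking import MlflowClient

MODEL_NAME = "foehncast-quality"

def register_run(run_id: str, alias: str = "candidate") -> int:
    version = mlflow.register_model(f"runs:/{run_id}/model", MODEL_NAME)
    MlflowClient().set_registered_model_alias(MODEL_NAME, alias, version.version)
    return int(version.version)
```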

Airflow Hand-Off

The training DAG is scheduled from the feature pipeline's published training-request asset instead of a direct DAG-to-DAG trigger.

That hand-off matters because it keeps the orchestration boundary visible:

  • the feature DAG publishes curated-feature and training-request assets after persistence succeeds
  • the training DAG consumes the curated feature store and the training request
  • the training DAG emits MLflow training-run, evaluation-report, and model-registry assets
  • dataset and stage can still be overridden through DAG config when needed

This makes the Airflow Assets view reflect the real dependency graph between curated feature persistence, training, evaluation, and registration.
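
A sketch of what that hand-off could look like with Airflow asset scheduling (assuming the Airflow 3 airflow.sdk API); the asset names, DAG name, and task body are illustrative placeholders, not the project's actual definitions.

```python
# Sketch of the asset hand-off: the training DAG is scheduled on the curated
# feature store and training-request assets and emits its own assets downstream.
from airflow.sdk import Asset, dag, task

curated_features = Asset("foehncast://curated-feature-store")   # illustrative name
training_request = Asset("foehncast://training-request")        # illustrative name

@dag(schedule=[curated_features, training_request])
def foehncast_training():
    @task(outlets=[Asset("foehncast://model-registry")])
    def train_evaluate_register(**context):
        # Dataset and stage can still be overridden through DAG run config.
        conf = context["dag_run"].conf or {}
        dataset = conf.get("dataset", "default")
        stage = conf.get("stage", "candidate")
        ...  # label, train, evaluate, register as described above

    train_evaluate_register()

foehncast_training()
```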

Alias Controls Outside The DAG

Promotion and rollback are explicit controls layered on top of the registry aliases, not hidden inside normal training.

The current operator controls are:

  • foehncast.training_pipeline.promote can move an explicit version or the current candidate alias to the production stage
  • foehncast.training_pipeline.rollback can restore the champion alias to an explicit previous version
  • the same alias contract is reused by the shared cloud operator workflows and by the serving path that loads the current live alias

That separation is deliberate. Training can succeed without immediately changing the live serving version, and rollback can happen without retraining.
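
A sketch of those two controls as thin wrappers over the registry aliases; the function names echo the operator controls above, but the bodies are illustrative assumptions.

```python
# Sketch of promote and rollback as explicit alias moves in the MLflow registry,
# kept entirely outside the training DAG.
from mlflow.tracking import MlflowClient

MODEL_NAME = "foehncast-quality"

def promote(version: int | None = None) -> None:
    """Point champion at an explicit version, or at whatever candidate points to."""
    client = MlflowClient()
    if version is None:
        version = client.get_model_version_by_alias(MODEL_NAME, "candidate").version
    client.set_registered_model_alias(MODEL_NAME, "champion", version)

def rollback(previous_version: int) -> None:
    """Restore champion to an explicit previous version without retraining."""
    MlflowClient().set_registered_model_alias(MODEL_NAME, "champion", previous_version)
```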

Why This Structure Works

  • it keeps training downstream from the curated feature contract instead of duplicating feature logic
  • it preserves reviewable evidence through MLflow runs and markdown evaluation reports
  • it keeps candidate and champion semantics explicit in the registry rather than buried in deployment scripts
  • it makes automatic retraining visible in Airflow through asset hand-offs instead of opaque trigger chains

See Architecture, Feature Pipeline, and Monitoring for the surrounding system boundaries.