Inference Pipeline

FoehnCast keeps inference inside the application layer. The serving path resolves the live MLflow alias, fetches fresh forecasts for configured spots, rebuilds the shared engineered feature vector, returns per-spot quality predictions, and optionally exposes the Feast-backed online lookup without pulling training or operator concerns into the request path.

This page records the current serving contract that is validated in the local stack and in endpoint, dashboard, and cloud-runtime tests. It focuses on what the running app owns today and what stays optional or operator-controlled.

Scope

This page describes the current validated inference-path contract. It is not a roadmap. Future changes should be documented after they are chosen and implemented.

Inference Shape

```mermaid
flowchart LR
    REQ[Requested spots] --> RES[Resolve configured spots]
    RES --> WX[Fetch Open-Meteo forecast]
    WX --> ENG[Engineer shared features]
    REG[MLflow serving alias] --> MOD[Load served model]
    ENG --> MOD
    MOD --> PRE["/predict response"]
    PRE --> RNK[Rank with rider profile and drive time]
    RNK --> OUT["/rank response"]
    RES --> ONL["/features/online lookup"]
    ONL --> DEMO[Online-features demo]
    DASH[Streamlit live demo] --> PRE
    DASH --> OUT
    PRE --> MON[Prediction monitoring hand-off]
    OUT --> MON
```

The important boundary is that inference stays request-focused:

  • the app resolves configured spots and fetches fresh forecast data on demand
  • the serving model is loaded from the registry alias that represents the live contract
  • prediction and ranking reuse the same forecast payload instead of forking into separate pipelines
  • the Feast lookup path stays optional and does not gate the main prediction surface
  • monitoring is triggered from the request path, but the operator metrics surface stays separate
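
The request-focused boundary above can be sketched as a single resolve-fetch-engineer-score pass. All names here (`serve_prediction`, `fetch_forecast`, `engineer`, `model`) are illustrative stand-ins, not the app's real call signatures:

```python
# Minimal sketch of the request-focused inference flow, assuming
# hypothetical helper callables; the real app wires these differently.

def serve_prediction(requested, configured, fetch_forecast, engineer, model):
    """Resolve spots, fetch fresh forecasts, score them, and return rows."""
    unknown = [spot for spot in requested if spot not in configured]
    if unknown:
        # the real route surfaces this as an HTTP 404, never a silent fallback
        raise KeyError(f"unknown spots: {unknown}")
    rows = {}
    for spot in requested:
        forecast = fetch_forecast(spot)   # fresh Open-Meteo pull, per request
        features = engineer(forecast)     # shared engineered feature vector
        rows[spot] = model(features)      # per-spot quality predictions
    return rows
```

Because prediction and ranking reuse the same payload, a ranking step would consume the returned rows rather than re-running this pipeline.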

Endpoint Responsibilities

| Surface | Main responsibility | Must not become |
| --- | --- | --- |
| /health | expose app readiness plus the served alias and model version | a deployment control plane |
| /spots | return the configured set of supported spots | a source of hidden business rules |
| /predict | return per-spot forecast rows with continuous quality_index values | a training, labeling, or promotion path |
| /rank | score the same prediction payload for one rider profile | a second prediction model or a dashboard layer |
| /features/online | expose Feast-backed online feature rows for app-side integrations | a required dependency for normal prediction requests |
| /features/online/demo | give a lightweight HTML page for manual lookup checks | the main rider product UI |

The app also serves /metrics, but that route belongs to the monitoring hand-off described below rather than to the core rider-facing inference contract.

Model Resolution Boundary

Inference serves one registry alias at a time. The current contract is:

  • the registered model name is foehncast-quality
  • the default live alias is champion
  • FOEHNCAST_MLFLOW_SERVING_ALIAS can override the served alias when an operator needs to pin another registry view
  • /health returns status, model_alias, and model_version so a runtime check can confirm what is actually being served
  • prediction responses also include model_version so downstream consumers can tie a response back to the active registry version

This keeps serving aligned with the training registry contract without giving the inference service promotion or rollback authority.
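The alias-resolution rule above can be sketched as follows. The model name, default alias, and environment variable come from this page; `resolve_model_uri` itself is a hypothetical helper, not the app's actual code:

```python
import os

DEFAULT_MODEL_NAME = "foehncast-quality"
DEFAULT_ALIAS = "champion"

def resolve_model_uri(env=None):
    """Pick the served registry alias, honouring the operator override."""
    env = os.environ if env is None else env
    alias = env.get("FOEHNCAST_MLFLOW_SERVING_ALIAS", DEFAULT_ALIAS)
    # MLflow resolves models:/<name>@<alias> to one concrete registry version
    return f"models:/{DEFAULT_MODEL_NAME}@{alias}", alias
```

Note that the helper only reads the alias; promotion and rollback stay with the registry, not the inference service.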

Prediction Boundary

The prediction path is intentionally narrow:

  • requested spot IDs are resolved against the configured spot list, and unknown spots return 404 instead of silently falling back
  • live weather comes from the current Open-Meteo forecast pull for each requested spot
  • the request horizon is capped by inference.max_horizon_hours, which is currently 14
  • the same engineer_features step used by the feature path rebuilds the feature vector expected by the trained model
  • the served feature columns come from model.features in config.yaml
  • the app returns forecast rows as timestamps plus continuous quality_index values

That means the request path scores a trained model against fresh forecast features. It does not recompute the synthetic training labels, emit evaluation artifacts, or move registry aliases.
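
The horizon cap and the row shape described above can be sketched like this. The constant mirrors `inference.max_horizon_hours`; `forecast_rows` and `quality_fn` are illustrative, not the app's real functions:

```python
from datetime import datetime, timedelta, timezone

MAX_HORIZON_HOURS = 14  # inference.max_horizon_hours in config.yaml

def forecast_rows(start, requested_hours, quality_fn):
    """Return (timestamp, quality_index) rows capped at the configured horizon."""
    hours = min(requested_hours, MAX_HORIZON_HOURS)
    return [
        {"timestamp": start + timedelta(hours=h), "quality_index": quality_fn(h)}
        for h in range(hours)
    ]
```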

Ranking Boundary

/rank is not a second model. It reuses the prediction payload from /predict and then scores the candidate spots for the configured rider profile.

The current ranking contract is:

  • ranking weights come from config.yaml and are currently 0.6 for peak quality, 0.3 for ride-versus-drive ratio, and 0.1 for rideable duration
  • drive-time cost comes from the rider profile plus the OSRM routing lookup
  • session duration is derived from the forecast hours that clear the rideable threshold
  • the route returns ranked numeric rows such as quality_index, drive_minutes, session_hours, ride_drive_ratio, and score

This keeps ranking personal without hiding another model behind the API. The Streamlit demo can add rider-facing labels and summary cards on top of the same ranked data.
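
The weighted score above can be sketched with the quoted weights (0.6 peak quality, 0.3 ride-versus-drive ratio, 0.1 rideable duration). The field names mirror the documented response rows, but how each term is normalised here is an assumption for illustration:

```python
# Illustrative ranking score; the per-term normalisation is assumed,
# only the weights come from config.yaml as quoted above.
WEIGHTS = {"peak_quality": 0.6, "ride_drive_ratio": 0.3, "session_hours": 0.1}

def rank_score(peak_quality, ride_drive_ratio, session_hours, max_session_hours=14):
    return (
        WEIGHTS["peak_quality"] * peak_quality
        + WEIGHTS["ride_drive_ratio"] * min(ride_drive_ratio, 1.0)
        + WEIGHTS["session_hours"] * (session_hours / max_session_hours)
    )
```

A spot with a higher peak quality at the same drive cost and session length always outranks its rival, which is the intended dominance of the 0.6 weight.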

Online Feature Boundary

The online feature route is a separate integration surface layered on top of the same curated contract.

The current online-feature contract is:

  • the default Feast repo path is feature_repo/, with FOEHNCAST_FEAST_REPO_PATH as an override
  • a call without explicit feature names uses the foehncast_model_v1 feature service
  • a call with explicit feature names resolves them against the spot_forecast_features view unless the caller already supplies a fully qualified feature reference
  • the route returns row-shaped feature data instead of leaking Feast's columnar response shape
  • the route returns 503 when the Feast runtime dependency or configured repo is missing and 400 when the requested feature list is invalid

This path stays optional. Normal /predict and /rank requests do not depend on Feast being available.
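
The feature-name resolution rule above can be sketched directly. The view and service names come from this page; `resolve_feature_refs` is a hypothetical helper, and ":" as the qualification marker is an assumption:

```python
DEFAULT_VIEW = "spot_forecast_features"
DEFAULT_SERVICE = "foehncast_model_v1"

def resolve_feature_refs(names):
    """Qualify bare feature names; pass qualified references through."""
    if not names:
        return DEFAULT_SERVICE  # no explicit names -> use the feature service
    return [n if ":" in n else f"{DEFAULT_VIEW}:{n}" for n in names]
```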

Rider Demo Surfaces

FoehnCast keeps the rider-facing demo separate from operator dashboards.

The current demo surfaces are:

  • the Streamlit live demo, which loads live predictions, applies the ranking helper, and presents rider-facing cards and tables
  • the online-features demo page, which issues manual /features/online calls against the running app

The Streamlit helper rounds continuous quality scores into stable rider-facing labels from Unsafe through Perfect Storm, uses the configured forecast horizon to describe the current live window, and exposes the current model_version in the returned payload. That makes it a public-safe evaluation surface, not an operator dashboard.
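
The label rounding above can be sketched as a simple threshold table. Only the endpoint labels (Unsafe, Perfect Storm) appear on this page; the intermediate labels and every threshold here are illustrative assumptions, not the helper's real values:

```python
# Hypothetical label buckets; thresholds and middle labels are assumed.
LABELS = [
    (0.2, "Unsafe"),
    (0.4, "Poor"),
    (0.6, "Fair"),
    (0.8, "Good"),
    (1.0, "Perfect Storm"),
]

def quality_label(quality_index):
    """Round a continuous quality score into a stable rider-facing label."""
    for upper, label in LABELS:
        if quality_index <= upper:
            return label
    return LABELS[-1][1]  # scores above the table cap at the top label
```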

Monitoring Hand-Off

Prediction routes emit monitoring signals, but the monitoring surface itself stays separate.

The current hand-off is:

  • /predict and /rank schedule background prediction-monitoring work after a successful response payload is built
  • the background path records scheduling and execution outcomes and emits prediction-drift metrics from retained prediction history
  • /metrics merges durable feature and training summaries, retained prediction-log metrics, hosted-sync metrics, and in-process prediction-monitoring counters

This keeps the inference service responsible for request-side facts while leaving dashboards, alert rules, and long-range operator review to the monitoring stack.
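
The hand-off ordering above can be sketched with an in-process queue: the response is built first, monitoring work is only scheduled afterwards, and the counters stay in-process until `/metrics` merges them. All names here are illustrative:

```python
from queue import Queue

# Hypothetical in-process state; the real app uses its own monitoring path.
monitor_queue: Queue = Queue()
counters = {"scheduled": 0}

def respond_and_schedule(payload):
    """Build the response first, then enqueue background monitoring work."""
    monitor_queue.put({"route": "/predict", "rows": len(payload)})
    counters["scheduled"] += 1
    return payload  # the response never waits on monitoring execution
```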

Hosted Serving Boundary

The same FastAPI app runs across the supported serving targets.

The current hosted contract is:

  • the local evaluator target serves the full app inside the Compose stack
  • the hosted full-stack target keeps the app online next to the other runtime services on one GCP host
  • the hosted inference target publishes the same FastAPI inference surface on Cloud Run without shipping Airflow, notebooks, docs tooling, or local emulators
  • cloud bootstrap and operator checks verify the live /health and /spots routes when the hosted inference path is enabled

That boundary matters because the app is the product and service surface. Grafana remains an operator surface, not the rider product UI.
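
An operator check against the live routes can be sketched as below. The `/health` fields (`status`, `model_alias`, `model_version`) come from this page; `check_health` and the expected `"ok"` status value are assumptions:

```python
def check_health(health_payload, expected_alias="champion"):
    """Return True when the served alias and version look sane.

    The "ok" status value is an assumed convention, not confirmed by the docs.
    """
    return (
        health_payload.get("status") == "ok"
        and health_payload.get("model_alias") == expected_alias
        and health_payload.get("model_version") is not None
    )
```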

Why This Structure Works

  • it keeps live requests narrow enough to verify through simple route and dashboard tests
  • it reuses the shared feature contract instead of inventing a serving-only schema
  • it ties responses back to a concrete registry version without giving the app promotion authority
  • it keeps the Feast lookup path useful for integration checks while leaving prediction and ranking available from the core app alone

See Architecture, Training Pipeline, Monitoring, and Cloud Mapping for the surrounding system boundaries.