Prediction Pipeline

The prediction pipeline is cache-first. Live Kronos inference is expensive (~50 ms per cached call, ~800 ms cold) and limited to a single concurrent request by the RoPE cache, so hourly/daily batch jobs generate all predictions in advance and the gateway simply serves the most recent row from Postgres.

```mermaid
sequenceDiagram
    participant Cron as Railway Cron
    participant Scripts as scripts/kronos-batch-predict.py
    participant DB as Supabase Postgres
    participant K as Kronos FastAPI<br/>(RTX 4060)
    participant GW as MCP Gateway<br/>(Railway)
    participant UI as prediction.datfxlabs.com
    Cron->>Scripts: 55 * * * * (hourly)
    Scripts->>DB: SELECT ohlcv_1h WHERE symbol IN (23 instruments)
    Scripts->>DB: SELECT economic_calendar WHERE date >= now() - 30d
    Scripts->>Scripts: EventEncoder.encode(...) → (T, 20)
    Scripts->>K: POST /predict {ohlcv, events, pred_len=120}
    K-->>Scripts: {p10[], p50[], p90[], samples[]}
    Scripts->>DB: INSERT INTO ml_predictions (...)
    UI->>GW: GET /showcase/ml-prediction/BTCUSDT
    GW->>DB: SELECT latest from ml_predictions
    DB-->>GW: row
    GW-->>UI: {p10, p50, p90, generated_at}
```
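The `/predict` contract in the diagram can be sketched as plain dicts. Field names come from the diagram; the shapes and sample count are illustrative assumptions, not the exact Kronos FastAPI schema:

```python
# Illustrative /predict request/response shapes (assumptions, not the real schema).
pred_len = 120  # forecast horizon in bars, as in the diagram

request = {
    "ohlcv": [[68000.0, 68120.0, 67950.0, 68100.0, 1234.5]],  # (T, 5) OHLCV rows
    "events": [[0.0] * 20],                                    # (T, 20) event tensor
    "pred_len": pred_len,
}

# A well-formed response carries one quantile value per forecast step,
# plus the raw sample paths the quantiles were computed from.
response = {
    "p10": [0.0] * pred_len,
    "p50": [0.0] * pred_len,
    "p90": [0.0] * pred_len,
    "samples": [[0.0] * pred_len for _ in range(30)],  # e.g. 30 sampled paths
}

assert len(response["p50"]) == request["pred_len"]
```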

Currently live (base Kronos, 10-channel placeholder events):

| Cron | Schedule | Purpose |
| --- | --- | --- |
| kronos-batch-1h | 55 * * * * | Hourly 1 h predictions, 23 instruments |
| kronos-batch-1d | 45 5 * * * | Daily 1 d predictions |
| score-signals-1h | 5 * * * * | Score predictions older than 1 h |
| score-signals-1d | 30 6 * * * | Score predictions older than 1 d |

Phase 0–6 additions (planned):

| Cron | Schedule | Purpose |
| --- | --- | --- |
| chronos2-batch-1h | 55 * * * * | Parallel Chronos-2 predictions |
| chronos2-batch-1d | 45 5 * * * | Parallel Chronos-2 daily |
| kronos-rolling-finetune | 0 6 1 * * | Monthly per-asset-class LoRA retrain |
The Kronos FastAPI inference service itself:

  • Lives on your local RTX 4060 at port 8200
  • Exposed to Railway/gateway via cloudflared tunnel (no public IP)
  • Semaphore pinned to 1 to avoid RoPE numerical drift under concurrency
  • Models loaded once at startup; event encoder + LoRA adapter loaded alongside

Batch script: scripts/kronos-batch-predict.py
Responsibilities per run:

  1. Fetch OHLCV for each of 23 instruments from Supabase
  2. Fetch economic_calendar + cross-asset leader OHLCV (BTC / SPY / DXY / VIX)
  3. Build 20-channel event tensor via EventEncoder
  4. POST to Kronos FastAPI
  5. Insert {p10, p50, p90, samples, event_context, model_name} into ml_predictions
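The five steps above reduce to a loop per instrument. Every helper below is a hypothetical stand-in for the real Supabase and Kronos calls in `scripts/kronos-batch-predict.py`:

```python
# Sketch of one batch run; helper names are assumptions, not the real API.
def fetch_ohlcv(symbol):                 # step 1: OHLCV from Supabase
    return [[1.0, 1.1, 0.9, 1.05, 100.0]] * 512

def fetch_events(symbol):                # steps 2-3: calendar + leaders -> (T, 20)
    return [[0.0] * 20] * 512

def call_kronos(ohlcv, events, pred_len=120):  # step 4: POST /predict
    return {"p10": [0.0] * pred_len, "p50": [0.0] * pred_len,
            "p90": [0.0] * pred_len, "samples": []}

def run_batch(symbols):
    rows = []
    for symbol in symbols:
        ohlcv = fetch_ohlcv(symbol)
        events = fetch_events(symbol)
        pred = call_kronos(ohlcv, events)
        # step 5: row destined for INSERT INTO ml_predictions
        rows.append({"symbol": symbol, "model_name": "kronos-base",
                     "event_context": {"channels": 20}, **pred})
    return rows

rows = run_batch(["BTCUSDT", "ETHUSDT"])
```

A failed instrument should be skipped, not abort the run, so one bad fetch can't starve the other 22 symbols of fresh predictions.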
The gateway route that serves these rows (GET /showcase/ml-prediction/{symbol}):

  • No auth (SHOWCASE_API_KEY bypass for showcase routes)
  • Reads latest row from ml_predictions filtered by symbol and optional model_name
  • 30-minute LRU cache upstream
  • Serves prediction.datfxlabs.com, the Finkit UI, and this docs site
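The 30-minute upstream cache can be approximated with a timestamped dict keyed by symbol. This is a sketch of the idea, not the gateway's actual cache layer:

```python
import time

TTL_SECONDS = 30 * 60  # the 30-minute window noted above
_cache: dict = {}

def get_latest_prediction(symbol, fetch_row, now=None):
    """Return the cached row for `symbol` unless it is older than the TTL."""
    now = time.time() if now is None else now
    hit = _cache.get(symbol)
    if hit and now - hit["at"] < TTL_SECONDS:
        return hit["row"]
    row = fetch_row(symbol)  # hypothetical SELECT latest FROM ml_predictions
    _cache[symbol] = {"row": row, "at": now}
    return row

# Only the first and the post-expiry call hit the database.
calls = []
def fake_fetch(symbol):
    calls.append(symbol)
    return {"symbol": symbol, "p50": [1.0]}

get_latest_prediction("BTCUSDT", fake_fetch, now=0)
get_latest_prediction("BTCUSDT", fake_fetch, now=60)               # within TTL: cached
get_latest_prediction("BTCUSDT", fake_fetch, now=TTL_SECONDS + 1)  # expired: refetch
```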

After the prediction horizon elapses, score_signals.py evaluates each prediction:

  • Did p50 direction match realized direction?
  • Did actual close land inside p10–p90 envelope?
  • Per-instrument, per-horizon Sharpe and hit-rate stats written to signal_performance materialised view

This closed loop powers the Evaluation page and feeds training labels to the Phase 5 ensemble meta-learner.

Why serve from cache instead of calling Kronos live? Three reasons:

  1. Model concurrency = 1 — Kronos can’t safely serve multiple live requests anyway.
  2. Latency SLO — user-facing endpoints need <200 ms; cold inference is 4× that.
  3. Resilience — if the RTX 4060 is unreachable (power, network, tunnel restart), cached predictions keep serving until next cron.

The tradeoff: predictions are stale for up to 1 h (intraday) or 24 h (daily). Acceptable for the use case; real-time trading would need a different stack.
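A consumer can make that staleness bound explicit by checking `generated_at` (from the gateway response) against the cron cadence. A sketch, assuming a simple horizon-keyed age limit:

```python
from datetime import datetime, timedelta, timezone

# Max acceptable age per horizon, mirroring the 1 h / 24 h cron cadence.
MAX_AGE = {"1h": timedelta(hours=1), "1d": timedelta(hours=24)}

def is_stale(generated_at, horizon, now):
    """True once a prediction is older than its cron cadence allows."""
    return now - generated_at > MAX_AGE[horizon]

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = is_stale(now - timedelta(minutes=40), "1h", now)  # within the hourly window
stale = is_stale(now - timedelta(hours=2), "1h", now)     # cron missed a cycle
```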