Prediction Pipeline

The prediction pipeline is cache-first. Live Kronos inference is expensive (~50 ms per cached call, ~800 ms cold) and limited to a single concurrent request by the RoPE cache, so hourly/daily batch jobs generate all predictions in advance and the gateway simply serves the most recent row from Postgres.

```mermaid
sequenceDiagram
    participant Cron as Railway Cron
    participant Scripts as scripts/kronos-batch-predict.py
    participant DB as Supabase Postgres
    participant K as Kronos FastAPI<br/>(RTX 4060)
    participant GW as MCP Gateway<br/>(Railway)
    participant UI as prediction.datfxlabs.com
    Cron->>Scripts: 55 * * * * (hourly)
    Scripts->>DB: SELECT ohlcv_1h WHERE symbol IN (23 instruments)
    Scripts->>DB: SELECT economic_calendar WHERE date >= now() - 30d
    Scripts->>Scripts: EventEncoder.encode(...) → (T, 20)
    Scripts->>K: POST /predict {ohlcv, events, pred_len=120}
    K-->>Scripts: {p10[], p50[], p90[], samples[]}
    Scripts->>DB: INSERT INTO ml_predictions (...)
    UI->>GW: GET /showcase/ml-prediction/BTCUSDT
    GW->>DB: SELECT latest from ml_predictions
    DB-->>GW: row
    GW-->>UI: {p10, p50, p90, generated_at}
```
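The `/predict` contract in the diagram can be sketched as plain dicts. Field names come from the diagram; the shapes and sample count are illustrative assumptions, not the exact Kronos FastAPI schema:

```python
# Illustrative /predict request/response shapes (assumptions, not the real schema).
pred_len = 120  # forecast horizon in bars, as in the diagram

request = {
    "ohlcv": [[68000.0, 68120.0, 67950.0, 68100.0, 1234.5]],  # (T, 5) OHLCV rows
    "events": [[0.0] * 20],                                    # (T, 20) event tensor
    "pred_len": pred_len,
}

# A well-formed response carries one quantile value per forecast step,
# plus the raw sample paths the quantiles were computed from.
response = {
    "p10": [0.0] * pred_len,
    "p50": [0.0] * pred_len,
    "p90": [0.0] * pred_len,
    "samples": [[0.0] * pred_len for _ in range(30)],  # e.g. 30 sampled paths
}

assert len(response["p50"]) == request["pred_len"]
```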

Currently live (base Kronos, 10-channel placeholder events):

| Cron | Schedule | Purpose |
| --- | --- | --- |
| kronos-batch-1h | 55 * * * * | Hourly 1 h predictions, 23 instruments |
| kronos-batch-1d | 45 5 * * * | Daily 1 d predictions |
| score-signals-1h | 5 * * * * | Score predictions older than 1 h |
| score-signals-1d | 30 6 * * * | Score predictions older than 1 d |

Phase 0–6 additions (planned):

| Cron | Schedule | Purpose |
| --- | --- | --- |
| chronos2-batch-1h | 55 * * * * | Parallel Chronos-2 predictions |
| chronos2-batch-1d | 45 5 * * * | Parallel Chronos-2 daily |
| kronos-rolling-finetune | 0 6 1 * * | Monthly per-asset-class LoRA retrain |
The Kronos FastAPI inference service itself:

  • Lives on your local RTX 4060 at port 8200
  • Exposed to Railway/gateway via cloudflared tunnel (no public IP)
  • Semaphore pinned to 1 to avoid RoPE numerical drift under concurrency
  • Models loaded once at startup; event encoder + LoRA adapter loaded alongside

Batch script: scripts/kronos-batch-predict.py
Responsibilities per run:

  1. Fetch OHLCV for each of 23 instruments from Supabase
  2. Fetch economic_calendar + cross-asset leader OHLCV (BTC / SPY / DXY / VIX)
  3. Build 20-channel event tensor via EventEncoder
  4. POST to Kronos FastAPI
  5. Insert {p10, p50, p90, samples, event_context, model_name} into ml_predictions
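The five steps above reduce to a loop per instrument. Every helper below is a hypothetical stand-in for the real Supabase and Kronos calls in `scripts/kronos-batch-predict.py`:

```python
# Sketch of one batch run; helper names are assumptions, not the real API.
def fetch_ohlcv(symbol):                 # step 1: OHLCV from Supabase
    return [[1.0, 1.1, 0.9, 1.05, 100.0]] * 512

def fetch_events(symbol):                # steps 2-3: calendar + leaders -> (T, 20)
    return [[0.0] * 20] * 512

def call_kronos(ohlcv, events, pred_len=120):  # step 4: POST /predict
    return {"p10": [0.0] * pred_len, "p50": [0.0] * pred_len,
            "p90": [0.0] * pred_len, "samples": []}

def run_batch(symbols):
    rows = []
    for symbol in symbols:
        ohlcv = fetch_ohlcv(symbol)
        events = fetch_events(symbol)
        pred = call_kronos(ohlcv, events)
        # step 5: row destined for INSERT INTO ml_predictions
        rows.append({"symbol": symbol, "model_name": "kronos-base",
                     "event_context": {"channels": 20}, **pred})
    return rows

rows = run_batch(["BTCUSDT", "ETHUSDT"])
```

A failed instrument should be skipped, not abort the run, so one bad fetch can't starve the other 22 symbols of fresh predictions.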
The gateway route that serves these rows (GET /showcase/ml-prediction/{symbol}):

  • No auth (SHOWCASE_API_KEY bypass for showcase routes)
  • Reads latest row from ml_predictions filtered by symbol and optional model_name
  • 30-minute LRU cache upstream
  • Serves prediction.datfxlabs.com, the Finkit UI, and this docs site
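The 30-minute upstream cache can be approximated with a timestamped dict keyed by symbol. This is a sketch of the idea, not the gateway's actual cache layer:

```python
import time

TTL_SECONDS = 30 * 60  # the 30-minute window noted above
_cache: dict = {}

def get_latest_prediction(symbol, fetch_row, now=None):
    """Return the cached row for `symbol` unless it is older than the TTL."""
    now = time.time() if now is None else now
    hit = _cache.get(symbol)
    if hit and now - hit["at"] < TTL_SECONDS:
        return hit["row"]
    row = fetch_row(symbol)  # hypothetical SELECT latest FROM ml_predictions
    _cache[symbol] = {"row": row, "at": now}
    return row

# Only the first and the post-expiry call hit the database.
calls = []
def fake_fetch(symbol):
    calls.append(symbol)
    return {"symbol": symbol, "p50": [1.0]}

get_latest_prediction("BTCUSDT", fake_fetch, now=0)
get_latest_prediction("BTCUSDT", fake_fetch, now=60)               # within TTL: cached
get_latest_prediction("BTCUSDT", fake_fetch, now=TTL_SECONDS + 1)  # expired: refetch
```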

After the prediction horizon elapses, score_signals.py evaluates each prediction:

  • Did p50 direction match realized direction?
  • Did actual close land inside p10–p90 envelope?
  • Per-instrument, per-horizon Sharpe and hit-rate stats written to signal_performance materialised view

This closed loop powers the Evaluation page and feeds training labels to the Phase 5 ensemble meta-learner.

Why serve from cache instead of calling Kronos live? Three reasons:

  1. Model concurrency = 1 — Kronos can’t safely serve multiple live requests anyway.
  2. Latency SLO — user-facing endpoints need <200 ms; cold inference is 4× that.
  3. Resilience — if the RTX 4060 is unreachable (power, network, tunnel restart), cached predictions keep serving until next cron.

The tradeoff: predictions are stale for up to 1 h (intraday) or 24 h (daily). Acceptable for the use case; real-time trading would need a different stack.
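A consumer can make that staleness bound explicit by checking `generated_at` (from the gateway response) against the cron cadence. A sketch, assuming a simple horizon-keyed age limit:

```python
from datetime import datetime, timedelta, timezone

# Max acceptable age per horizon, mirroring the 1 h / 24 h cron cadence.
MAX_AGE = {"1h": timedelta(hours=1), "1d": timedelta(hours=24)}

def is_stale(generated_at, horizon, now):
    """True once a prediction is older than its cron cadence allows."""
    return now - generated_at > MAX_AGE[horizon]

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = is_stale(now - timedelta(minutes=40), "1h", now)  # within the hourly window
stale = is_stale(now - timedelta(hours=2), "1h", now)     # cron missed a cycle
```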