Future Improvements
Ideas evaluated during brainstorming but deliberately excluded from the current 5-week plan. Logged here so they don’t get lost.
Roadmap beyond Phase 6
```mermaid
timeline
    title Kronos roadmap · beyond 2026-04
    Phase 1 (current) : Event-conditioned Kronos
                      : Chronos-2 ensemble
                      : Rolling fine-tune
    Phase 2 (Q3 2026) : Cross-asset token model
                      : Shared BSQ + group attention
                      : iTransformer-style variate attention
    Phase 3 (Q4 2026) : Multi-modal fusion
                      : Price + macro + news tokens
                      : Two-tower + cross-attention
    Phase 4 (2027) : Regime-specific adapters
                   : Per-regime LoRA swap at inference
                   : Meta-learner picks adapter
```
Evaluated, deferred, or rejected
Cross-asset token model — DEFERRED to Phase 2
Pass multiple assets into a shared tokenizer simultaneously. Each asset tokenized independently, then cross-attention across assets. Moirai-2 does this natively but its CC-BY-NC licence blocks commercial use. Chronos-2’s group attention is the pragmatic substitute.
For now, Phase 1 approximates this with 4 cross-asset leader channels. True cross-sectional attention (all 23 instruments attending to each other) is a bigger architecture lift.
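To make the “bigger architecture lift” concrete, here is a minimal NumPy sketch of cross-sectional attention, where all instruments attend to each other at every time step. The shapes, the single-head form, and the function name are illustrative assumptions, not the eventual design:

```python
import numpy as np

def cross_asset_attention(tokens: np.ndarray) -> np.ndarray:
    """Single-head attention across the asset axis at each time step.

    tokens: (n_assets, seq_len, d) per-asset token embeddings
    (each asset tokenized independently, as described above).
    Returns the same shape, with every asset's embedding replaced by
    a weighted mix of all assets' embeddings at that time step.
    """
    n_assets, seq_len, d = tokens.shape
    x = tokens.transpose(1, 0, 2)                       # (seq_len, A, d)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)      # (seq_len, A, A)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over assets
    mixed = weights @ x                                 # (seq_len, A, d)
    return mixed.transpose(1, 0, 2)

rng = np.random.default_rng(0)
out = cross_asset_attention(rng.normal(size=(23, 64, 32)))
print(out.shape)  # (23, 64, 32) — all 23 instruments mixed per step
```

The 4 leader channels in Phase 1 are a fixed, hand-picked subset of this: full cross-sectional attention learns the mixing weights instead.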
Multi-modal fusion — DEFERRED to Phase 3
Combine:
- Price token stream (Kronos / Chronos-2)
- Macro indicator stream (FRED series embedded)
- News text stream (FinBERT embeddings)
Research gap: no production model fuses all three natively. State of the art is two-tower + fusion (e.g. FinMem, FinTral). Engineering investment: 6–12 months. Revisit when the Phase 1 baseline is solid and we have reason to believe each modality adds material alpha.
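A minimal sketch of the two-tower + cross-attention shape, with random matrices standing in for learned projections. The dimensions, `fuse_towers`, and `d_fused` are all hypothetical:

```python
import numpy as np

def fuse_towers(price_emb, macro_emb, news_emb, d_fused=64, seed=0):
    """Toy fusion: price tokens cross-attend to the concatenated
    macro + news tokens, then mean-pool into one fused vector.
    Random projections stand in for learned weights."""
    rng = np.random.default_rng(seed)
    context = np.vstack([macro_emb, news_emb])           # (n_ctx, d)
    d = price_emb.shape[1]
    Wq = rng.normal(size=(d, d_fused)) / np.sqrt(d)
    Wk = rng.normal(size=(d, d_fused)) / np.sqrt(d)
    Wv = rng.normal(size=(d, d_fused)) / np.sqrt(d)
    q, k, v = price_emb @ Wq, context @ Wk, context @ Wv
    scores = q @ k.T / np.sqrt(d_fused)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                   # softmax over context
    return (w @ v).mean(axis=0)                          # pooled fused vector

price = np.random.default_rng(1).normal(size=(128, 32))  # price token stream
macro = np.random.default_rng(2).normal(size=(12, 32))   # macro indicator stream
news  = np.random.default_rng(3).normal(size=(20, 32))   # news text stream
print(fuse_towers(price, macro, news).shape)  # (64,)
```

The point of the sketch is the data flow, not the weights: each tower keeps its own encoder, and fusion happens only at the cross-attention step.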
Regime-specific LoRA adapters — DEFERRED
Train one LoRA per regime (trending / ranging / volatile / quiet). Meta-learner picks the adapter at inference from current regime classification. Complementary to Phase 6 (which does per-asset-class rolling retrain). Combining both = one adapter per (asset_class × regime) = 12 adapters. Probably overkill until event-conditioned baseline plateaus.
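The routing idea fits in a few lines. The class/regime names, file paths, and `pick_adapter` lookup below are all hypothetical; the real meta-learner would be a trained classifier, not a dict lookup:

```python
from itertools import product

# Illustrative labels only — 3 asset classes × 4 regimes = the 12
# adapters estimated above.
ASSET_CLASSES = ["fx", "equity", "crypto"]
REGIMES = ["trending", "ranging", "volatile", "quiet"]

# One LoRA checkpoint per (asset_class, regime) pair (paths hypothetical).
registry = {key: f"lora/{key[0]}_{key[1]}.safetensors"
            for key in product(ASSET_CLASSES, REGIMES)}

def pick_adapter(asset_class: str, regime: str) -> str:
    """Meta-learner stand-in: route on the current regime classification."""
    return registry[(asset_class, regime)]

print(len(registry))                   # 12
print(pick_adapter("fx", "volatile"))  # lora/fx_volatile.safetensors
```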
INT8 / FP8 quantization — REJECTED for v1
Kronos-base inference is already 50 ms from cache. Quantization would shave 20–30% but introduces quality risk. Not worth the engineering time until inference becomes the bottleneck — e.g. if we serve live (non-cached) predictions or scale to 100+ instruments.
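The quality risk is easy to see with a symmetric per-tensor INT8 round-trip. This is a generic sketch, not Kronos’s actual weights or a specific quantization library:

```python
import numpy as np

def int8_roundtrip(w: np.ndarray):
    """Symmetric per-tensor INT8 quantize then dequantize.
    Every dequantized weight differs from the original by up to
    half a quantization step (scale / 2)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale, scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)
w_hat, scale = int8_roundtrip(w)
max_err = np.abs(w - w_hat).max()
print(max_err <= scale / 2 + 1e-8)  # True: error bounded by half a step
```

Per-tensor error like this compounds across layers, which is exactly the quality risk that isn’t worth taking for a 20–30% saving on an already-cached 50 ms path.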
ONNX / TensorRT export — REJECTED
Same rationale. Additional issue: Kronos uses dynamic token sampling, which doesn’t export cleanly to an ONNX graph. It would require model surgery that defeats the time saving.
Retrain Kronos from scratch — REJECTED (hard)
- Needs a pre-training corpus of >10 M candles
- 3+ months GPU time
- No evidence of upside vs LoRA fine-tuning on 80 K candles
- Zero forks with event conditioning exist — LoRA is the right path
Switch to Moirai-2 — REJECTED (licence)
Best any-variate design, smallest fast model. But CC-BY-NC-4.0 means no commercial deployment.
Switch to TimesFM 2.5 — EVALUATED
200 M params, Apache 2.0, 16 K context, XReg covariate support. Strong candidate for a second ensemble member if Chronos-2 underperforms. Keep in back pocket.
Instrumentation / ops wish-list
- Confidence-weighted scoring — signal quality should be a function of envelope width, not just p50 direction. Partially exists; needs exposure on the prediction site.
- Prediction explainability — SHAP-style contributions of each event channel to the p50 shift. Would turn the model into a narrative tool, not just a number.
- Automatic drift alerts — when Chronos-2 and Kronos disagree >N standard deviations for K consecutive runs, alert and trigger Phase 6 retrain.
- A/B serving — route a fraction of prediction-site traffic to the ensemble model vs baseline, log conversion/engagement, validate real-user impact not just offline metrics.
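The drift-alert rule above can be sketched as follows. The `n_sigma`/`k_runs` defaults and the choice to measure sigma on the pre-drift baseline window are assumptions:

```python
import numpy as np

def drift_alert(chronos_p50, kronos_p50, n_sigma=2.0, k_runs=3):
    """Alert when the last k_runs Chronos-2 vs Kronos disagreements all
    exceed n_sigma standard deviations of the earlier (baseline)
    disagreement history."""
    diff = np.asarray(chronos_p50, float) - np.asarray(kronos_p50, float)
    if len(diff) <= k_runs:
        return False
    baseline, recent = diff[:-k_runs], diff[-k_runs:]
    sigma = baseline.std()
    if sigma == 0:
        return False
    return bool(np.all(np.abs(recent) > n_sigma * sigma))

kronos = np.zeros(9)
quiet = np.array([0.1, -0.2, 0.0, 0.1, -0.1, 0.05, 0.08, -0.05, 0.1])
drifting = np.array([0.1, -0.2, 0.0, 0.1, -0.1, 0.05, 2.5, 2.8, 3.1])
print(drift_alert(quiet, kronos), drift_alert(drifting, kronos))  # False True
```

A `True` here would trigger the Phase 6 retrain; measuring sigma on the baseline window keeps the drift itself from inflating the threshold.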
What we won’t do
- Train a private financial LLM — orders of magnitude more data + compute; BloombergGPT / FinGPT territory.
- Real-time (sub-second) predictions — requires a completely different architecture; not the use case.
- Individual retail trading signals — this is a research / documentation site, not a signal provider. Prediction outputs are educational.