
Future Improvements

Ideas evaluated during brainstorming but deliberately not in the current 5-week plan. Logged here so they don’t get lost.

```mermaid
timeline
    title Kronos roadmap · beyond 2026-04
    Phase 1 (current) : Event-conditioned Kronos
                      : Chronos-2 ensemble
                      : Rolling fine-tune
    Phase 2 (Q3 2026) : Cross-asset token model
                      : Shared BSQ + group attention
                      : iTransformer-style variate attention
    Phase 3 (Q4 2026) : Multi-modal fusion
                      : Price + macro + news tokens
                      : Two-tower + cross-attention
    Phase 4 (2027) : Regime-specific adapters
                   : Per-regime LoRA swap at inference
                   : Meta-learner picks adapter
```

Cross-asset token model — DEFERRED to Phase 2

Pass multiple assets into a shared tokenizer simultaneously. Each asset tokenized independently, then cross-attention across assets. Moirai-2 does this natively but its CC-BY-NC licence blocks commercial use. Chronos-2’s group attention is the pragmatic substitute.

For now, Phase 1 approximates this with 4 cross-asset leader channels. True cross-sectional attention (all 23 instruments attending to each other) is a bigger architecture lift.
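
To make the lift concrete, here is a minimal sketch of cross-sectional attention in plain numpy — every instrument's token embedding attends to every other instrument at the same time step. All names, shapes, and weights are illustrative, not the actual Kronos / Chronos-2 internals:

```python
import numpy as np

def cross_asset_attention(tokens, w_q, w_k, w_v):
    """tokens: (n_assets, d) embeddings for one time step.
    Every instrument attends to every other (and itself)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (n_assets, n_assets)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)           # softmax over assets
    return w @ v                                 # (n_assets, d)

rng = np.random.default_rng(0)
n_assets, d = 23, 16        # 23 instruments; embedding width illustrative
tokens = rng.normal(size=(n_assets, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = cross_asset_attention(tokens, w_q, w_k, w_v)
print(out.shape)            # (23, 16)
```

Unlike the 4 fixed leader channels, this is quadratic in the number of instruments and has to be trained into the backbone — which is exactly where the architecture lift comes from.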

Multi-modal fusion — DEFERRED to Phase 3

Combine:

  • Price token stream (Kronos / Chronos-2)
  • Macro indicator stream (FRED series embedded)
  • News text stream (FinBERT embeddings)

Research gap: no production model fuses all three natively. The state of the art is two-tower + fusion (e.g. FinMem, FinTral). Engineering investment: 6–12 months. Revisit when the Phase 1 baseline is solid and we have reason to believe each modality adds material alpha.
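
The tower + cross-attention pattern itself is simple; a hypothetical numpy sketch in which price tokens query a context built from macro and news embeddings (the arrays below are stand-ins for the real encoders — Kronos / Chronos-2, a FRED-series embedder, FinBERT — and every shape is illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
d = 16                              # shared embedding width (illustrative)
price = rng.normal(size=(64, d))    # price tokens (Kronos / Chronos-2 tower)
macro = rng.normal(size=(8, d))     # embedded FRED series
news = rng.normal(size=(5, d))      # FinBERT text embeddings
context = np.vstack([macro, news])  # second tower's combined output

# Cross-attention fusion: price tokens query the macro+news context,
# and the result is added back as a residual adjustment.
scores = price @ context.T / np.sqrt(d)    # (64, 13)
fused = price + softmax(scores) @ context  # (64, 16)
print(fused.shape)                         # (64, 16)
```

The hard part is not this fusion step but training the three encoders so their embeddings live in a compatible space — hence the 6–12 month estimate.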

Regime-specific LoRA adapters — DEFERRED

Train one LoRA per regime (trending / ranging / volatile / quiet). Meta-learner picks the adapter at inference from current regime classification. Complementary to Phase 6 (which does per-asset-class rolling retrain). Combining both = one adapter per (asset_class × regime) = 12 adapters. Probably overkill until event-conditioned baseline plateaus.
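
The adapter-swap mechanic is cheap at inference time, assuming standard LoRA algebra (effective weight = W + BA). A hypothetical sketch — regime names come from the list above; the shapes and the `predict` helper are illustrative:

```python
import numpy as np

def lora_apply(w_base, adapter, x):
    """Effective weight = W + B @ A; swapping adapters is a dict lookup."""
    a, b = adapter
    return x @ (w_base + b @ a).T

rng = np.random.default_rng(2)
d, r = 64, 8                             # width and LoRA rank, illustrative
w_base = rng.normal(size=(d, d)) * 0.02  # frozen backbone weight

# One low-rank adapter per regime; per (asset_class x regime) would be 12.
regimes = ["trending", "ranging", "volatile", "quiet"]
adapters = {reg: (rng.normal(size=(r, d)) * 0.01,   # A
                  rng.normal(size=(d, r)) * 0.01)   # B
            for reg in regimes}

def predict(x, regime):
    # Stand-in for the meta-learner: the regime label is supplied directly.
    return lora_apply(w_base, adapters[regime], x)

x = rng.normal(size=(1, d))
print(predict(x, "volatile").shape)      # (1, 64)
```

Each adapter stores only 2·d·r parameters instead of d², which is why holding 12 of them in memory and hot-swapping per prediction is feasible.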

INT8 / FP8 quantization — REJECTED for v1

Kronos-base inference is already 50 ms from cache. Quantization would shave 20-30 % off that but introduces quality risk. Not worth engineering time until inference becomes the bottleneck — e.g. if we serve live (non-cached) predictions or scale to 100+ instruments.
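
For reference, the quality trade-off is visible even in the most benign scheme — symmetric per-tensor INT8 (a simplified sketch, not Kronos's actual weights):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization (simplified)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(3)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale      # dequantized weights

# Round-trip error is bounded by half a quantization step here;
# outlier-heavy real weight matrices fare worse, hence the quality risk.
rel_err = np.abs(w - w_hat).max() / np.abs(w).max()
print(f"max relative error: {rel_err:.4f}")
```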

ONNX export — REJECTED for v1

Same rationale. An additional issue: Kronos uses dynamic token sampling, which doesn't export cleanly to a static ONNX graph. It would require model surgery that defeats the time saving.

Retrain Kronos from scratch — REJECTED (hard)

  • Needs >10 M candles pre-training corpus
  • 3+ months GPU time
  • No evidence of upside vs LoRA fine-tuning on 80 K candles
  • Zero forks with event conditioning exist — LoRA is the right path

Moirai-2 — REJECTED (licence)

Best any-variate design, smallest fast model. But CC-BY-NC-4.0 means no commercial deployment.

200 M params, Apache 2.0, 16 K context, XReg covariate support. Strong candidate for a second ensemble member if Chronos-2 underperforms. Keep in back pocket.

  • Confidence-weighted scoring — signal quality should be a function of envelope width, not just p50 direction. Partially exists; needs exposure on the prediction site.
  • Prediction explainability — SHAP-style contributions of each event channel to the p50 shift. Would turn the model into a narrative tool, not just a number.
  • Automatic drift alerts — when Chronos-2 and Kronos disagree >N standard deviations for K consecutive runs, alert and trigger Phase 6 retrain.
  • A/B serving — route a fraction of prediction-site traffic to the ensemble model vs baseline, log conversion/engagement, validate real-user impact not just offline metrics.
Out of scope

  • Train a private financial LLM — orders of magnitude more data and compute; BloombergGPT / FinGPT territory.
  • Real-time (sub-second) predictions — requires a completely different architecture; not the use case.
  • Individual retail trading signals — this is a research / documentation site, not a signal provider. Prediction outputs are educational.
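
The drift-alert idea above reduces to a small amount of bookkeeping. A hypothetical sketch, where `sigma` is the historical standard deviation of the two models' disagreement and `n_std` / `k` are the tuning knobs left open in the bullet:

```python
import numpy as np

def drift_alert(kronos, chronos, sigma, n_std=3.0, k=5):
    """True when |kronos - chronos| exceeds n_std * sigma for k
    consecutive runs; sigma is the historical std of disagreement."""
    diff = np.abs(np.asarray(kronos, float) - np.asarray(chronos, float))
    streak = 0
    for gap in diff:
        streak = streak + 1 if gap > n_std * sigma else 0
        if streak >= k:
            return True
    return False

print(drift_alert([1, 1, 1], [1.1, 0.9, 1.0], sigma=0.1))        # False
print(drift_alert([5, 5, 5, 5, 5], [1, 1, 1, 1, 1], sigma=0.1))  # True
```

Requiring k consecutive exceedances (rather than a single one) keeps one-off outliers from triggering a Phase 6 retrain.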