
Phase 3 · LoRA Training Pipeline

Priority: High · Status: Pending · Depends on: Phase 2 (modified model)

  • Base Kronos model: 102.3M params, trained on OHLCV sequences
  • LoRA fine-tuning: only train ~1% of params (event embedding + LoRA adapters)
  • Hardware: RTX 4060 (8GB VRAM), ~121MB model footprint

Create a training script that fine-tunes Kronos with LoRA on event-labeled OHLCV data. Freeze BSQ tokenizer, HierarchicalEmbedding, TemporalEmbedding, DualHead. Train only event embedding + LoRA adapters on q_proj/v_proj.

  • Load frozen Kronos-base from HuggingFace
  • Add LoRA adapters to transformer blocks (target: q_proj, v_proj)
  • Train EventEmbedding from scratch (Xavier init)
  • Build event-labeled dataset from Supabase (OHLCV + economic_calendar)
  • Support event oversampling (3-5x weight for event days)
  • Save trained LoRA weights separately from base model
  • Training on RTX 4060 with batch_size=32
  • Complete in <4 hours
  • Memory usage <6GB VRAM (leave headroom)
  • Target modules: q_proj, v_proj in each TransformerBlock
  • LoRA rank: 8 (bumped from 4; 20-channel conditioning needs a richer adapter)
  • LoRA alpha: 16
  • Dropout: 0.05
  • Trainable params: ~1M (~1% of 102.3M): LoRA adapters + EventEmbedding(20→256)
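A minimal sketch of this configuration with the peft library, assuming the Kronos predictor is a plain nn.Module whose attention projections are literally named q_proj / v_proj:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model


def add_lora_adapters(predictor: nn.Module) -> nn.Module:
    """Wrap the frozen Kronos predictor with rank-8 LoRA adapters on q_proj/v_proj."""
    lora_config = LoraConfig(
        r=8,                                  # LoRA rank (bumped from 4)
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # attention projections in each TransformerBlock
        bias="none",
    )
    peft_model = get_peft_model(predictor, lora_config)
    peft_model.print_trainable_parameters()   # expect roughly 1% trainable
    return peft_model
```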
class EventConditionedDataset:
"""
Each sample:
- OHLCV features: (seq_len, 6) — z-score normalized
- Timestamps: (seq_len, 5) — minute, hour, weekday, day, month
- Events: (seq_len, 20) — event + surprise_z + days_until + cross-asset leaders
- s1_targets: (seq_len,) — teacher forcing targets
- s2_targets: (seq_len,)
"""

Training data builder MUST pre-compute leader returns alongside OHLCV:

# For each training sample window [t_start, t_end]:
leader_data = {}
for leader in ["BTCUSDT", "SPY", "DXY", "VIX"]:
    # Fetch bars STRICTLY before each target bar (no lookahead)
    bars = fetch_ohlcv(leader, end=t_end, shift_right=1)
    leader_data[leader] = bars
# Feed to EventEncoder.encode(..., leader_data=leader_data)
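A sketch of what `shift_right=1` has to guarantee, in pandas terms (series/index names are illustrative): the leader return attached to bar t may only be computed from leader bars that closed strictly before t.

```python
import pandas as pd


def leader_return_feature(leader_close: pd.Series, target_index: pd.DatetimeIndex) -> pd.Series:
    """1-bar leader return aligned so bar t only sees leader bars that closed before t."""
    ret = leader_close.pct_change()   # leader's return at its own close
    ret = ret.shift(1)                # shift right by one bar: no same-bar lookahead
    # Align to the target symbol's bar index; missing leader sessions become 0.0
    return ret.reindex(target_index).fillna(0.0)
```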
1. Load frozen Kronos tokenizer + predictor
2. Add LoRA adapters (peft library)
3. Initialize EventEmbedding (Xavier init, trained from scratch)
4. For each epoch:
   a. Sample batch (oversample event days 3x)
   b. Encode OHLCV → (s1_ids, s2_ids) with frozen tokenizer
   c. Build event tensor with EventEncoder
   d. Forward pass: model(s1, s2, stamp, events=events)
   e. Loss = CE(s1_logits, s1_targets) + CE(s2_logits, s2_targets)
   f. Backward (only LoRA + event_emb params get gradients)
   g. Step optimizer
5. Save LoRA weights + event_emb state_dict
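A condensed sketch of steps 4a-g; the tokenizer.encode interface and the model(s1, s2, stamp, events=...) signature follow the Phase 2 plan and are assumptions here, not the final API:

```python
import torch
import torch.nn.functional as F


def train_one_epoch(model, tokenizer, loader, optimizer, scheduler, device="cuda"):
    """One pass over the (event-oversampled) loader; only LoRA + event_emb get gradients."""
    model.train()
    for batch in loader:                                      # a. oversampled batch
        stamp = batch["stamp"].to(device)
        events = batch["events"].to(device)                   # c. EventEncoder output, pre-built per window
        s1_tgt = batch["s1_targets"].to(device)
        s2_tgt = batch["s2_targets"].to(device)

        with torch.no_grad():                                 # b. frozen BSQ tokenizer
            s1_ids, s2_ids = tokenizer.encode(batch["ohlcv"].to(device))

        s1_logits, s2_logits = model(s1_ids, s2_ids, stamp, events=events)  # d. forward

        loss = (                                              # e. CE on both heads
            F.cross_entropy(s1_logits.reshape(-1, s1_logits.size(-1)), s1_tgt.reshape(-1))
            + F.cross_entropy(s2_logits.reshape(-1, s2_logits.size(-1)), s2_tgt.reshape(-1))
        )

        optimizer.zero_grad()
        loss.backward()                                       # f. grads only reach LoRA + event_emb
        torch.nn.utils.clip_grad_norm_(
            [p for p in model.parameters() if p.requires_grad], max_norm=1.0
        )
        optimizer.step()                                      # g.
        scheduler.step()
```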
| Param | Value | Rationale |
|---|---|---|
| Learning rate | 1e-4 | LoRA standard (higher than pre-training) |
| Scheduler | Cosine annealing + 100-step warmup | Standard for fine-tuning |
| Batch size | 32 | RTX 4060 limit |
| Epochs | 15 | LoRA converges fast |
| LoRA rank | 8 | 20-channel conditioning justifies a richer adapter |
| LoRA alpha | 16 | Standard |
| Optimizer | AdamW (weight_decay=0.01) | Standard |
| Event oversample | 3x | Ensure the model sees enough event transitions |
| Gradient clip | 1.0 | Prevent instability |
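The optimizer, scheduler, and oversampling rows translate roughly to the sketch below; get_cosine_schedule_with_warmup comes from the transformers library (swap in a torch LambdaLR if transformers is not a dependency), and the per-window is_event_day flags are assumed to come from the dataset builder:

```python
import torch
from torch.utils.data import WeightedRandomSampler
from transformers import get_cosine_schedule_with_warmup


def build_optimizer(model, steps_per_epoch: int, epochs: int = 15):
    # Only the params left trainable (LoRA adapters + event embedding) go to AdamW.
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=1e-4, weight_decay=0.01)
    scheduler = get_cosine_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=steps_per_epoch * epochs
    )
    return optimizer, scheduler


def build_event_sampler(is_event_day, oversample: float = 3.0):
    # 3x sampling weight for windows flagged as containing a high-impact event day;
    # pass the result as DataLoader(dataset, batch_size=32, sampler=...).
    weights = [oversample if flag else 1.0 for flag in is_event_day]
    return WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
```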
Source data:
- ohlcv_1d: 81 symbols × ~296 days/yr × 5 years ≈ 120K samples
- economic_calendar: ~18 event types, ~200 high-impact events/year
Train/val/test split:
- Train: 2021-2024 (80%)
- Val: 2025 Q1-Q2 (10%)
- Test: 2025 Q3-Q4 + 2026 Q1 (10%)
- Stratify by event presence in all splits
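Because the split is chronological, "stratify by event presence" in practice means verifying each split keeps a comparable share of event-day samples rather than reshuffling rows; a sketch, with column names date / is_event_day as assumptions:

```python
import pandas as pd


def split_by_date(df: pd.DataFrame):
    """Chronological split; assumes a datetime 'date' column and a boolean 'is_event_day' flag."""
    train = df[(df["date"] >= "2021-01-01") & (df["date"] <= "2024-12-31")]
    val = df[(df["date"] >= "2025-01-01") & (df["date"] <= "2025-06-30")]
    test = df[(df["date"] >= "2025-07-01") & (df["date"] <= "2026-03-31")]
    for name, part in [("train", train), ("val", val), ("test", test)]:
        # Report event coverage per split instead of shuffling (splits stay time-ordered).
        print(f"{name}: {len(part)} rows, event-day share {part['is_event_day'].mean():.2%}")
    return train, val, test
```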
Context window: 512 candles (matches existing Kronos config)
Prediction: next-token (teacher forcing during training)
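For the next-token objective, the builder can derive the teacher-forcing targets by shifting the frozen tokenizer's ids one position; a minimal sketch over 1-D per-symbol id arrays:

```python
import numpy as np


def next_token_targets(s1_ids: np.ndarray, s2_ids: np.ndarray):
    """Teacher forcing: the target at position t is the token of bar t+1."""
    inputs = (s1_ids[:-1], s2_ids[:-1])   # inputs drop the last bar so lengths match
    targets = (s1_ids[1:], s2_ids[1:])    # targets drop the first bar (shift left by one)
    return inputs, targets
```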
  1. Create kronos-service/finetune/train_event_conditioned.py
  2. Implement EventConditionedDataset — load OHLCV + calendar + leader OHLCV from Supabase
  3. Implement LoRA wrapping with peft library
  4. Implement training loop with event oversampling
  5. Implement validation loop (track loss on event vs non-event samples)
  6. Implement checkpoint saving (LoRA weights + event_emb only)
  7. Add training metrics logging (TensorBoard or JSON)
  8. Test: verify only LoRA + event_emb params have gradients
  9. Test: verify frozen params don’t change after one training step
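Steps 8 and 9 can be quick pytest-style checks; the parameter name patterns ("lora_", "event_emb") are assumptions about how peft and the Phase 2 model name their modules:

```python
import torch


def test_only_lora_and_event_emb_trainable(model):
    trainable = [n for n, p in model.named_parameters() if p.requires_grad]
    assert trainable, "no trainable parameters found"
    assert all("lora_" in n or "event_emb" in n for n in trainable), trainable


def test_frozen_params_unchanged_after_step(model, run_one_training_step):
    # run_one_training_step is a hypothetical helper doing one forward/backward/optimizer step.
    before = {n: p.detach().clone() for n, p in model.named_parameters() if not p.requires_grad}
    run_one_training_step()
    for n, p in model.named_parameters():
        if n in before:
            assert torch.equal(before[n], p.detach()), f"frozen param {n} changed"
```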
  • Create: kronos-service/finetune/train_event_conditioned.py
  • Read: kronos-service/finetune/shared.py (existing feature normalization)
  • Read: kronos-service/finetune/export_training_data.py (existing data export pattern)
  • Dependencies: peft library (pip install peft)
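For step 6 and the <50MB checkpoint criterion: with peft, save_pretrained on the wrapped model writes only the adapter weights, and the event embedding state_dict is stored alongside. A sketch, with an illustrative output path:

```python
import torch


def save_checkpoint(peft_model, event_emb, out_dir="checkpoints/event_lora"):
    # Writes adapter_config.json + adapter weights only (a few MB), not the 102M-param base.
    peft_model.save_pretrained(out_dir)
    # EventEmbedding is trained from scratch, so persist its state_dict separately.
    torch.save(event_emb.state_dict(), f"{out_dir}/event_emb.pt")
```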
  • Training loss decreases monotonically over 15 epochs
  • Val loss on event days decreases (model learns event patterns)
  • Val loss on non-event days stays within 5% of base model (no regression)
  • ~1M params have gradients (LoRA rank 8 + EventEmbedding(20))
  • Ablation: rank 8 improves event-day val loss over rank 4 by ≥ 1%
  • No train/test leakage — leader bar timestamps strictly precede target
  • Training completes in <4 hours on RTX 4060
  • Saved checkpoint <50MB (LoRA weights only, not full model)
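To track the event-day vs non-event-day criteria above, the validation loop can bucket per-sample losses by an is_event flag supplied by the dataset (an assumption, like the tokenizer/model interfaces in the training sketch):

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def validate(model, tokenizer, loader, device="cuda"):
    """Mean val loss split by event presence, for the regression checks above."""
    model.eval()
    sums, counts = {"event": 0.0, "non_event": 0.0}, {"event": 0, "non_event": 0}
    for batch in loader:
        s1_ids, s2_ids = tokenizer.encode(batch["ohlcv"].to(device))
        s1_logits, s2_logits = model(
            s1_ids, s2_ids, batch["stamp"].to(device), events=batch["events"].to(device)
        )
        s1_tgt, s2_tgt = batch["s1_targets"].to(device), batch["s2_targets"].to(device)
        # Per-window loss so each sample can be bucketed by its event flag.
        loss = (
            F.cross_entropy(s1_logits.transpose(1, 2), s1_tgt, reduction="none").mean(dim=1)
            + F.cross_entropy(s2_logits.transpose(1, 2), s2_tgt, reduction="none").mean(dim=1)
        )
        for sample_loss, flag in zip(loss.tolist(), batch["is_event"].tolist()):
            key = "event" if flag else "non_event"
            sums[key] += sample_loss
            counts[key] += 1
    return {k: sums[k] / max(counts[k], 1) for k in sums}
```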