
Data Gaps

Source: researcher-260423-1502-kronos-data-gaps.md (full analysis).

| Ch | Channel | Status | Source | Notes |
|---|---|---|---|---|
| 0 | is_fomc | ✅ HAVE | economic_calendar | Binary, daily |
| 1 | is_cpi | ✅ HAVE | economic_calendar | Monthly |
| 2 | is_nfp | ✅ HAVE | economic_calendar | Monthly |
| 3 | is_gdp | ✅ HAVE | economic_calendar | Quarterly |
| 4 | is_pce | ✅ HAVE | economic_calendar | Monthly |
| 5 | cpi_surprise_z | 🟡 PARTIAL | economic_calendar.actual, .forecast | Needs rolling-20 std cache |
| 6 | fomc_hawkish_score | ❌ MISSING | (none) | No Fed statement text stored |
| 7 | nfp_surprise_z | 🟡 PARTIAL | Same as CPI | Same rolling-std work |
| 8 | is_earnings | ✅ HAVE | economic_calendar, filtered | title ILIKE '% Earnings' |
| 9 | is_rate_decision | ✅ HAVE | central_bank_rates | Via bank_code + change_date |
| 10 | days_to_fomc_sin | ✅ HAVE | Derivable | Pure compute from is_fomc |
| 11 | days_to_fomc_cos | ✅ HAVE | Derivable | |
| 12 | days_to_cpi_sin | ✅ HAVE | Derivable | |
| 13 | days_to_cpi_cos | ✅ HAVE | Derivable | |
| 14 | btc_log_return_1h | ✅ HAVE | ohlcv_1h, BTC-USD | 40 symbols tracked |
| 15 | spy_log_return_1h | ✅ HAVE | ohlcv_1h, SPY | NYSE hours; mask when closed |
| 16 | dxy_log_return_1h | 🟡 PARTIAL | ohlcv_1h, DX-Y.NYB | Only ~35 d of history; needs ≥ 6 mo backfill |
| 17 | vix_level_z | 🟡 PARTIAL | FRED VIXCLS (daily) | Daily only, no hourly |
| 18–19 | reserved | | | |
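
Channels 10–13 are marked as pure compute from the event flags. A minimal sketch of one way to derive them, assuming each feature is the number of days until the next event placed on the unit circle; the helper names, the example dates, and the 56-day `period` are illustrative assumptions, not values from the pipeline. Picking a period at least as long as the longest gap between events keeps distinct distances from wrapping onto the same point.

```python
import math
from datetime import date
from typing import List, Optional, Tuple


def days_to_next_event(day: date, event_days: List[date]) -> Optional[int]:
    """Days from `day` to the next event on or after it (0 on the event day).

    Returns None when no later event is known, so the caller can mask the row.
    """
    upcoming = [(e - day).days for e in event_days if e >= day]
    return min(upcoming) if upcoming else None


def cyclical_encode(days: int, period: float) -> Tuple[float, float]:
    """Place a day count on the unit circle; one full turn equals `period` days."""
    angle = 2 * math.pi * days / period
    return math.sin(angle), math.cos(angle)


# Example event dates and an assumed (hypothetical) 56-day period.
fomc_days = [date(2024, 1, 31), date(2024, 3, 20)]
d = days_to_next_event(date(2024, 3, 1), fomc_days)  # 19
days_to_fomc_sin, days_to_fomc_cos = cyclical_encode(d, period=56)
```

The same two helpers would serve days_to_cpi_sin/cos by passing the CPI release dates and a period suited to monthly spacing.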

| Status | Channels | % |
|---|---|---|
| ✅ HAVE (ready now) | 14 | 70 % |
| 🟡 PARTIAL (proxy available) | 4 | 20 % |
| ❌ MISSING (needs new source) | 2 | 10 % |

| Path | Channels usable | Expected quality vs. oracle |
|---|---|---|
| P0 only (6 h work, proxies) | 18 / 20 | ~75–80 % |
| P0 + P1 (backfill, ~16 h) | 19 / 20 | ~85–90 % |
| All backfills (~32 h) | 20 / 20 | ~95 %+ |

Recommendation: proceed to Phase 3 training with the P0 proxies, then upgrade to Tier-1 sources once training results are in. See the Backfill Plan for details.

CPI and NFP surprise z-scores (channels 5 and 7): economic_calendar.actual and .forecast already exist. The missing piece is the rolling-20 standard deviation of the surprise (actual − forecast) per event type. Materialise it as a view or cache table; ~4 h.
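
The rolling-20 computation can be sketched in plain Python before committing to a view or cache table. This assumes the z-score is the surprise (actual − forecast) divided by the standard deviation of the previous 20 surprises of the same event type, with no mean subtraction, and that the current release is excluded from its own scale to avoid look-ahead. The function name, the `min_history` cutoff, and the row shape are hypothetical.

```python
import statistics
from collections import defaultdict, deque
from typing import Iterable, List, Optional, Tuple


def surprise_z_scores(
    events: Iterable[Tuple[str, float, float]],
    window: int = 20,
    min_history: int = 2,
) -> List[Optional[float]]:
    """Per-event surprise z-score: (actual - forecast) / rolling std of past surprises.

    `events` are (event_type, actual, forecast) rows in release order. The std
    uses only the previous `window` surprises of the same event type.
    Returns None where there is too little history or the std is zero.
    """
    history = defaultdict(lambda: deque(maxlen=window))  # one buffer per event type
    scores: List[Optional[float]] = []
    for event_type, actual, forecast in events:
        surprise = actual - forecast
        past = history[event_type]
        score = None
        if len(past) >= min_history:
            sd = statistics.stdev(list(past))
            if sd > 0:
                score = surprise / sd
        scores.append(score)
        past.append(surprise)  # update only after scoring this release
    return scores
```

Keeping one buffer per event type means CPI and NFP surprises never share a scale, which matches the "per event type" requirement above.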

FOMC hawkish score (channel 6): no FOMC statement text is stored in the DB. Two paths:

  • Tier 1 (full, 16 h): Scrape Fed statements → PyPDF2 → FinBERT sentiment → fomc_statements table.
  • Tier 2 (proxy, 2 h): score each rate decision as sign(rate_change) × |delta| / 0.25, i.e. the size of the move in 25 bp steps. Less accurate, but unblocks Phase 3 in a morning.

Ship Tier 2 to start; upgrade to Tier 1 later.
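
The Tier 2 formula can be sketched as follows, assuming `delta` is the same policy-rate change as `rate_change`, expressed in percentage points; the function name is hypothetical. Because sign(x) × |x| = x, the score reduces to the rate change counted in 25 bp steps.

```python
import math


def fomc_hawkish_proxy(rate_change: float, step: float = 0.25) -> float:
    """Tier-2 proxy: sign(rate_change) * |rate_change| / step.

    Assumes rate_change is the policy-rate move in percentage points, so
    step=0.25 means 25 bp. A hold returns 0.0: this proxy carries no
    hawkish/dovish signal for meetings that leave rates unchanged.
    """
    if rate_change == 0:
        return 0.0
    return math.copysign(abs(rate_change) / step, rate_change)
```

For example, a 25 bp hike scores 1.0 and a 50 bp cut scores -2.0. The zero score on holds is the main accuracy loss relative to Tier 1, since statement tone on unchanged-rate meetings is invisible to it.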

DXY history (channel 16): DX-Y.NYB is in tickers and ohlcv_1h has rows for it, but only ~35 days of hourly history. Backfill 2021→today from Yahoo DX=F or Alpha Vantage; ~8 h.
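
Whichever source is used, it helps to check the backfilled rows against the ≥ 6-month target from the channel table before training. A minimal sketch, assuming bar timestamps are loaded as `datetime` objects that all share one timezone convention (the subject of open question 3); the function names, the 72 h gap threshold, and the 183-day target are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Sequence, Tuple


def coverage_report(
    bar_times: Sequence[datetime],
    max_gap: timedelta = timedelta(hours=72),
) -> Tuple[timedelta, List[Tuple[datetime, datetime]]]:
    """Return (history span, gaps longer than max_gap) for hourly bar timestamps.

    max_gap defaults to 72 h so ordinary weekend closures are not flagged;
    that threshold is an assumption, not a project setting.
    """
    ts = sorted(bar_times)
    if len(ts) < 2:
        return timedelta(0), []
    span = ts[-1] - ts[0]
    gaps = [(a, b) for a, b in zip(ts, ts[1:]) if b - a > max_gap]
    return span, gaps


def meets_backfill_target(bar_times: Sequence[datetime], min_days: int = 183) -> bool:
    """True when history spans at least min_days (~6 months) with no large gaps."""
    span, gaps = coverage_report(bar_times)
    return span >= timedelta(days=min_days) and not gaps


# Usage: 24 hourly bars, then a 10-day hole before the next bar.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
bars = [t0 + timedelta(hours=h) for h in range(24)] + [t0 + timedelta(days=10)]
span, gaps = coverage_report(bars)  # span is 10 days, one gap flagged
```

Mixing timezone-aware and naive timestamps makes the comparisons raise, which surfaces a UTC/ET mix-up early rather than silently shifting bars.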

VIX (channel 17): FRED VIXCLS provides daily VIX, which is fine for daily-horizon predictions. Hourly-horizon models need intraday VIX, either from CBOE (subscription) or as a proxy derived from SPY realised volatility (~72 % correlation). Priority P3; not blocking v1.
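
The SPY-based proxy can be sketched as rolling annualised realised volatility of hourly log returns, scaled to percent so it sits on roughly the same scale as VIX quotes. This construction is an assumption: VIX is a forward-looking 30-day implied measure while this is backward-looking, which is one reason the two only partly agree. The window length and the 252 × 7 bars-per-year factor are hypothetical and should follow however SPY bars are actually stored.

```python
import math
import statistics
from typing import List, Optional, Sequence


def realised_vol_pct(
    closes: Sequence[float],
    window: int = 20,
    bars_per_year: float = 252 * 7,
) -> List[Optional[float]]:
    """Rolling annualised realised volatility of hourly log returns, in percent.

    Output i covers the `window` returns ending at bar i + 1 (None until enough
    returns exist). bars_per_year = 252 sessions * 7 hourly bars is an assumed
    NYSE-hours convention.
    """
    if window < 2:
        raise ValueError("window must be at least 2 for a sample std")
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    vols: List[Optional[float]] = []
    for i in range(len(rets)):
        if i + 1 < window:
            vols.append(None)
            continue
        sd = statistics.stdev(rets[i + 1 - window : i + 1])
        vols.append(100.0 * sd * math.sqrt(bars_per_year))
    return vols
```

vix_level_z could then be the z-score of this series over a trailing window, though how the existing daily channel is normalised is not specified here.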

Earnings (channel 8): economic_calendar has earnings entries for some tickers, but coverage is uneven. The Finnhub earnings API can fill the gaps; ~2 h.
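
A minimal sketch of how the gap-filling could be checked, assuming calendar rows carry a ticker and that Finnhub results have already been fetched into plain (ticker, day) tuples; no Finnhub client code is shown, and all names and row shapes are hypothetical. It mirrors the `title ILIKE '% Earnings'` filter from the channel table in Python, merges the two sources, and lists tickers that are still uncovered.

```python
from datetime import date
from typing import Iterable, List, Set, Tuple


def is_earnings_title(title: str) -> bool:
    """Python stand-in for the SQL filter `title ILIKE '% Earnings'`.

    ILIKE is case-insensitive and the pattern is anchored at the end, so the
    title must end with ' earnings' in any letter case.
    """
    return title.lower().endswith(" earnings")


def merge_earnings(
    calendar_rows: Iterable[Tuple[str, date, str]],
    extra_rows: Iterable[Tuple[str, date]],
) -> List[Tuple[str, date]]:
    """Union of economic_calendar earnings events and a supplementary feed.

    Hypothetical row shapes: (ticker, day, title) for economic_calendar and
    (ticker, day) for already-fetched supplementary rows.
    Duplicates on (ticker, day) collapse to one event.
    """
    events: Set[Tuple[str, date]] = {
        (ticker, day) for ticker, day, title in calendar_rows if is_earnings_title(title)
    }
    events.update(extra_rows)
    return sorted(events)


def tickers_without_events(tickers: Iterable[str], events: Iterable[Tuple[str, date]]) -> List[str]:
    """Tickers that still have no earnings event after the merge."""
    covered = {ticker for ticker, _ in events}
    return sorted(set(tickers) - covered)
```

With made-up rows, a calendar title such as "Apple Earnings Call" is rejected by the filter because it does not end in "Earnings", so that ticker only counts as covered if the supplementary feed supplies it.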

Open questions:

  1. What error tolerance is acceptable for the VIX Tier-2 proxy (SPY realised vol, ~72 % correlation)?
  2. Is the Fed website's HTML structure stable enough for a long-lived scraper?
  3. Is DXY bar_time in ohlcv_1h aligned to UTC or ET?
  4. Should the earnings surprise z-score be computed from earnings_estimates, or should earnings stay binary-only?

These won’t block Phase 3 kickoff but will tighten the final model.