v2 paper trading

AlphaCloud

Weather derivatives trading on Kalshi

What it is

AlphaCloud is a trading bot for Kalshi's temperature markets. It trades contracts like "Will the high in NYC tomorrow be between 45 and 46°F?", which are binary contracts priced from 1¢ to 99¢ that pay $1.00 if the outcome happens and $0 otherwise.

19 cities. 27 market series. 300+ markets per scan. 9 pluggable trading signals.

The v2 paper trader runs alongside the original v4/v7 panel, recording ensemble snapshots and orderbook data on every scan.

Why this exists

This is the data scientist side of me wanting a playground. Weather prediction markets are a fascinating niche. The underlying data is completely public (weather forecasts), the markets are liquid enough to trade, and there's real edge available if you can blend models better than the average participant.

It's also one of the best systems engineering problems I've worked on. Real-time data ingestion, ensemble model blending, probabilistic edge detection, execution algorithms, risk management, self-improving ML. Every piece has to work together or the whole thing falls apart.

The forecast pipeline

The original v7 system pulled ensemble forecasts from four weather model sources, 123 members total:

  • GEFS: 31 ensemble members from NOAA (US)
  • ECMWF: 51 ensemble members from the European Centre (generally considered the best in the world)
  • ICON: 40 ensemble members from Germany's DWD
  • NBM: 1 deterministic member from the NWS National Blend of Models

All 123 members were pooled and fitted with a single Gaussian distribution per city per day, then calibrated with EMOS (Ensemble Model Output Statistics: per-city learned coefficients that correct for systematic bias). The calibration delivered a ~49% improvement in forecast accuracy across 15 cities. The v2 system keeps Open-Meteo as its data source, but its signal config was reshaped by empirical backtest findings (more on that below).
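The pool-then-calibrate step can be sketched in a few lines of Python. This is a minimal illustration, not the v7 code: the function names are hypothetical, the Gaussian fit simply uses the pooled sample mean and standard deviation, and the EMOS form shown is the textbook one (mean = a + b * ensemble mean, variance = c + d * ensemble variance), with made-up coefficients standing in for the learned per-city values.

```python
import math
import statistics

def pooled_gaussian(members):
    """Fit one Gaussian to all pooled ensemble members (°F)."""
    mu = statistics.fmean(members)
    sigma = statistics.pstdev(members)
    return mu, sigma

def emos_calibrate(ens_mean, ens_var, a, b, c, d):
    """Textbook EMOS correction: mu = a + b*mean, sigma^2 = c + d*variance.

    a, b, c, d stand in for learned per-city coefficients (hypothetical here).
    """
    mu = a + b * ens_mean
    var = max(c + d * ens_var, 1e-6)  # guard against a non-positive variance
    return mu, math.sqrt(var)

if __name__ == "__main__":
    # Hypothetical members from several models, already pooled into one list.
    members = [44.1, 45.3, 46.0, 44.8, 45.5, 47.2, 45.9, 44.6]
    mu, sigma = pooled_gaussian(members)
    # Hypothetical coefficients, e.g. for a city whose raw ensemble runs 1°F cold.
    cal_mu, cal_sigma = emos_calibrate(mu, sigma ** 2, a=1.0, b=1.0, c=0.5, d=1.1)
    print(f"raw mean={mu:.2f} sd={sigma:.2f}  calibrated mean={cal_mu:.2f} sd={cal_sigma:.2f}")
```

In a real setup the coefficients would be fit per city on past forecasts and observed highs; the sketch only shows where they plug in.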

Edge detection and execution

The edge detector converts ensemble probabilities into bracket prices using the Gaussian CDF with half-integer settlement boundaries (a 45-46°F bracket covers 44.5°F to 46.5°F), then compares them against what Kalshi's market is actually pricing.

If my models say 40% and the market says 25¢, that's a potential edge of 15¢ per contract before fees.
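Here is a minimal sketch of that conversion for a single bracket, assuming the settlement high is reported as a whole °F, so a 45-46°F bracket maps to the continuous interval 44.5 to 46.5°F. The function names are hypothetical, open-ended brackets use infinite bounds, and the edge ignores Kalshi fees.

```python
import math

def normal_cdf(x, mu, sigma):
    """Gaussian CDF via math.erf; also works for x = +/-inf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bracket_probability(low, high, mu, sigma):
    """P(settled high lands in [low, high]) using half-integer boundaries.

    Pass low=-math.inf or high=math.inf for open-ended brackets.
    """
    return normal_cdf(high + 0.5, mu, sigma) - normal_cdf(low - 0.5, mu, sigma)

def yes_edge(model_prob, yes_ask_cents):
    """Expected profit per $1 YES contract bought at the ask, before fees."""
    return model_prob - yes_ask_cents / 100.0

if __name__ == "__main__":
    mu, sigma = 45.2, 1.8  # hypothetical calibrated forecast for one city/day
    p = bracket_probability(45, 46, mu, sigma)
    print(f"P(45-46°F) = {p:.3f}, edge vs a 25¢ ask = {yes_edge(p, 25):+.3f} $/contract")
```

Because adjacent brackets share their half-integer boundaries, the probabilities across a full set of brackets for one event sum to 1, which is a handy sanity check.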

Position sizing uses a 3-level adaptive Kelly criterion: per-bet uncertainty adjustment, within-event multi-outcome optimization, and cross-event portfolio caps. Execution runs an IOC/GTC hybrid: high-confidence trades execute immediately as immediate-or-cancel (IOC) orders, while lower-confidence trades rest on the book as good-'til-cancelled (GTC) limit orders.
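For a single binary contract the Kelly fraction has a closed form: buying at price c (in dollars) with model probability p risks c to win 1 - c, so f* = (p - c) / (1 - c). The sketch below covers only the per-bet level plus simple caps, not the within-event multi-outcome optimization. The fractional-Kelly multiplier (a crude stand-in for the uncertainty adjustment), the cap values, and the IOC threshold are all hypothetical placeholders.

```python
def kelly_fraction(p, price):
    """Full-Kelly bankroll fraction for buying a $1 binary contract at `price` dollars."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    return max(0.0, (p - price) / (1.0 - price))

def size_bet(p, price, bankroll, open_exposure=0.0,
             kelly_mult=0.25, event_cap=0.05, portfolio_cap=0.30):
    """Dollar stake after fractional Kelly, a per-event cap, and a portfolio cap.

    kelly_mult stands in for the per-bet uncertainty adjustment (hypothetical value).
    """
    frac = min(kelly_fraction(p, price) * kelly_mult, event_cap)
    headroom = max(0.0, bankroll * portfolio_cap - open_exposure)
    return min(bankroll * frac, headroom)

def order_type(edge, ioc_threshold=0.10):
    """Large edges take liquidity now (IOC); smaller ones rest as GTC limit orders."""
    return "IOC" if edge >= ioc_threshold else "GTC"

if __name__ == "__main__":
    stake = size_bet(p=0.40, price=0.25, bankroll=1000.0)
    print(f"stake ${stake:.2f} via {order_type(0.40 - 0.25)}")
```

With the 40% vs 25¢ example, full Kelly would stake 20% of the bankroll; the quarter-Kelly multiplier and 5% event cap bring that down to $50 on a $1,000 bankroll, and the portfolio cap shrinks it further if other positions are already open.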

The exit system watches for forecast updates and runs a 4-tier response: EMERGENCY (the model completely flipped), HARD (significant shift), SOFT (minor drift, needs 2 consecutive scans), HOLD (no change). Combined with daily loss limits and position caps, it keeps things from going sideways.
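The 4-tier exit logic can be sketched as a small state machine over how far the model's probability for the held side has fallen since entry. Only the tier names and the two-consecutive-scan rule for SOFT come from the description above; the drop thresholds, the `ExitState` class, and the `exit_tier` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ExitState:
    soft_streak: int = 0  # consecutive scans showing minor drift

def exit_tier(entry_prob, current_prob, state, emergency=0.40, hard=0.20, soft=0.05):
    """Classify one scan for a held position into EMERGENCY / HARD / SOFT / HOLD."""
    drop = entry_prob - current_prob  # how far the model has moved against us
    if drop >= emergency:             # model essentially flipped
        state.soft_streak = 0
        return "EMERGENCY"
    if drop >= hard:                  # significant shift
        state.soft_streak = 0
        return "HARD"
    if drop >= soft:                  # minor drift: require 2 scans in a row
        state.soft_streak += 1
        return "SOFT" if state.soft_streak >= 2 else "HOLD"
    state.soft_streak = 0             # no meaningful change
    return "HOLD"

if __name__ == "__main__":
    s = ExitState()
    for prob in (0.68, 0.61, 0.60, 0.45):
        print(prob, exit_tier(0.70, prob, s))
```

The SOFT debounce keeps a single noisy model run from triggering an exit, while the larger tiers act on the first scan that shows them.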

Self-improving ML

This is the part I'm most excited about. The system doesn't just trade. It learns from every trade and automatically gets better over time.

  • Data collection: every scan stores raw ensemble members, orderbook snapshots, edge evaluations, and forecasts (~1MB/day)
  • Drift detection: rolling Brier scores with CUSUM test detect when model calibration degrades
  • Auto-retrain: when drift is detected, EMOS calibration, edge classifier, and probability calibrator retrain automatically from accumulated data
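A minimal version of the drift check, assuming a one-sided CUSUM run over a rolling Brier score: each settled forecast updates the rolling Brier, deviations above a baseline plus a slack allowance accumulate, and crossing a threshold is what would trigger the retrain. The window, slack, threshold, and the `BrierCusum` class are hypothetical, and the retrain hook itself is not shown.

```python
from collections import deque

def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

class BrierCusum:
    """One-sided CUSUM on a rolling Brier score; alarms when calibration degrades."""

    def __init__(self, baseline, window=50, slack=0.01, threshold=0.25):
        self.baseline = baseline    # Brier level considered "in control"
        self.slack = slack          # tolerated excess before evidence accumulates
        self.threshold = threshold  # CUSUM level that signals drift
        self.errors = deque(maxlen=window)
        self.stat = 0.0

    def update(self, prob, outcome):
        """Add one settled forecast; return True if drift is detected."""
        self.errors.append((prob - outcome) ** 2)
        rolling = sum(self.errors) / len(self.errors)
        self.stat = max(0.0, self.stat + rolling - self.baseline - self.slack)
        return self.stat > self.threshold

    def reset(self):
        """Call after a retrain so the detector starts fresh."""
        self.errors.clear()
        self.stat = 0.0
```

The slack term is the main tuning knob: a larger value ignores small calibration wobbles, while a smaller value reacts sooner at the cost of more false alarms and unnecessary retrains.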

No manual intervention needed. The more it trades, the smarter it gets.

Where it's at

In April 2026, I ran an edge-reality audit on the v7 system and found something uncomfortable: the model's 70-89% probability bucket had 0 wins out of 969 trades. Even at a 50% win rate, zero wins would have had probability P(0 | p=0.50) = 2.58e-26. The +$330 paper P&L I had been watching turned out to be five lucky 1-3 cent contracts plus simulation bugs. The v7 system got paused, and live deployment was called off.

What came out of the audit was v2: a ground-up rebuild with 9 pluggable trading signals, hard module isolation, Brier scoring as the primary quality metric, and no assumptions about EMOS lifting the edge. The empirical backtest findings also reshaped the signal config. The YES-ask 1-5c bucket (the most common trade type) had an EV/$ (expected value per dollar staked) of -63% for buy-YES, and the v8_long_tail_maker signal got demoted. Meanwhile, the YES-ask 71-95c bucket showed an EV/$ of +7.6% for buy-NO, which makes it a candidate for a dedicated signal.
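EV/$ for a bucket is net profit divided by dollars staked. A minimal sketch, assuming one contract per trade, fees ignored, and a hypothetical `Trade` record that stores the YES ask at entry (used only for bucketing), the side bought, the price paid for that side, and whether it settled in the money:

```python
from dataclasses import dataclass

@dataclass
class Trade:
    yes_ask_cents: int  # YES ask at entry, used only for bucketing
    side: str           # "yes" or "no"
    cost_cents: int     # price paid for the side bought
    won: bool           # True if the bought side settled at $1.00

def ev_per_dollar(trades):
    """Net profit per dollar staked: (payout - cost) / cost."""
    cost = sum(t.cost_cents for t in trades)
    payout = sum(100 for t in trades if t.won)
    return (payout - cost) / cost

def bucket_ev(trades, lo_cents, hi_cents, side):
    """EV/$ for trades on `side` whose YES ask fell in [lo_cents, hi_cents]."""
    picked = [t for t in trades
              if lo_cents <= t.yes_ask_cents <= hi_cents and t.side == side]
    return ev_per_dollar(picked) if picked else None
```

Read this way, an EV/$ of -63% means each dollar put into that bucket came back as roughly 37¢ on average, while +7.6% means about $1.076.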

v2 Phase 0 is deployed and running on Railway. The alphacloud-v2-paper service records both ensemble snapshots and intraday orderbook data on every scan, which is the data that makes backtesting possible. The next milestone is accumulating 3-6 months of forward-recorded data to backtest the 6 signals that couldn't be validated without it, then deciding whether any config graduates to live trading.