For Quant Researchers

Polymarket data for quantitative research

Resolved Markets is built for the kind of research that breaks when a vendor deletes data after 31 days. We keep every orderbook snapshot ever captured in ClickHouse, stamp each one with monotonic sequence numbers and millisecond-precision event and capture timestamps, and pair every crypto snapshot with the Binance spot price at capture time. The fields you need for microstructure work are first-class.


  • Historical retention: full archive
  • Crypto capture rate: ~20 Hz
  • Backend: ClickHouse
  • Gap detection: per-token sequence numbers

What makes the dataset useful for research

  • Full archive, no retention cliff. Multi-month strategy validation, longitudinal studies, and event studies on dated events (elections, FOMC, sports playoffs, hurricanes) all need data older than 31 days. Resolved Markets keeps it.
  • Microstructure-grade fields. Each snapshot has full bids and asks arrays as Array(Tuple(price, size)), plus best bid/ask, mid, spread, top-5 cumulative depth, sequence number, event timestamp, capture timestamp, and the paired crypto spot price with its staleness in milliseconds.
  • Gap detection. Every snapshot carries a sequence_number that is monotonic per token, so a jump of more than one between consecutive snapshots for the same token marks events dropped from the upstream Polymarket WebSocket. That matters when you are reconstructing a true book and need to know where the depth can be trusted.
  • Cross-category coverage. Crypto, sports, economics, weather, and Hyperliquid perpetual futures through one key. Useful for cross-asset and cross-event studies that competitor APIs can't answer.
  • ClickHouse backend. Aggregations across billions of rows return in seconds. Enterprise customers get direct query access; everyone else gets the same data shape through the REST API and the rm-api download CLI, which writes Parquet files locally.
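The timestamp and staleness fields combine naturally into a per-snapshot quality filter. The sketch below assumes snapshots are already loaded as a list of dicts shaped like the schema record below. It computes capture latency as capture_timestamp minus event_timestamp, then drops rows whose paired Binance price is older than a cutoff. The helper name quality_filter and the 250 ms cutoff are hypothetical illustrative choices, not documented defaults.

```python
import pandas as pd


def quality_filter(snaps, max_spot_age_ms=250):
    """Hypothetical helper: add capture latency and drop stale spot references.

    `snaps` is assumed to be a list of dicts carrying the schema fields
    event_timestamp, capture_timestamp and spot_crypto_age_ms.
    The 250 ms default is an illustrative threshold, not a documented one.
    """
    df = pd.DataFrame(snaps)
    df["event_timestamp"] = pd.to_datetime(df["event_timestamp"], utc=True)
    df["capture_timestamp"] = pd.to_datetime(df["capture_timestamp"], utc=True)
    # Time between Polymarket emitting the event and the snapshot being captured.
    df["capture_latency_ms"] = (
        df["capture_timestamp"] - df["event_timestamp"]
    ).dt.total_seconds() * 1000.0
    # Keep only rows whose paired Binance price is fresh enough to use.
    return df[df["spot_crypto_age_ms"] <= max_spot_age_ms].reset_index(drop=True)


snaps = [
    {"event_timestamp": "2026-05-04T14:23:00.123Z",
     "capture_timestamp": "2026-05-04T14:23:00.131Z",
     "spot_crypto_age_ms": 84},
    {"event_timestamp": "2026-05-04T14:23:00.173Z",
     "capture_timestamp": "2026-05-04T14:23:00.190Z",
     "spot_crypto_age_ms": 900},
]
clean = quality_filter(snaps)
print(clean[["capture_latency_ms", "spot_crypto_age_ms"]])
```

The second sample row is dropped because its spot reference is 900 ms old. Tightening or loosening the cutoff trades sample size against how closely the spot price lines up with the book.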

Snapshot schema (what every record contains)

{
  "condition_id":     "0x...",            // Polymarket market id
  "token_id":         "1234...",          // UP or DOWN token
  "side":             "UP",               // UP | DOWN
  "event_timestamp":  "2026-05-04T14:23:00.123Z",   // Polymarket emitted
  "capture_timestamp":"2026-05-04T14:23:00.131Z",   // we processed
  "sequence_number":  82731,
  "best_bid":         0.6231,
  "best_ask":         0.6244,
  "mid":              0.62375,
  "spread":           0.0013,
  "bids":             [[0.6231, 412.5], [0.6225, 800.0], ...],
  "asks":             [[0.6244, 350.0], [0.6250, 612.0], ...],
  "depth5_bid":       4123.7,             // cum size, top 5 levels
  "depth5_ask":       3998.4,
  "spot_crypto_usd":  62418.50,           // Binance ref at capture
  "spot_crypto_age_ms": 84
}
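The summary fields duplicate information already in the bids and asks arrays, which makes them a cheap integrity check on any record you load. The sketch below assumes mid is the arithmetic midpoint and spread is best ask minus best bid (the example values above are consistent with both). It also assumes depth5 sums sizes over the first five levels on each side and that both arrays are sorted best price first. The helper derived_fields is hypothetical.

```python
TOP_LEVELS = 5  # depth5_* fields cover the top five levels per side


def derived_fields(bids, asks):
    """Hypothetical helper: recompute summary fields from [price, size] arrays.

    Assumes bids are sorted best-first (descending price) and asks
    best-first (ascending price), as in the schema example.
    """
    best_bid = bids[0][0]
    best_ask = asks[0][0]
    return {
        "best_bid": best_bid,
        "best_ask": best_ask,
        "mid": (best_bid + best_ask) / 2,
        "spread": best_ask - best_bid,
        # Cumulative size over the first TOP_LEVELS price levels on each side.
        "depth5_bid": sum(size for _, size in bids[:TOP_LEVELS]),
        "depth5_ask": sum(size for _, size in asks[:TOP_LEVELS]),
    }


# The two visible levels per side from the schema example above.
f = derived_fields(
    bids=[[0.6231, 412.5], [0.6225, 800.0]],
    asks=[[0.6244, 350.0], [0.6250, 612.0]],
)
print(f)
```

On a full record, compare these recomputed values with the stored ones using a small float tolerance rather than exact equality.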

Patterns for common quant tasks

Backtest a strategy in vectorbt or Backtrader

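import os

import pandas as pd
from rm_api import Client

c = Client(api_key=os.environ["RM_API_KEY"])
condition_id = "0x..."  # Polymarket condition_id of the market to test
# limit caps the rows returned; a 3-month window can hold far more than 500 snapshots.
snaps = c.snapshots(condition_id, frm="2026-01-01", to="2026-04-01", limit=500)
df = pd.DataFrame(snaps)
df["capture_timestamp"] = pd.to_datetime(df["capture_timestamp"], utc=True)
# A condition_id covers both UP and DOWN tokens; keep one side so mid is one series.
df = df[df["side"] == "UP"].set_index("capture_timestamp").sort_index()
# Now feed df["mid"] into vectorbt / Backtrader as a price series.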
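Snapshots arrive at irregular intervals, while Backtrader's bar-based data feeds and many vectorbt workflows expect regular OHLC bars. The sketch below assumes a DataFrame indexed by capture time as a DatetimeIndex (convert with pd.to_datetime if the index holds strings) with a float mid column. The one-minute rule, the forward-filling of empty bars, and the helper name mid_to_bars are illustrative choices, not part of rm_api.

```python
import pandas as pd


def mid_to_bars(df, rule="1min"):
    """Hypothetical helper: turn irregular snapshot mids into regular OHLC bars.

    Assumes `df` has a DatetimeIndex (capture time) and a float "mid" column.
    Bars with no snapshots are filled from the previous close.
    """
    bars = df["mid"].resample(rule).ohlc()
    close = bars["close"].ffill()
    # An empty bar takes the previous close for open, high and low as well.
    for col in ("open", "high", "low"):
        bars[col] = bars[col].fillna(close)
    bars["close"] = close
    # Drop any leading bars that had no earlier close to fill from.
    return bars.dropna()


idx = pd.to_datetime([
    "2026-05-04T14:23:00.131Z", "2026-05-04T14:23:20.000Z",
    "2026-05-04T14:23:50.000Z", "2026-05-04T14:25:10.000Z",
], utc=True)
df = pd.DataFrame({"mid": [0.62375, 0.6300, 0.6210, 0.6400]}, index=idx)
print(mid_to_bars(df))
```

Forward-filling keeps the bar series gap-free. If quiet periods matter to the strategy, keep the NaN bars instead and handle them explicitly.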

Bulk parquet export via CLI

rm-api download \
  --category crypto --subcategory BTC \
  --from 2026-01-01 --to 2026-05-01 \
  --format parquet --out ./btc_snapshots/

Detect gaps before computing realized spread

-- Per-token sequence gaps over the last day. With no drops, the number of
-- distinct sequence numbers equals max - min + 1.
SELECT condition_id, token_id,
       max(sequence_number) - min(sequence_number) + 1 AS span,
       uniqExact(sequence_number) AS observed,
       span - observed AS missing
FROM polymarket.snapshots_hf
WHERE capture_timestamp >= now() - INTERVAL 1 DAY
GROUP BY condition_id, token_id
HAVING missing > 0
ORDER BY missing DESC;
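The SQL above counts missing events per token. To decide which stretches of depth to trust, you also need to know where the gaps fall. The sketch below works on snapshots already pulled locally, for example from a parquet export. It assumes that, with no drops, consecutive snapshots for a token differ in sequence_number by exactly one. find_gaps is a hypothetical helper.

```python
import pandas as pd


def find_gaps(df):
    """Hypothetical helper: list dropped sequence-number ranges per token.

    Assumes `df` has token_id and integer sequence_number columns and that
    sequence numbers increase by exactly 1 per event when nothing is dropped.
    """
    out = []
    for token, g in df.groupby("token_id"):
        seq = g["sequence_number"].drop_duplicates().sort_values().to_numpy()
        for prev, nxt in zip(seq[:-1], seq[1:]):
            if nxt - prev > 1:
                # Every sequence number strictly between prev and nxt never arrived.
                out.append({"token_id": token,
                            "first_missing": int(prev) + 1,
                            "last_missing": int(nxt) - 1,
                            "n_missing": int(nxt - prev - 1)})
    return pd.DataFrame(out, columns=["token_id", "first_missing",
                                      "last_missing", "n_missing"])


df = pd.DataFrame({
    "token_id": ["A", "A", "A", "A", "B", "B"],
    "sequence_number": [82731, 82732, 82735, 82736, 10, 11],
})
print(find_gaps(df))
```

Joining these ranges back to capture timestamps lets you mask depth-based metrics, such as realized spread, over windows where the book may be incomplete.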

Academic and research access

University researchers with a .edu email can request extended Enterprise access in exchange for a citation in published work. Email [email protected] with a short description of the project. The dataset has already been used for studies on prediction-market liquidity, microstructure of binary outcomes, and cross-asset event studies.

Frequently asked questions