Polymarket data for quantitative research
Resolved Markets is built for the kind of research that breaks when a vendor deletes data after 31 days. We keep every orderbook snapshot ever captured in ClickHouse, stamp each one with monotonic sequence numbers and millisecond-precision event and capture timestamps, and pair every crypto snapshot with the Binance spot price at capture time. The fields you need for microstructure work are first-class.
- Historical retention: full archive
- Crypto capture rate: ~20 Hz
- Backend: ClickHouse
- Gap detection: per-token sequence numbers
What makes the dataset useful for research
- Full archive, no retention cliff. Multi-month strategy validation, longitudinal studies, and event studies on dated events (elections, FOMC, sports playoffs, hurricanes) all need data older than 31 days. Resolved Markets keeps it.
- Microstructure-grade fields. Each snapshot has full bids and asks arrays as Array(Tuple(price, size)), plus best bid/ask, mid, spread, top-5 cumulative depth, sequence number, event timestamp, capture timestamp, and the paired crypto spot price with its staleness in milliseconds.
- Gap detection. The monotonic sequence_number on every snapshot lets you detect dropped events from the upstream Polymarket WebSocket, which matters when you are reconstructing a true book and want to know where you can trust the depth.
- Cross-category coverage. Crypto, sports, economics, weather, and Hyperliquid perpetual futures through one key. Useful for cross-asset and cross-event studies that competitor APIs can't answer.
- ClickHouse backend. Aggregations across billions of rows return in seconds. Enterprise customers get direct query access; everyone else gets the same shape via REST and the rm-api download CLI, which writes parquet locally.
Snapshot schema (what every record contains)
{
  "condition_id": "0x...",                          // Polymarket market id
  "token_id": "1234...",                            // UP or DOWN token
  "side": "UP",                                     // UP | DOWN
  "event_timestamp": "2026-05-04T14:23:00.123Z",    // Polymarket emitted
  "capture_timestamp": "2026-05-04T14:23:00.131Z",  // we processed
  "sequence_number": 82731,
  "best_bid": 0.6231,
  "best_ask": 0.6244,
  "mid": 0.62375,
  "spread": 0.0013,
  "bids": [[0.6231, 412.5], [0.6225, 800.0], ...],
  "asks": [[0.6244, 350.0], [0.6250, 612.0], ...],
  "depth5_bid": 4123.7,                             // cum size, top 5 levels
  "depth5_ask": 3998.4,
  "spot_crypto_usd": 62418.50,                      // Binance ref at capture
  "spot_crypto_age_ms": 84
}
Patterns for common quant tasks
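A useful first check when working with the schema above is to recompute the derived fields (best bid/ask, mid, spread, top-5 depth) from the raw bids and asks arrays and compare them with the stored values. The following is a minimal sketch, not the service's own implementation: the helper name derive_book_fields is hypothetical, and it assumes each level is a [price, size] pair and that both sides of the book are non-empty.

```python
from typing import List, Sequence

Level = Sequence[float]  # [price, size], as in the bids/asks arrays


def derive_book_fields(bids: List[Level], asks: List[Level], depth_levels: int = 5) -> dict:
    """Recompute top-of-book and cumulative-depth fields from raw levels.

    Hypothetical helper; assumes both sides contain at least one level.
    """
    # Sort defensively: best bid is the highest price, best ask the lowest.
    bids_sorted = sorted(bids, key=lambda lvl: lvl[0], reverse=True)
    asks_sorted = sorted(asks, key=lambda lvl: lvl[0])
    best_bid = bids_sorted[0][0]
    best_ask = asks_sorted[0][0]
    return {
        "best_bid": best_bid,
        "best_ask": best_ask,
        "mid": (best_bid + best_ask) / 2,
        "spread": best_ask - best_bid,
        # Cumulative size over the top `depth_levels` levels on each side.
        "depth_bid": sum(size for _, size in bids_sorted[:depth_levels]),
        "depth_ask": sum(size for _, size in asks_sorted[:depth_levels]),
    }


# Only the two levels per side that are visible in the example record;
# the remaining levels are elided there, so the depth totals will differ.
fields = derive_book_fields(
    bids=[[0.6231, 412.5], [0.6225, 800.0]],
    asks=[[0.6244, 350.0], [0.6250, 612.0]],
)
print(fields)
```

Comparing the recomputed mid and spread against the stored ones is a cheap sanity check before trusting a snapshot in a backtest; compare floats with a tolerance rather than exact equality.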
Backfill a strategy in vectorbt or Backtrader
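import os

import pandas as pd
from rm_api import Client

condition_id = "0x..."  # Polymarket market id of the market to backfill
c = Client(api_key=os.environ["RM_API_KEY"])
snaps = c.snapshots(condition_id, frm="2026-01-01", to="2026-04-01", limit=500)
df = pd.DataFrame(snaps)
# Parse the ISO strings so the index is a proper DatetimeIndex, sorted by time.
df["capture_timestamp"] = pd.to_datetime(df["capture_timestamp"])
df = df.set_index("capture_timestamp").sort_index()
# Now feed df["mid"] into vectorbt / Backtrader as a price series.
Bulk parquet export via CLI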
rm-api download \
--category crypto --subcategory BTC \
--from 2026-01-01 --to 2026-05-01 \
--format parquet --out ./btc_snapshots/
Detect gaps before computing realized spread
SELECT condition_id, token_id,
max(sequence_number) - min(sequence_number) AS span,
count() AS rows,
(max(sequence_number) - min(sequence_number) + 1) - count() AS missing
FROM polymarket.snapshots_hf
WHERE capture_timestamp >= now() - INTERVAL 1 DAY
GROUP BY condition_id, token_id
HAVING missing > 0
ORDER BY missing DESC;
Academic and research access
University researchers with a .edu email can request extended Enterprise access in exchange for a citation in published work. Email info@elcara.xyz with a short description of the project. The dataset has already been used for studies on prediction-market liquidity, microstructure of binary outcomes, and cross-asset event studies.
Frequently asked questions
Why is Resolved Markets useful for quantitative research?
Three reasons. (1) Full historical archive: there is no 31-day retention limit, so multi-month and multi-quarter studies are possible. (2) Microstructure-grade fields: every snapshot carries best bid/ask, full depth, mid, spread, sequence numbers, event and capture timestamps, and a paired crypto spot price for cross-asset joins. (3) ClickHouse backend: analytical queries over billions of rows return in seconds, not minutes.
How do I pull historical data efficiently?
Use /v1/markets/:id/snapshots with limit=500 and ISO from/to bounds, or paginate by sequence number. For bulk pulls across many markets, the CLI rm-api download writes parquet locally. Enterprise customers get direct ClickHouse query access.
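Paginating by sequence number can be sketched as a cursor loop. This is only an illustration under stated assumptions: fetch_page is a hypothetical callable standing in for whatever request you make against the snapshots endpoint, and the idea that a page can be requested "after" a given sequence number is an assumption about the API, not a documented parameter. Sequence numbers are per token, so the loop is meant for one token at a time.

```python
from typing import Callable, Dict, Iterator, List

Snapshot = Dict[str, object]
# Hypothetical page fetcher: returns up to `limit` snapshots whose
# sequence_number is greater than `after_seq`, in ascending order.
FetchPage = Callable[[int, int], List[Snapshot]]


def iter_snapshots(fetch_page: FetchPage, start_after: int = -1, limit: int = 500) -> Iterator[Snapshot]:
    """Yield snapshots page by page, using the last sequence number as the cursor."""
    cursor = start_after
    while True:
        page = fetch_page(cursor, limit)
        if not page:
            return
        yield from page
        # Advance the cursor past everything seen in this page.
        cursor = max(int(s["sequence_number"]) for s in page)
        if len(page) < limit:
            return  # a short page means there is nothing left


# Stand-in for the real API call, so the sketch runs without network access.
_DATA = [{"sequence_number": n} for n in range(1200)]


def fake_fetch(after_seq: int, limit: int) -> List[Snapshot]:
    return [s for s in _DATA if s["sequence_number"] > after_seq][:limit]


rows = list(iter_snapshots(fake_fetch))
print(len(rows), rows[0]["sequence_number"], rows[-1]["sequence_number"])
```

Keeping the HTTP call behind a callable makes the loop easy to test and lets you swap in retries or rate limiting without touching the pagination logic.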
Is there gap detection in the data stream?
Yes. Every snapshot carries a sequence number that is monotonic per token. Gaps in the sequence indicate dropped events from the upstream Polymarket WebSocket and are flagged by the /api/debug endpoint and the rm-api gaps command.
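If you already have snapshots in a DataFrame, the same check can be run locally. The sketch below assumes columns named token_id and sequence_number as in the snapshot schema, and that consecutive events for a token differ by exactly 1 (the same assumption the SQL gap query makes); find_sequence_gaps is a hypothetical helper, not part of rm-api.

```python
import pandas as pd


def find_sequence_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per gap: token, sequence numbers on either side, missing count."""
    ordered = df.sort_values(["token_id", "sequence_number"])
    # Previous sequence number within the same token (NaN for each token's first row).
    prev = ordered.groupby("token_id")["sequence_number"].shift(1)
    step = ordered["sequence_number"] - prev
    is_gap = step > 1  # NaN compares False, so first rows are never flagged
    gaps = ordered.loc[is_gap, ["token_id", "sequence_number"]].copy()
    gaps["prev_sequence_number"] = prev[is_gap].astype("int64")
    gaps["missing"] = (step[is_gap] - 1).astype("int64")
    return gaps.rename(columns={"sequence_number": "next_sequence_number"}).reset_index(drop=True)


# Toy data: token A is missing sequence numbers 3 and 4, token B is complete.
sample = pd.DataFrame(
    {
        "token_id": ["A"] * 5 + ["B"] * 3,
        "sequence_number": [1, 2, 5, 6, 7, 10, 11, 12],
    }
)
gaps = find_sequence_gaps(sample)
print(gaps)
```

Unlike a per-token count of missing rows, this locates each gap, so you can exclude only the affected intervals when reconstructing the book.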
Can I cross-reference Polymarket prices with Binance spot?
Each snapshot includes the spot crypto price (from Binance) at capture time, with a price-staleness field measuring milliseconds since the last spot update. This lets you compute implied vs realized spreads or basis without joining external feeds.
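One way to use the staleness field is to drop snapshots whose spot reference is too old before pairing moves in the market mid with moves in spot. The sketch below assumes a DataFrame for a single token with the schema's capture_timestamp, mid, spot_crypto_usd, and spot_crypto_age_ms columns; the paired_moves helper and the 250 ms threshold are illustrative choices, not recommendations from Resolved Markets.

```python
import numpy as np
import pandas as pd


def paired_moves(df: pd.DataFrame, max_spot_age_ms: int = 250) -> pd.DataFrame:
    """Keep snapshots with a fresh spot reference and pair mid changes with spot log returns."""
    fresh = df[df["spot_crypto_age_ms"] <= max_spot_age_ms].sort_values("capture_timestamp")
    out = fresh[["capture_timestamp", "mid", "spot_crypto_usd"]].copy()
    out["mid_change"] = out["mid"].diff()  # change in the market's mid price
    out["spot_log_return"] = np.log(out["spot_crypto_usd"]).diff()
    # The first fresh row has no predecessor, so its diffs are NaN.
    return out.dropna().reset_index(drop=True)


# Toy data: the third snapshot has a stale spot price (900 ms old) and is dropped.
sample = pd.DataFrame(
    {
        "capture_timestamp": pd.to_datetime(
            [
                "2026-05-04T14:23:00.131Z",
                "2026-05-04T14:23:00.181Z",
                "2026-05-04T14:23:00.231Z",
                "2026-05-04T14:23:00.281Z",
            ]
        ),
        "mid": [0.60, 0.61, 0.63, 0.62],
        "spot_crypto_usd": [62000.0, 62100.0, 62300.0, 62200.0],
        "spot_crypto_age_ms": [50, 80, 900, 120],
    }
)
moves = paired_moves(sample)
print(moves)
```

The resulting columns can feed a regression or lead-lag analysis; filtering on staleness first avoids attributing a market move to a spot price that had not yet updated.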
Do you have an academic / research access program?
Yes — university researchers with .edu addresses can request extended Enterprise access in exchange for a citation in published work. Contact info@elcara.xyz with a brief project description.