Jmail Data API

Pre-computed datasets from the Jeffrey Epstein email archive — House Oversight Committee, Department of Justice, and Yahoo account releases.

Text extraction powered by Reducto. Video understanding by Kino AI. Built by the Jmail team.

v1. Released February 25, 2026.

Available Datasets

Emails 1.78M records

emails.parquet 334 MB — All emails with full body text
emails-slim.parquet 41 MB — Emails without body text, great for network analysis

Documents 1.41M records

documents.parquet 25 MB — Document metadata only, no extracted text
documents-full/VOL00009.parquet 531K docs — Full extracted text
documents-full/VOL00010.parquet 503K docs — Full extracted text
documents-full/DataSet11.parquet 332K docs — Full extracted text
documents-full/VOL00008.parquet 32K docs — Full extracted text
documents-full/other.parquet 15K docs — Remaining documents with extracted text
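
Because the full-text documents are split across several files, a query over all of them has to name each shard. The following is a minimal sketch rather than an official recipe. It assumes the shards sit under the same https://data.jmail.world/v1 base used in the Quick Start example and that their schemas are compatible, neither of which is confirmed on this page. The helper name documents_full_query is hypothetical.

```python
BASE_URL = "https://data.jmail.world/v1"  # assumption: same base as the Quick Start URL

# Shard file names exactly as listed above.
DOCUMENT_SHARDS = [
    "documents-full/VOL00009.parquet",
    "documents-full/VOL00010.parquet",
    "documents-full/DataSet11.parquet",
    "documents-full/VOL00008.parquet",
    "documents-full/other.parquet",
]


def documents_full_query(shards=DOCUMENT_SHARDS):
    """Build a DuckDB query that counts rows per shard (hypothetical helper)."""
    urls = ", ".join(f"'{BASE_URL}/{name}'" for name in shards)
    # read_parquet accepts a list of files. filename = true adds a column with
    # each row's source path; union_by_name = true matches columns by name in
    # case the shards differ in column order.
    return (
        "SELECT filename, COUNT(*) AS n "
        f"FROM read_parquet([{urls}], filename = true, union_by_name = true) "
        "GROUP BY filename ORDER BY n DESC"
    )
```

The returned string can be passed to duckdb.sql(...) in the same way as the Quick Start query.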

Photos & People 18K photos

photos.parquet ~1 MB — Photo metadata and AI descriptions
people.parquet <100 KB — 473 identified people from facial recognition
photo_faces.parquet <100 KB — 975 face bounding boxes

iMessages 4.5K messages

imessage_conversations.parquet <10 KB — 15 conversation threads with contacts and message counts
imessage_messages.parquet ~100 KB — Text messages with timestamps and sender info

Community & Metadata 414K stars

star_counts.parquet ~2 MB — Crowd-sourced importance signals
release_batches.parquet <10 KB — Release batch metadata
manifest.json <10 KB — Dataset discovery (checksums, sizes, record counts)

All datasets are also available in NDJSON format: replace .parquet with .ndjson.gz.
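
The suffix swap can be sketched in Python as follows. The helper names ndjson_url and iter_ndjson_gz are hypothetical, and the UTF-8 encoding of the NDJSON files is an assumption, not something this page states.

```python
import gzip
import json


def ndjson_url(parquet_url):
    """Swap the .parquet suffix for .ndjson.gz, as described above."""
    suffix = ".parquet"
    if not parquet_url.endswith(suffix):
        raise ValueError(f"not a .parquet URL: {parquet_url}")
    return parquet_url[: -len(suffix)] + ".ndjson.gz"


def iter_ndjson_gz(binary_stream):
    """Yield one parsed JSON object per non-empty line of a gzipped NDJSON stream.

    binary_stream is any binary file-like object, for example an open file
    or an HTTP response body. UTF-8 is assumed.
    """
    with gzip.open(binary_stream, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)
```

Streaming line by line keeps memory use flat, which matters for the larger files such as the full email dataset.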

Content Negotiation

Extensionless paths like /v1/emails redirect to .parquet by default. Send Accept: application/x-ndjson to get the NDJSON variant instead.
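
As an illustration, a request for the NDJSON variant could be built with Python's standard library like this. The host is taken from the Quick Start URL, and the helper name dataset_request is hypothetical.

```python
import urllib.request

BASE_URL = "https://data.jmail.world/v1"  # host as used in the Quick Start example


def dataset_request(name, ndjson=False):
    """Build a request for an extensionless dataset path such as 'emails'."""
    request = urllib.request.Request(f"{BASE_URL}/{name}")
    if ndjson:
        # Ask for NDJSON; without this header the server defaults to Parquet.
        request.add_header("Accept", "application/x-ndjson")
    return request
```

Passing the result to urllib.request.urlopen sends the request; urlopen follows HTTP redirects by default, so the response body should be the redirected file.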

Quick Start with DuckDB

SELECT sender, COUNT(*) AS n
FROM read_parquet('https://data.jmail.world/v1/emails.parquet')
GROUP BY sender
ORDER BY n DESC
LIMIT 20;

Python

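import duckdb

# In-memory connection. Recent DuckDB releases typically load the httpfs
# extension automatically when a query references an https:// URL.
conn = duckdb.connect()

# Fetch a 100-row sample as a pandas DataFrame (.df() requires pandas).
df = conn.sql("""
  SELECT * FROM read_parquet('https://data.jmail.world/v1/emails.parquet')
  LIMIT 100
""").df()
print(df)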
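
This page shows only the sender column, so it can help to inspect a file's schema before writing queries. A minimal sketch using DuckDB's DESCRIBE statement follows. The helper names are hypothetical, and the URL is interpolated directly into the SQL text, so it should only be used with trusted URLs.

```python
def describe_query(url):
    """Build a DuckDB statement that lists a Parquet file's columns and types."""
    return f"DESCRIBE SELECT * FROM read_parquet('{url}')"


def describe_dataset(url):
    """Return (column_name, column_type) pairs for a remote Parquet file."""
    import duckdb  # imported lazily so this module loads without DuckDB installed

    rows = duckdb.sql(describe_query(url)).fetchall()
    # The first two fields of each DESCRIBE row are the column name and type.
    return [(row[0], row[1]) for row in rows]
```

Calling describe_dataset('https://data.jmail.world/v1/emails-slim.parquet') needs network access. Once the column names are known, selecting only the needed columns instead of SELECT * lets DuckDB skip the rest of the file.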

Version Aliases

/latest/* redirects to /v1/*. When new schema versions are published, /latest will point to the newest version.
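
The choice between a pinned version and the alias can be sketched as below. This assumes /latest is served from the same host as the Quick Start URL; dataset_url is a hypothetical helper.

```python
HOST = "https://data.jmail.world"  # assumption: /latest lives on the same host as /v1


def dataset_url(filename, version="v1"):
    """Return the URL for a dataset file under a given version prefix."""
    return f"{HOST}/{version}/{filename}"


PINNED = dataset_url("emails.parquet")              # stays on the v1 schema
TRACKING = dataset_url("emails.parquet", "latest")  # follows the newest version
```

Pinning to v1 is the safer choice for scripts that depend on specific column names, since /latest may start pointing at a different schema once a new version is published.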

Export Status

View the live export status page to monitor data pipeline progress.