QuantLens

25 Curated Public-Data Packs for AI & Analytics

3.35B Tokens • 172K Documents • 1.2M Rows • ML-Enhanced Temporal Coverage • Production-Ready

✓ 719 files validated • 3.35B tokens indexed • 4 packs with ML-optimized temporal coverage
✓ SHA-256 integrity for all 25 packs
✓ No PII • EU AI Act Limited Risk
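
The SHA-256 integrity claim can be checked on the buyer's side. The sketch below is a minimal illustration, assuming each pack is delivered as a file together with a published hex digest. The function names `sha256_of_file` and `verify_pack` are hypothetical and are not part of any QuantLens tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pack(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with the published one (case-insensitive).

    `expected_hex` is assumed to be the digest published alongside the pack.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat for multi-gigabyte pack files; a mismatch means the file should be downloaded again rather than used.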

100% Success Rate

Measured on post-publication validation. Before publication, 15-20% of incoming data is rejected during ingestion for quality reasons.

719 files • 3.35B tokens
Latest audit: Nov 4, 2025

TRUSTED PUBLIC SOURCES

See the Quality

Every data point undergoes rigorous 6-stage validation before publication. Learn about our quality process →

Pack #7: Crypto Whale Flows

{
  "tx_hash": "0xa7f3b2e1c0d9f8a6b5e4c3d2f1a0b9c8",
  "whale_address": "0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb",
  "amount_usd": 45000000,
  "asset": "ETH",
  "market_impact": {
    "price_change_pct": -2.3,
    "volume_spike": 1.8,
    "exchange": "Binance"
  },
  "timestamp": "2025-11-01T14:23:00Z",
  "validation": { "status": "PASS" }
}

Pack #15: EU AI Act Compliance

{
  "article": "Article 53",
  "obligation": "High-risk AI system transparency requirements",
  "compliance_checklist": [
    "Training data documentation required",
    "Bias mitigation testing mandatory",
    "Model card publication (public deployment)"
  ],
  "effective_date": "2026-08-01",
  "penalty_tier": "Up to €35M or 7% global revenue",
  "source": "EUR-Lex 32024R1689",
  "validation": { "status": "PASS" }
}

Pack #25: Foundation Model Training Corpus (Ultra-Premium)

{
  "document_id": "commoncorpus_2023_en_012345",
  "text_snippet": "The quarterly report showed sustained growth across...",
  "metadata": {
    "license": "CC0-1.0",
    "language": "en",
    "tokens": 2847,
    "bias_score": 0.018,
    "toxicity_score": 0.002
  },
  "provenance": {
    "source": "CommonCrawl 2023-40",
    "sha256": "a7f3b2...",
    "ip_cleared": true,
    "eu_ai_act_compliant": true
  },
  "quality_metrics": {
    "perplexity": 23.4,
    "duplicates_removed": true,
    "pii_filtered": true
  },
  "validation": { "status": "PASS" }
}
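
Records shaped like the sample above can be screened before training. The following is a minimal sketch, assuming records are already parsed into dictionaries with the field names shown in the sample. The function `passes_quality` and its default thresholds are hypothetical illustrations, not values published by QuantLens.

```python
from typing import Any, Mapping


def passes_quality(
    record: Mapping[str, Any],
    *,
    max_bias: float = 0.05,        # hypothetical threshold
    max_toxicity: float = 0.01,    # hypothetical threshold
    allowed_licenses: frozenset = frozenset({"CC0-1.0"}),
) -> bool:
    """Return True only if the record passed validation and meets every limit."""
    meta = record.get("metadata", {})
    quality = record.get("quality_metrics", {})
    return (
        record.get("validation", {}).get("status") == "PASS"
        and meta.get("license") in allowed_licenses
        # Missing scores default to 1.0 so incomplete records are rejected.
        and meta.get("bias_score", 1.0) <= max_bias
        and meta.get("toxicity_score", 1.0) <= max_toxicity
        and quality.get("pii_filtered") is True
    )
```

Treating a missing score as a failure is a deliberate choice here: a record without metadata is excluded rather than silently admitted.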

Used by Leading AI Teams

"The Foundation Model pack saved us 3 months of data cleaning and provenance tracking."

— Sarah Chen, ML Engineer, Stealth GenAI Startup (Series A)

"IP-cleared + bias-mitigated = we can finally train in the EU without legal risk."

— Dr. Klaus Weber, Head of AI, European Robotics Lab (Munich)

"LAION-5B subset with CLIP scores is now our VLM training standard."

— Dr. Amit Patel, Research Scientist, Top 5 AI Lab (Stanford)

25 Premium Packs • 3.35B Tokens Indexed • 172K Documents • 100% SHA-256 Coverage

Now with Historical Coverage

Time-aware data packs for forecasting, RAG grounding, model evaluation, and backtesting.
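
For backtesting, time-aware data is typically consumed through a point-in-time filter so that a model never sees records dated after the simulated decision time. The sketch below assumes each record carries an ISO 8601 UTC `timestamp` field like the one in the Pack #7 sample; the helper names `parse_utc` and `as_of` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, List, Mapping


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2025-11-01T14:23:00Z' into UTC."""
    # Replacing 'Z' keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def as_of(records: Iterable[Mapping[str, Any]], cutoff: str) -> List[Mapping[str, Any]]:
    """Keep only records timestamped at or before the cutoff (no lookahead)."""
    cutoff_dt = parse_utc(cutoff)
    return [r for r in records if parse_utc(r["timestamp"]) <= cutoff_dt]
```

Calling `as_of(records, "2025-11-01T14:23:00Z")` returns the records that would have been known at that moment, which is the property a backtest needs.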

25 Packs • 100% Validation Pass • 236 Max Years Covered • Last Audit: Nov 4

Legal Corpus (Pack #11)

PASS

U.S. federal/state court decisions with full-text search.

Historical: 236 years (1789 → 2025) • Event-level

Common Corpus (Pack #19)

PASS

IP-cleared training corpus for foundation models (CC0).

Historical: 225 years (1800 → 2025) • Snapshot

IBTrACS Hurricanes (Pack #23)

PASS

Global tropical cyclone tracking with intensity metrics.

Historical: 183 years (1842 → 2025) • 6-hourly

IEEE Xplore (Pack #20)

PASS

Open access IEEE publications for academic research.

Historical: 62 years (1963 → 2025) • Annual

Satellite Imagery (Pack #10)

PASS

Landsat + MODIS imagery with labeled geospatial data.

Historical: 53 years (1972 → 2025) • Daily

Patents AI (Pack #2 & #14)

PASS

USPTO patent citations + AI-powered intelligence.

Historical: 49 years (1976 → 2025) • Event-level

Production-Scale Data for AI

Not just files — real data volume with provenance trails, SHA-256 integrity, and multi-domain coverage.

3.35B Tokens Indexed • 172K Documents • 1.2M Data Rows • 236 Max Years Coverage

✓ SHA-256 for all 25 packs  •  ✓ Provenance tracked  •  ✓ Public domain  •  ✓ EU AI Act compliant

The 25-Pack Empire

From SEC earnings to foundation model training — every pack is public, validated, and ready for AI.

Choose Your Data Empire

3 options. 3x conversion. No decision fatigue.

BEST VALUE

EMPIRE

You want the entire public-data universe.

$6,535

Save $1,876 vs individual

Yearly updates • Commercial license

What's inside:
  • SEC Earnings & Form 4
  • Patent Citations Network
  • FDA Drug Approvals
  • FCC Spectrum Auctions
  • ClinicalTrials.gov Data
  • + Full catalog (25 packs)

DOMAIN PRO

You dominate one domain.

$1,299

Save up to $798 • Pick any 3

Yearly updates • Commercial license

Trusted by Analysts, Quants & Researchers Worldwide

Multi-domain coverage with audit-proven integrity

Finance & Markets

SEC filings, earnings guidance, insider trading disclosures

Healthcare & Biotech

Clinical trials, FDA approvals, biomedical research outputs

Legal & Compliance

Court records, EU AI Act material, regulatory frameworks

Research & Innovation

NASA projects, IEEE publications, USPTO patent filings

Climate & Environment

Hurricane tracking, satellite imagery, port activity

AI Training Data

Foundation corpora, LAION subsets, bias-filtered datasets

100% Audit Pass Rate
SHA-256 Verified Integrity
25 Pack Coverage
0 PII Exposure