Legal Corpus (Pack #11)
U.S. federal/state court decisions with full-text search.
3.35B Tokens • 172K Documents • 1.2M Rows • ML-Enhanced Temporal Coverage • Production-Ready
Get sample data from Pack #1 (SEC Earnings) — 50 transcripts, 2,500 AI-labeled rows
Post-publication validation. Before publication, 15-20% of incoming data is rejected at ingestion for quality reasons.
Every data point undergoes rigorous 6-stage validation before publication.
{
  "tx_hash": "0xa7f3b2e1c0d9f8a6b5e4c3d2f1a0b9c8",
  "whale_address": "0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb",
  "amount_usd": 45000000,
  "asset": "ETH",
  "market_impact": {
    "price_change_pct": -2.3,
    "volume_spike": 1.8,
    "exchange": "Binance"
  },
  "timestamp": "2025-11-01T14:23:00Z",
  "validation": { "status": "PASS" }
}
{
  "article": "Article 53",
  "obligation": "High-risk AI system transparency requirements",
  "compliance_checklist": [
    "Training data documentation required",
    "Bias mitigation testing mandatory",
    "Model card publication (public deployment)"
  ],
  "effective_date": "2026-08-01",
  "penalty_tier": "Up to €35M or 7% global revenue",
  "source": "EUR-Lex 32024R1689",
  "validation": { "status": "PASS" }
}
{
  "document_id": "commoncorpus_2023_en_012345",
  "text_snippet": "The quarterly report showed sustained growth across...",
  "metadata": {
    "license": "CC0-1.0",
    "language": "en",
    "tokens": 2847,
    "bias_score": 0.018,
    "toxicity_score": 0.002
  },
  "provenance": {
    "source": "CommonCrawl 2023-40",
    "sha256": "a7f3b2...",
    "ip_cleared": true,
    "eu_ai_act_compliant": true
  },
  "quality_metrics": {
    "perplexity": 23.4,
    "duplicates_removed": true,
    "pii_filtered": true
  },
  "validation": { "status": "PASS" }
}
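Records shaped like the foundation-corpus sample can be screened before they enter a training set. The following is a minimal Python sketch, assuming records arrive as JSON objects with the fields shown in the sample. The function name `passes_screen` and the threshold values are hypothetical illustrations, not part of any documented pack tooling.

```python
import json

# Subset of the sample record's fields; values copied from the sample.
RECORD_JSON = """
{
  "document_id": "commoncorpus_2023_en_012345",
  "metadata": {"license": "CC0-1.0", "language": "en", "tokens": 2847,
               "bias_score": 0.018, "toxicity_score": 0.002},
  "provenance": {"source": "CommonCrawl 2023-40", "ip_cleared": true},
  "quality_metrics": {"duplicates_removed": true, "pii_filtered": true},
  "validation": {"status": "PASS"}
}
"""


def passes_screen(record, max_bias=0.05, max_toxicity=0.01):
    """Return True if the record clears the (hypothetical) thresholds.

    Missing scores default to 1.0 so that incomplete records are rejected.
    """
    meta = record.get("metadata", {})
    prov = record.get("provenance", {})
    quality = record.get("quality_metrics", {})
    return (
        record.get("validation", {}).get("status") == "PASS"
        and meta.get("license") == "CC0-1.0"
        and meta.get("bias_score", 1.0) <= max_bias
        and meta.get("toxicity_score", 1.0) <= max_toxicity
        and prov.get("ip_cleared") is True
        and quality.get("pii_filtered") is True
    )


record = json.loads(RECORD_JSON)
print(passes_screen(record))
```

Defaulting absent scores to a failing value is a deliberate choice here: a record with missing metadata is treated as unverified instead of silently accepted.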
"The Foundation Model pack saved us 3 months of data cleaning and provenance tracking."
- Sarah Chen, ML Engineer, Stealth GenAI Startup (Series A)
"IP-cleared + bias-mitigated = we can finally train in the EU without legal risk."
- Dr. Klaus Weber, Head of AI, European Robotics Lab (Munich)
"LAION-5B subset with CLIP scores is now our VLM training standard."
- Dr. Amit Patel, Research Scientist, Top 5 AI Lab (Stanford)
Time-aware data packs for forecasting, RAG grounding, model evaluation, and backtesting.
U.S. federal/state court decisions with full-text search.
IP-cleared training corpus for foundation models (CC0).
Global tropical cyclone tracking with intensity metrics.
Open access IEEE publications for academic research.
Landsat + MODIS imagery with labeled geospatial data.
USPTO patent citations + AI-powered intelligence.
Not just files — real data volume with provenance trails, SHA-256 integrity, and multi-domain coverage.
✓ SHA-256 for all 25 packs • ✓ Provenance tracked • ✓ Public domain • ✓ EU AI Act compliant
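A SHA-256 integrity check can be reproduced locally once a pack file and its published digest are in hand. The sketch below uses only Python's standard `hashlib` and `hmac` modules. The helper names `sha256_of_file` and `verify_file` are hypothetical, and how the expected digest is distributed (manifest file, API field, web page) is an assumption left open here.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large packs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Compare the file's digest with the expected hex string."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())


if __name__ == "__main__":
    # Demo on a throwaway file; a real check would point at a downloaded pack.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example pack contents")
        demo_path = tmp.name
    try:
        expected = hashlib.sha256(b"example pack contents").hexdigest()
        print(verify_file(demo_path, expected))
    finally:
        os.remove(demo_path)
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters for multi-gigabyte corpora.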
From SEC earnings to foundation model training — every pack is public, validated, and ready for AI.
3 options. 3x conversion. No decision fatigue.
You want the entire public-data universe.
Save $1,876 vs individual
You train foundation models.
Save $200 • Top 5 AI packs
You dominate one domain.
Save up to $798 • Pick any 3
Multi-domain coverage with audit-proven integrity
SEC filings, earnings guidance, insider trading disclosures
Clinical trials, FDA approvals, biomedical research outputs
Court records, EU AI Act material, regulatory frameworks
NASA projects, IEEE publications, USPTO patent filings
Hurricane tracking, satellite imagery, port activity
Foundation corpora, LAION subsets, bias‑filtered datasets