QuantLens

Clinical Trials & Patent Data for Quant Research — 3.35B Tokens, SHA-256 Verified

FDA catalyst prediction, biotech event calendars, federal contract signals, and patent intelligence — production-ready Parquet datasets with documented schemas and QA logs your team can deploy on day one.

Six flagship dataset families with documented QA and storage-root verification
Storage-root-aware delivery with SHA-256 manifests and clear provenance
Public-domain sources with transparent schemas and published QA notes
Zero personal data: no PII or sensitive customer records, only public datasets
SHA-256: cryptographic manifests for every file
3.35B+: tokens across 6 dataset families
1999–2025: historical coverage with schemas
$499–$11,999: per-pack pricing, free teasers

Flagship dataset families

We group related datasets into families: BioTrials (clinical trials), ConstructAlpha (construction cycle), FedEventBench (federal events), FedFlow Intelligence Deck (federal contracts mapped to equities), NeuralFlow (intraday ML signals), and Patent/Oncology bundles (patent intelligence). Each family ships one or more datasets that share schemas, QA posture, and documentation.

Real Estate & Construction

ConstructAlpha Intelligence Deck

A 20-year, ML-ready benchmark of the $1.8T US construction cycle – engineered for quants, funds, and infra analysts.

2005–2024 macro coverage · ~60 region-month features · ~30 award features · event-study ready

Permits & Supply

Permit units, YOY growth, Z-scores, density metrics, 3-month EWMAs

Construction Spend

Res/nonres/infra spend, CAGRs, per-worker metrics, rolling averages

Labor & Wages

Employment growth, wage metrics, tightness scores, productivity proxies

Safety Signals

OSHA inspections, violations, fatality rates, safety heat scores

Federal Awards

Infra & military awards, award intensity, agency concentration
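
For readers who want to see what features such as YOY growth, Z-scores, and 3-month EWMAs typically look like, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the ConstructAlpha build logic: the function names, the 12-month lag, the full-sample Z-score, and the alpha = 2 / (span + 1) smoothing convention are all assumptions, and the permit numbers are made up.

```python
from statistics import mean, pstdev


def yoy_growth(values, lag=12):
    """Year-over-year growth for monthly data; None until a full year of history exists."""
    out = []
    for i, v in enumerate(values):
        if i < lag or values[i - lag] == 0:
            out.append(None)
        else:
            out.append(v / values[i - lag] - 1.0)
    return out


def z_scores(values):
    """Full-sample z-scores using the population standard deviation (assumed convention)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def ewma(values, span=3):
    """Exponentially weighted moving average with alpha = 2 / (span + 1) (assumed convention)."""
    alpha = 2.0 / (span + 1)
    out = []
    prev = None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


# Hypothetical monthly permit units for one region (13 months of made-up data).
permits = [100, 110, 105, 120, 130, 125, 140, 150, 145, 160, 170, 165, 120]
print("latest YOY growth:", yoy_growth(permits)[-1])
print("first EWMA values:", ewma(permits)[:3])
```

A full-sample Z-score looks ahead in time, so a backtest would more likely use a rolling or expanding window. The shipped schema documentation is the place to confirm which convention the dataset actually uses.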

Financial AI

NeuralFlow AlphaDeck

152 pre-engineered ML features at 1-minute resolution. Download once, query offline forever.

66M rows (Enterprise) · Crypto + Equity + Forex + Indices · Documented strategies with Sharpe 1.4–1.5

Price Action

OHLCV, VWAP, TWAP at minute resolution with point-in-time correctness

Technical Indicators

RSI, MACD, Bollinger Bands, ATR across multiple lookback windows

Neural Signals

Proprietary embeddings capturing market microstructure anomalies

Integration Kit

Jupyter notebooks, Python loaders, pricefeed_adapter.py for live trading

Documented Alpha

+0.95% 4h returns on whale signals, 61% win rates, full case studies
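
As a rough illustration of the price-action fields listed above, the sketch below derives VWAP and TWAP from a handful of minute bars. It is a common textbook approximation, not the NeuralFlow pipeline: the Bar class, the use of the typical price (high + low + close) / 3 as each bar's price, and the close-based TWAP are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """One OHLCV bar (hypothetical container, not the shipped schema)."""
    open: float
    high: float
    low: float
    close: float
    volume: float


def vwap(bars):
    """Volume-weighted average price, approximating each bar by its typical price."""
    total_volume = sum(b.volume for b in bars)
    if total_volume == 0:
        return None
    weighted = sum(((b.high + b.low + b.close) / 3.0) * b.volume for b in bars)
    return weighted / total_volume


def twap(bars):
    """Time-weighted average price: equal weight per bar, using the close."""
    if not bars:
        return None
    return sum(b.close for b in bars) / len(bars)


# Two made-up minute bars.
bars = [Bar(10, 12, 9, 11, 100), Bar(11, 13, 11, 12, 300)]
print("VWAP:", vwap(bars))
print("TWAP:", twap(bars))
```

With tick data, VWAP would be computed from individual trades. The bar-level version here only approximates it.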

Government & Defense

FedFlow Intelligence Deck

$9.17T+ in federal contract awards with red-flag signals, ticker mapping, and price reactions.

6.5M+ rows · 2005–2025 · 1k+ tickers linked · $778B obligated dollars mapped

Contract Spine

Full FPDS feed partitioned by fiscal year/month with vendor details

Ticker Mapping

~8.5% of obligated dollars linked to 1k+ public company tickers

Red-Flag Signals

Rush jobs, no-competition flags, cost-plus vs fixed-price indicators

Price Reactions

ret_5d, ret_20d returns with price adapter for your own feeds

Defense Primes

LMT, RTX, NOC, GD, BA, HII, LHX, TDG coverage with event labels
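
If you plug in your own price feed, a forward-return field such as ret_5d or ret_20d could be reproduced along these lines. This is a sketch under an assumed definition (simple close-to-close return over N trading days after the award date). The page does not state the exact formula, so check it against the shipped documentation. The function name and the price series are hypothetical.

```python
def forward_return(closes, event_index, horizon):
    """Simple return from the event-day close to the close `horizon` trading days later.

    Returns None when the series does not extend far enough or the start price is zero.
    """
    end = event_index + horizon
    if event_index < 0 or end >= len(closes):
        return None
    start_price = closes[event_index]
    if start_price == 0:
        return None
    return closes[end] / start_price - 1.0


# Made-up daily closes; the award is assumed to land on index 0.
closes = [100.0 + i for i in range(30)]
print("ret_5d:", forward_return(closes, 0, 5))
print("ret_20d:", forward_return(closes, 0, 20))
```

Returning None instead of a partial-window value keeps events near the end of the series from receiving a misleading return.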

Government & Defense

FedEventBench

19.4M labeled contract events with forward-looking signals for margins, hiring, and volatility.

2005–2025 · War escalation links · Margin jumps · Hire waves · Volatility spikes

War Escalation

175k events flagged within ±60 days of Iraq/Ukraine/Gaza conflict triggers

Margin Jump

Awards where gross margin rises 3 points in the 12 months after the award versus the period before

Hire Wave

Headcount growth ≥10% in 12 months after award vs before

Volatility Spike

Realized vol 1.3x+ in 6 months post-award vs prior period

Procurement Flags

no_competition_flag, is_cost_plus, is_fixed_price for risk scoring
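
The label rules above are threshold comparisons between a pre-award and a post-award window, which can be sketched as follows. The function names, the inclusive thresholds, and the treatment of the margin rule as "at least 3 points" are assumptions for illustration. The dataset's own labeling code may differ.

```python
def margin_jump(gross_margin_before_pct, gross_margin_after_pct, points=3.0):
    """True when gross margin (in percentage points) rises by at least `points`."""
    return gross_margin_after_pct - gross_margin_before_pct >= points


def hire_wave(headcount_before, headcount_after, threshold=0.10):
    """True when headcount grows by at least `threshold` (10% by default)."""
    if headcount_before <= 0:
        return False
    return (headcount_after - headcount_before) / headcount_before >= threshold


def volatility_spike(realized_vol_before, realized_vol_after, ratio=1.3):
    """True when post-award realized volatility is at least `ratio` times the prior level."""
    if realized_vol_before <= 0:
        return False
    return realized_vol_after / realized_vol_before >= ratio


# Made-up values for a single award event.
print("margin_jump:", margin_jump(30.0, 33.5))
print("hire_wave:", hire_wave(1000, 1150))
print("volatility_spike:", volatility_spike(0.20, 0.30))
```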

Healthcare & Life Sciences

BioTrials Clinical Catalyst

ClinicalTrials.gov spine with sponsor normalization, ticker joins, and FDA catalyst labels.

560k trials · 1999–2025 · 1.5k tickers · Failed/halted trials filtered

Trial Spine

NCT IDs, phases, sponsors, conditions, interventions with status tracking

Sponsor Normalization

Cleaned sponsor names mapped to parent companies and tickers

Catalyst Labels

Primary completion, FDA review, PDUFA dates as tradeable events

FDA Linkage

fda_linked flag connecting trials to approval/rejection outcomes

Therapeutic Area Flags

is_oncology, is_cardiology, is_infectious therapeutic area tags

Patents & Innovation

PatentPulse Archive

6.75M+ USPTO grants from 2002–2025 with narratives, assignees, and CPC classifications.

99.97% extraction success · Zero duplicates · 95%+ CPC coverage (2013+)

Grant Spine

patent_id, application_id, kind_code with grant/filing dates

Assignee Data

assignee_name, assignee_country, inventor_names arrays

Narrative Text

abstract_text, claims_text, description_text for NLP/embeddings

CPC Classifications

Primary and secondary CPC codes for technology taxonomy

Join Hub

Designed as the central entity table for joining citations, AI labels, and litigation data

Healthcare & Biotech

Oncology IP Pulse

269k cancer-flagged patents plus 1.12M citation edges linking NIH/FDA and Patent Master.

1911–2016 grant dates · NIH-linked: 7,289 · FDA-linked: 1,090 · Citation graph included

Cancer Universe

269k patents tagged via ST.32 entity extraction + oncology keyword rules

Citation Graph

1.12M directed edges showing who cites whom in cancer research

NIH Linkage

NIH_Federal_Grant_Number connecting patents to public funding

FDA Linkage

FDA_Application_Number connecting patents to drug approvals

Theme Flags

Drugs/Chemistry, Diagnostic Devices, Biomarkers, Algorithms
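
A directed citation edge list like the one described above lends itself to simple degree counts, such as the citation_out_degree style of field. The sketch below assumes each edge is a (citing, cited) pair. That orientation, the function names, and the patent IDs are assumptions for illustration.

```python
from collections import Counter


def out_degrees(edges):
    """Number of citations each patent makes, from (citing, cited) pairs."""
    return Counter(citing for citing, _cited in edges)


def in_degrees(edges):
    """Number of citations each patent receives, from (citing, cited) pairs."""
    return Counter(cited for _citing, cited in edges)


# Hypothetical edges: P3 cites P1 and P2, P2 cites P1.
edges = [("P3", "P1"), ("P3", "P2"), ("P2", "P1")]
print("out-degree:", dict(out_degrees(edges)))
print("in-degree:", dict(in_degrees(edges)))
```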

100% Success Rate

Every dataset goes through schema, coverage, and integrity checks before release. Post-publication validation covers every published file, and a measurable share of ingested files is rejected before it ever reaches the catalog.

SHA-256 manifests • catalog-level audit JSON • Latest audit: Nov 4, 2025

Audit artifacts: catalog.json, public.catalog.json, storage_root map, and validator scripts available for download.
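
To show what checking files against a SHA-256 manifest can involve, here is a minimal sketch using only the Python standard library. The manifest layout is an assumption (a JSON object with a "files" key mapping relative paths to hex digests), as are the function names. The shipped manifests and validator scripts may be organized differently.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large Parquet files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, root_dir):
    """Return (relative_path, reason) pairs for every file that fails verification.

    Assumes a hypothetical manifest shape: {"files": {"relative/path": "<hex digest>"}}.
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    failures = []
    for relative_path, expected in manifest["files"].items():
        target = Path(root_dir) / relative_path
        if not target.is_file():
            failures.append((relative_path, "missing"))
        elif sha256_of_file(target) != expected.lower():
            failures.append((relative_path, "digest mismatch"))
    return failures
```

An empty list from verify_manifest means every listed file exists and matches its digest. Anything else is worth resolving before the data is loaded downstream.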

Sample Records

Every family exposes ready-to-query Parquet layouts plus marketing teasers. DuckDB + Snowflake snippets ship in the docs folder.

BioTrials Clinical Intelligence

{
  "nct_id": "NCT06088721",
  "sponsor_slug": "moderna-tx",
  "ticker": "MRNA",
  "phase": "Phase 3",
  "event_date": "2025-10-11",
  "catalyst_type": "primary_completion",
  "ret_5d": 0.041,
  "fda_linked": true,
  "sha256_manifest": "biotrials_pro_manifest_2025-11-27.json"
}

FedFlow Intelligence Deck

{
  "award_id": "W9128F25C0003",
  "ticker": "LMT",
  "award_date": "2025-07-02",
  "red_flag_score": 0.87,
  "rush_job": true,
  "jump_5d": 0.023,
  "jump_5d_premium": 0.031,
  "agency": "USACE",
  "focus_slice": "defense",
  "manifest_ref": "fedflow_signals_pro_manifest_2025-11-27.json"
}

Patent Master Pro / Oncology IP Pulse

{
  "patent_id": "US-11876543-B2",
  "grant_year": 2025,
  "is_cancer_patent": true,
  "nih_linked": true,
  "cpc_section": "A61K",
  "citation_out_degree": 14,
  "is_master_subset": true,
  "manifest_ref": "oncology_patent_manifest_2025-11-27.json"
}

Commercial Workflow

1. Explore JSON catalog

catalog.json + public.catalog.json describe every family, slug, storage_root, and CTA. Use them to power marketplaces or internal browsers.
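
As an illustration of powering an internal browser from the catalog, the sketch below groups dataset slugs by family and lists distinct storage roots. The JSON layout shown (a top-level "datasets" list with family, slug, storage_root, and cta keys) is a guess based on the fields named above, and every value in it is made up. Inspect the real catalog.json before relying on any key name.

```python
import json

# Hypothetical catalog excerpt: the real catalog.json layout may differ.
SAMPLE_CATALOG = """
{
  "datasets": [
    {"family": "BioTrials", "slug": "biotrials-teaser", "storage_root": "example-root/biotrials/", "cta": "sample"},
    {"family": "BioTrials", "slug": "biotrials-pro", "storage_root": "example-root/biotrials/", "cta": "contact"},
    {"family": "FedFlow", "slug": "fedflow-teaser", "storage_root": "example-root/fedflow/", "cta": "sample"}
  ]
}
"""


def slugs_by_family(catalog):
    """Group dataset slugs under their family name, preserving catalog order."""
    grouped = {}
    for entry in catalog.get("datasets", []):
        grouped.setdefault(entry["family"], []).append(entry["slug"])
    return grouped


def storage_roots(catalog):
    """Distinct storage roots referenced by the catalog, sorted for stable output."""
    return sorted({entry["storage_root"] for entry in catalog.get("datasets", [])})


catalog = json.loads(SAMPLE_CATALOG)
for family, slugs in slugs_by_family(catalog).items():
    print(family, slugs)
print(storage_roots(catalog))
```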


2. Request sample or teaser

Each family ships a teaser Parquet, docs, and validator notes so buyers can check coverage before purchase.


3. License + deliver

Purchases unlock presigned URLs (5-minute TTL) plus Slack/Email onboarding, notebooks, and rebuild scripts.
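
Because a 5-minute TTL leaves little slack, a download script may want to track how long each presigned URL has left and request a fresh one before starting a large transfer. The sketch below shows only that bookkeeping. The function names and the 30-second safety margin are assumptions, and it does not call any QuantLens API.

```python
from datetime import datetime, timedelta, timezone

PRESIGNED_TTL = timedelta(minutes=5)  # TTL stated on this page


def seconds_remaining(issued_at, now=None, ttl=PRESIGNED_TTL):
    """Seconds left before a URL issued at `issued_at` expires (never negative)."""
    now = now or datetime.now(timezone.utc)
    return max(0.0, (issued_at + ttl - now).total_seconds())


def should_refresh(issued_at, now=None, safety_margin=timedelta(seconds=30)):
    """True when the URL is expired or too close to expiry to start a download safely."""
    return seconds_remaining(issued_at, now) <= safety_margin.total_seconds()


# Made-up timestamps: a URL issued at noon UTC, checked two minutes later.
issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
check_time = issued + timedelta(minutes=2)
print("seconds left:", seconds_remaining(issued, check_time))
print("refresh first:", should_refresh(issued, check_time))
```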


Bring Public Data into Production

Storage-root aware families and documented QA workflows mean you can trust every file before it lands in your lakehouse.
