Datasets
Open reference datasets, published as immutable Parquet and queryable in your browser with DuckDB.
Meridian Datasets is a small, curated commons of open reference data — the slow-changing tables you reach for again and again when you're resolving companies, industries, or securities. Each dataset is published once as an immutable Parquet file on Cloudflare R2 and served from a stable URL at openlake.meridian.online. Nothing to download, nothing to sign up for.
Every dataset has a live explorer at /datasets. Open one and a DuckDB engine boots in your browser tab; no server touches your query. Sort, filter, facet, run SQL, profile the columns, and export the result to CSV, all against the real file over HTTP range reads, so the browser fetches only the byte ranges a query needs rather than the whole file. The same file is addressable from anywhere: one DuckDB ATTACH exposes the whole catalog, whether you run DuckDB from its CLI, Python, or R (see Use anywhere).
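Range reads work because of how Parquet lays out a file. It ends with its metadata footer, then a 4-byte little-endian footer length, then the magic bytes `PAR1`. A reader can fetch the last 8 bytes, work out where the footer starts, fetch the footer, and after that request only the column chunks a query touches. The sketch below covers only that first step, in plain Python. It is illustrative: `fetch` is a hypothetical stand-in for an HTTP range request, decoding the Thrift-encoded footer is out of scope, and DuckDB's own reader handles all of this internally and may order or combine its requests differently.

```python
import struct
from typing import Callable

MAGIC = b"PAR1"

# Hypothetical fetcher: returns the bytes in the inclusive range [start, end],
# as a server would for the HTTP header "Range: bytes=start-end".
RangeFetcher = Callable[[int, int], bytes]


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


def read_parquet_footer(fetch: RangeFetcher, file_size: int) -> bytes:
    """Fetch a Parquet file's raw (still Thrift-encoded) footer in two range reads."""
    if file_size < 12:  # leading PAR1 + 4-byte length + trailing PAR1
        raise ValueError("too small to be a Parquet file")
    # Read 1: the last 8 bytes are the footer length (uint32, little-endian) + PAR1.
    tail = fetch(file_size - 8, file_size - 1)
    if tail[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (footer_len,) = struct.unpack("<I", tail[:4])
    # Read 2: the footer sits immediately before those final 8 bytes.
    start = file_size - 8 - footer_len
    if start < len(MAGIC):
        raise ValueError("footer length is inconsistent with the file size")
    return fetch(start, file_size - 9)


if __name__ == "__main__":
    # Simulate a remote file in memory and record each requested range.
    footer = b"<thrift FileMetaData>"
    blob = MAGIC + bytes(1000) + footer + struct.pack("<I", len(footer)) + MAGIC
    requested = []

    def fetch(start: int, end: int) -> bytes:
        requested.append(range_header(start, end))
        return blob[start:end + 1]

    assert read_parquet_footer(fetch, len(blob)) == footer
    print(requested)
```

Two small requests are enough to learn a file's schema and row-group layout no matter how large the file is, which is the property a browser-side engine relies on to avoid downloading the whole table.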
The point is not to be a data warehouse. It's to make a handful of authoritative reference tables instantly queryable, honestly sourced, and cheap to trust — so you can spend your time on the analysis, not on wrangling a download.
The catalog
Row counts below are a snapshot as of 2026-07-04; the published manifest always carries the current release and its publication timestamp. Datasets are refreshed periodically (manually for now; see Data & provenance).
| Dataset | Rows | License | Source | Explore |
|---|---|---|---|---|
| GLEIF — Legal Entity Identifiers | 3,361,809 | CC0 | GLEIF golden copy | /datasets/gleif |
| SEC EDGAR — Company Tickers | 10,415 | Public domain | SEC EDGAR | /datasets/edgar |
| NAICS — Industry Classification | 2,125 | Public domain | U.S. Census Bureau | /datasets/naics |
- GLEIF — the Global Legal Entity Identifier Foundation's public register: every legal entity with its 20-character LEI, legal name, country, jurisdiction, legal form, and registration status.
- SEC EDGAR — every company with securities registered with the U.S. Securities and Exchange Commission, keyed by Central Index Key (CIK), with ticker and exchange.
- NAICS — the 2022 North American Industry Classification System, from 20 broad sectors down to 1,012 national industries, each with its official title and description.
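Two of these tables carry identifiers whose structure you can check locally before joining. Under ISO 17442, the last two characters of an LEI are check digits computed with ISO 7064 MOD 97-10: map letters to numbers (A=10 through Z=35), read the result as one integer, and a valid LEI leaves remainder 1 modulo 97. NAICS codes are hierarchical by prefix, and a code's length gives its level, from 2 digits for a sector to 6 for a national industry. The helpers below are a minimal sketch, not part of any Meridian API; the function names are hypothetical, and representing the ranged sectors as strings such as `31-33` is an assumption about how the codes are stored.

```python
import re
from typing import Optional

# ISO 17442: 18 alphanumeric characters followed by 2 numeric check digits.
_LEI_PATTERN = re.compile(r"[0-9A-Z]{18}[0-9]{2}")
_NAICS_LEVELS = {2: "sector", 3: "subsector", 4: "industry group",
                 5: "NAICS industry", 6: "national industry"}
# Three sectors span several two-digit codes and are published as ranges.
_RANGED_SECTORS = {"31": "31-33", "32": "31-33", "33": "31-33",
                   "44": "44-45", "45": "44-45", "48": "48-49", "49": "48-49"}


def _mod97_digits(s: str) -> int:
    # ISO 7064 MOD 97-10 conversion: digits stay as-is, letters A..Z become 10..35.
    return int("".join(str(int(ch, 36)) for ch in s))


def lei_check_digits(base18: str) -> str:
    """Compute the two check digits for an 18-character LEI prefix."""
    if not re.fullmatch(r"[0-9A-Z]{18}", base18):
        raise ValueError("expected 18 uppercase alphanumeric characters")
    remainder = _mod97_digits(base18 + "00") % 97
    return f"{98 - remainder:02d}"


def lei_is_valid(lei: str) -> bool:
    """True if lei is well-formed and its check digits verify (remainder 1 mod 97)."""
    return bool(_LEI_PATTERN.fullmatch(lei)) and _mod97_digits(lei) % 97 == 1


def naics_level(code: str) -> str:
    """Name the hierarchy level of a NAICS code from its length."""
    if re.fullmatch(r"[0-9]{2}-[0-9]{2}", code):
        return "sector"
    if not re.fullmatch(r"[0-9]{2,6}", code):
        raise ValueError(f"not a NAICS code: {code!r}")
    return _NAICS_LEVELS[len(code)]


def naics_parent(code: str) -> Optional[str]:
    """Return the code one level up, or None for a sector."""
    if naics_level(code) == "sector":
        return None
    prefix = code[:-1]
    # A subsector's parent may be a ranged sector (e.g. 311 -> 31-33).
    return _RANGED_SECTORS.get(prefix, prefix) if len(prefix) == 2 else prefix
```

For example, `naics_parent("311")` returns `"31-33"`, because subsector 311 sits under the manufacturing sector, and appending `lei_check_digits(base)` to any valid 18-character prefix yields a string that `lei_is_valid` accepts.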
Open licenses only
Everything here is open at the source. Today that means CC0 (GLEIF) and U.S. public domain (SEC EDGAR, and NAICS from the Census Bureau): free to use, including commercially, with no attribution required.
That is a deliberate gate, not an accident of what we happened to have. A dataset only ships if its license is proven to permit redistribution; anything unproven is refused by default. Share-alike (ODbL) and proprietary identifier sets are out of scope for the open commons. See Data & provenance for the full policy and per-dataset provenance.
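The refuse-by-default rule amounts to an allowlist that fails closed: a dataset is admitted only when its license is on the list and redistribution has actually been verified, and every other case, including an unknown license, is rejected. A minimal sketch, in which `Candidate`, `admit`, and the `US-PD` label are hypothetical names for illustration (`CC0-1.0` and `ODbL-1.0` are SPDX identifiers), not Meridian's actual tooling:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist of licenses known to permit redistribution.
# "CC0-1.0" is an SPDX identifier; "US-PD" is an illustrative label for
# U.S. government public-domain works, not an SPDX identifier.
ALLOWED_LICENSES = frozenset({"CC0-1.0", "US-PD"})


@dataclass(frozen=True)
class Candidate:
    name: str
    license_id: Optional[str]      # None: license not yet established
    redistribution_verified: bool  # confirmed that the terms allow redistribution?


def admit(candidate: Candidate) -> bool:
    """Fail closed: admit only an allowlisted license whose terms were verified."""
    return (
        candidate.license_id in ALLOWED_LICENSES
        and candidate.redistribution_verified
    )


if __name__ == "__main__":
    for c in [
        Candidate("cc0-example", "CC0-1.0", True),        # admitted
        Candidate("sharealike-example", "ODbL-1.0", True),  # share-alike: refused
        Candidate("unknown-example", None, False),         # unproven: refused
    ]:
        print(c.name, "admitted" if admit(c) else "refused")
```

Failing closed means a missing or misread license produces a rejection rather than a silent publish, which is the behavior the policy asks for.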
Next steps
Start here
Two minutes to your first query, then take it anywhere.