Use Anywhere
One ATTACH line exposes the whole catalog to DuckDB, Python, and R — the same file the browser reads.
The browser explorer is a convenience, not a walled garden. Every dataset is a plain Parquet file at a stable URL, and the whole catalog is a frozen DuckLake — so you can query it from a notebook, a script, or a scheduled job with one line.
One line to the whole catalog
ATTACH 'ducklake:https://openlake.meridian.online/catalog/open.ducklake' AS meridian (READ_ONLY);That attaches the open catalog read-only and exposes every dataset as a table:
| Table | What it is |
|---|---|
meridian.gleif | GLEIF legal entities (LEI register) |
meridian.edgar | SEC EDGAR company tickers |
meridian.naics | NAICS industry classification |
The catalog is a single ~5 MB file that points at the underlying Parquet by reference — attaching it copies no data. Your engine reads only the columns and row groups each query needs, straight from R2.
The snippets
These are the exact snippets the explorer's Use elsewhere drawer emits — the query shown is the GLEIF country breakdown from the quick start, but any query you build in the grid comes out the same shape.
-- One ATTACH exposes every Meridian open dataset as a table.
INSTALL ducklake;
ATTACH 'ducklake:https://openlake.meridian.online/catalog/open.ducklake' AS meridian (READ_ONLY);
SELECT country, count(*) AS entities
FROM meridian.gleif
GROUP BY country
ORDER BY entities DESC
LIMIT 20;
-- Or read just this one Parquet directly:
INSTALL httpfs;
SELECT country, count(*) AS entities
FROM read_parquet('https://openlake.meridian.online/gleif.parquet')
GROUP BY country
ORDER BY entities DESC
LIMIT 20;import duckdb
con = duckdb.connect()
con.execute("INSTALL ducklake; INSTALL httpfs;")
con.execute("ATTACH 'ducklake:https://openlake.meridian.online/catalog/open.ducklake' AS meridian (READ_ONLY)")
df = con.sql("""
SELECT country, count(*) AS entities
FROM meridian.gleif
GROUP BY country
ORDER BY entities DESC
LIMIT 20
""").df()
# Or read just this one Parquet directly:
# duckdb.sql("""
# SELECT country, count(*) AS entities
# FROM read_parquet('https://openlake.meridian.online/gleif.parquet')
# GROUP BY country
# ORDER BY entities DESC
# LIMIT 20
# """).df()Requires the duckdb package: pip install duckdb.
library(duckdb)
con <- dbConnect(duckdb())
dbExecute(con, "INSTALL ducklake; INSTALL httpfs;")
dbExecute(con, "ATTACH 'ducklake:https://openlake.meridian.online/catalog/open.ducklake' AS meridian (READ_ONLY)")
dbGetQuery(con, "
SELECT country, count(*) AS entities
FROM meridian.gleif
GROUP BY country
ORDER BY entities DESC
LIMIT 20
")
# Or read just this one Parquet directly:
# dbGetQuery(con, "
# SELECT country, count(*) AS entities
# FROM read_parquet('https://openlake.meridian.online/gleif.parquet')
# GROUP BY country
# ORDER BY entities DESC
# LIMIT 20
# ")Requires the duckdb package: install.packages("duckdb").
# DuckDB CLI — the whole commons via one ATTACH
duckdb -c "
INSTALL ducklake;
ATTACH 'ducklake:https://openlake.meridian.online/catalog/open.ducklake' AS meridian (READ_ONLY);
SELECT country, count(*) AS entities FROM meridian.gleif GROUP BY country ORDER BY entities DESC LIMIT 20;"
# Or read just this one Parquet directly:
# duckdb -c "INSTALL httpfs; SELECT country, count(*) AS entities FROM read_parquet('https://openlake.meridian.online/gleif.parquet') GROUP BY country ORDER BY entities DESC LIMIT 20;"Extension requirements
The engine needs two DuckDB extensions, both fetched on first use:
ducklake— resolves theducklake:catalog. Install withINSTALL ducklake;. It pulls in HTTP support automatically, so theATTACHline above works on its own.httpfs— reads a Parquet file directly over HTTPS. Install withINSTALL httpfs;. You only need it for theread_parquet('https://…')fallback path.
Both live in the default extension repositories; a recent DuckDB (1.x) installs them without any extra configuration. INSTALL runs once and caches the extension locally.
ATTACH vs. read_parquet directly
Two ways in, and they answer different questions:
ATTACHthe catalog when you want the whole commons behind one connection — stable table names (meridian.gleif, …), joins across datasets, and no per-file URLs to track.read_parquet('https://openlake.meridian.online/<slug>.parquet')when you want exactly one dataset and nothing else. The<slug>.parquetURL is stable and always resolves to the current release.
Reproducible reads
The stable <slug>.parquet URL follows the latest release. If you need a fixed snapshot that never changes under you, use the content-addressed URL from the manifest instead — see Data & provenance.
Just download the file
No SQL engine handy? The Parquet is an ordinary object — curl it:
# Resolve the current immutable URL from the manifest, then download it:
curl -s https://openlake.meridian.online/manifest.json | jq -r '.datasets.gleif.url'
# Or grab the stable pointer directly (follows the latest release):
curl -O https://openlake.meridian.online/gleif.parquetA note on the corpus column
Each table carries a hidden corpus column — a normalized blend of the text columns that powers the explorer's search box. SELECT * will include it (and it's large). When you want the clean, human columns, list them explicitly or exclude it:
SELECT * EXCLUDE (corpus) FROM meridian.gleif LIMIT 50;