Less Typing, More Discovery
By Hugh Cameron
TL;DR: FineType profiles your CSV and loads it into DuckDB with the right types — no manual casts, no format strings. One command to understand your columns, one command to query them.
The CSVs scattered across your computer hold plenty of mysteries. Often, though, you have to solve the mundane before you can unlock the magic.
Is that column of numbers a latitude or a postal code that's lost its leading
zero? Is that text field a date, an address, or a free-form note? Is 4.6 a
magnitude or a price? You open the file, squint at the first twenty rows, and
start guessing. Sometimes those guesses hold. Sometimes they don't — and you find
out three hours later when a join produces nonsense or a chart axis makes no
sense.
That's three hours spent typing — casts, format strings, fixups — instead of discovering what the data actually has to say. We think the ratio should be the other way around.
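The postal-code failure mode is easy to reproduce. A minimal Python sketch; the ZIP code here is an example value of our own, not from any particular dataset:

```python
# Coerce a postal code to a number, as a naive loader would, and the
# leading zero disappears for good.
raw = "02138"
as_number = float(raw)            # 2138.0
round_trip = str(int(as_number))  # "2138" -- four digits, not five
print(raw, "->", round_trip)
```

Once that zero is gone, nothing in the value itself can bring it back — which is why the string-versus-number call has to be made before loading, not after.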
Less typing
The tools available to analysts are changing fast. DuckDB has shown that serious analytical power can run on your laptop. The barrier to doing real analysis is lower than it's ever been.
But as the barrier drops, the foundations matter more, not less. When anyone can write a query, the question shifts from "can I write this?" to "can I trust this?" And trust starts with the most basic question of all: what type is this data, really?
This is what FineType does. It reads your data and infers the type of every
column — not just VARCHAR or DOUBLE, but the semantic type: is that number a
latitude, a magnitude, or a depth measurement? Is that string a timestamp, an
address, or a category code?
Let's see what that looks like.
Profiling a real dataset
The USGS publishes a catalogue of every significant earthquake recorded worldwide. It's freely available, well-structured, and exactly the kind of data an analyst might pick up on any given morning. Let's download a year's worth:
curl -s "https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv\
&starttime=2024-01-01&endtime=2024-12-31&minmagnitude=4\
&limit=20000&orderby=time" -o earthquakes_2024.csv

That gives us 14,132 earthquakes across 22 columns. Now let's profile it:
finetype profile -f earthquakes_2024.csv

FineType Column Profile — "earthquakes_2024.csv" (14132 rows, 22 columns)
════════════════════════════════════════════════════════════════════════════════
COLUMN           TYPE                                      BROAD        CONF
────────────────────────────────────────────────────────────────────────────
time             datetime.timestamp.iso_8601_milliseconds  TIMESTAMP  100.0%
latitude         geography.coordinate.latitude             DOUBLE      90.0%
longitude        geography.coordinate.longitude            DOUBLE      50.0%
depth            representation.numeric.decimal_number     DOUBLE      54.0%
mag              representation.numeric.decimal_number     DOUBLE      93.0%
magType          representation.discrete.ordinal           VARCHAR     88.0%
nst              representation.numeric.integer_number     BIGINT      99.0%
gap              representation.numeric.integer_number     BIGINT      99.0%
dmin             representation.numeric.decimal_number     DOUBLE      89.0%
rms              representation.numeric.decimal_number     DOUBLE      99.0%
net              technology.code.locale_code               VARCHAR     99.0%
id               geography.coordinate.geohash              VARCHAR     74.0%
updated          datetime.timestamp.iso_8601_milliseconds  TIMESTAMP  100.0%
place            geography.address.full_address            VARCHAR     43.0%
type             representation.discrete.ordinal           VARCHAR    100.0%
horizontalError  representation.numeric.decimal_number     DOUBLE      98.0%
depthError       representation.numeric.decimal_number     DOUBLE      92.0%
magError         representation.numeric.decimal_number     DOUBLE      87.0%
magNst           representation.numeric.integer_number     BIGINT      80.0%
status           representation.discrete.ordinal           VARCHAR    100.0%
locationSource   technology.code.locale_code               VARCHAR     99.0%
magSource        technology.code.locale_code               VARCHAR     99.0%

In a few seconds, without writing a single query, you know something meaningful about every column.
The time and updated columns aren't just strings — they're ISO 8601
timestamps with millisecond precision, and FineType is 100% confident. The
latitude and longitude columns aren't just decimals — they're geographic
coordinates, distinguished from the other numeric columns like depth and mag.
The place column ("80 km NW of Kandrian, Papua New Guinea") is recognised as
an address, not arbitrary text. And magType, type, and status are correctly
identified as ordinal categories rather than free-form strings.
None of this required domain knowledge. You didn't need to know what nst means
or what magnitude scales exist. The profile gives you a map of the terrain before
you start exploring — and that map is built from the data itself, not from
assumptions.
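To make the coordinate distinction concrete, here's a toy version of range-based inference in Python. This is an illustration of the idea, not FineType's actual algorithm; the sample values are taken from the rows shown above.

```python
import csv
import io

# Both columns hold decimals, but only one stays inside the
# [-90, 90] range that a latitude must occupy.
sample = io.StringIO(
    "latitude,depth\n"
    "-5.7603,127.013\n"
    "-17.6089,573.817\n"
    "-31.5828,201.998\n"
)
rows = list(csv.DictReader(sample))

def looks_like_latitude(values):
    nums = [float(v) for v in values]
    return all(-90.0 <= n <= 90.0 for n in nums)

print(looks_like_latitude(r["latitude"] for r in rows))  # True
print(looks_like_latitude(r["depth"] for r in rows))     # False
```

A real profiler layers many such signals — ranges, formats, vocabularies — but the principle is the same: the evidence comes from the values, not from the column name.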
More discovery
A profile is useful on its own, but the real payoff is what comes next. FineType can take what it's learned about your columns and load the data directly into DuckDB — with the right types already applied:
finetype load -f earthquakes_2024.csv | duckdb

┌─────────────────────────┬──────────┬───────────┬─────────┬────────┬─────────┬───────┬───┬──────────┬────────┬──────────┬────────────────┬───────────┐
│ time │ latitude │ longitude │ depth │ mag │ magtype │ nst │ … │ magerror │ magnst │ status │ locationsource │ magsource │
│ timestamp │ double │ double │ double │ double │ varchar │ int64 │ │ double │ int64 │ varchar │ varchar │ varchar │
├─────────────────────────┼──────────┼───────────┼─────────┼────────┼─────────┼───────┼───┼──────────┼────────┼──────────┼────────────────┼───────────┤
│ 2024-12-30 23:56:29.977 │ -5.7603 │ 148.9729 │ 127.013 │ 4.6 │ mb │ 47 │ … │ 0.089 │ 38 │ reviewed │ us │ us │
│ 2024-12-30 23:40:33.868 │ -17.6089 │ -178.3937 │ 573.817 │ 4.6 │ mb │ 69 │ … │ 0.087 │ 39 │ reviewed │ us │ us │
│ 2024-12-30 23:37:34.358 │ -31.5828 │ -179.7246 │ 201.998 │ 4.0 │ mb │ 12 │ … │ 0.173 │ 9 │ reviewed │ us │ us │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────────────────┴──────────┴───────────┴─────────┴────────┴─────────┴───────┴───┴──────────┴────────┴──────────┴────────────────┴───────────┘

That's it. One command. No CREATE TABLE, no CAST(time AS TIMESTAMP), no
strptime format strings. FineType writes the SQL so you don't have to — every
column cast to the right DuckDB type, 14,132 rows loaded and ready to query.
The time column is a proper TIMESTAMP, not a string you'll need to parse
later. The numeric columns are DOUBLE or INT64 as appropriate. The text
columns stay as VARCHAR. You went from a flat file to a typed, queryable table
without writing a line of SQL.
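For contrast, the manual route in DuckDB looks something like this: read every column as text, then cast each one by hand. This is only a sketch with the column list abbreviated, and the table name is ours, not anything FineType generates.

```sql
-- Read everything as VARCHAR, then cast column by column.
-- all_varchar = true disables DuckDB's own sniffing so every cast is explicit.
CREATE TABLE earthquakes AS
SELECT
    CAST(time AS TIMESTAMP)   AS time,
    CAST(latitude AS DOUBLE)  AS latitude,
    CAST(longitude AS DOUBLE) AS longitude,
    CAST(mag AS DOUBLE)       AS mag
    -- ...and so on for the remaining 18 columns
FROM read_csv('earthquakes_2024.csv', all_varchar = true);
```

Twenty-two columns of that, for every new file, is exactly the typing we'd rather you skip.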
That's what we mean by less typing, more discovery. Less time wrestling with format strings and cast expressions. More time asking the questions you actually came to answer.
Not every semantic label is perfect yet — FineType identifies the id column
(us6000pgkh) as a geohash at 74% confidence, when it's actually a USGS event
identifier. But the broad type is right (it stays as VARCHAR, not cast to
something that would fail), and that's the point: the casts succeed across all 22
columns, even where the finer-grained label has room to improve.
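Why a geohash, of all things? One plausible explanation, offered as an illustration rather than a claim about FineType's internals: the identifier happens to use only characters from the geohash base-32 alphabet, so a character-level check alone cannot rule it out.

```python
# Geohash base-32 alphabet: digits plus lowercase letters, minus a, i, l, o.
GEOHASH_ALPHABET = set("0123456789bcdefghjkmnpqrstuvwxyz")

event_id = "us6000pgkh"
print(all(c in GEOHASH_ALPHABET for c in event_id))  # True
```

Telling the two apart takes more than alphabet membership — length distributions, or whether the decoded coordinates are even plausible — which is the kind of refinement that moves a 74% label toward certainty.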
Try it yourself
Pick a CSV — one of yours, or grab the earthquake data above — and try the pipeline:
curl -fsSL https://install.meridian.online/finetype | bash
finetype profile -f your_data.csv
finetype load -f your_data.csv | duckdb

You might be surprised what you learn. The Quick Start guide walks through installation and your first profile in a few minutes.
This is the first in a series of posts about building analysis on solid ground.
Next, we'll look at the 250 semantic types FineType can detect — and why
VARCHAR is not a type; it's a surrender.