Less Typing, More Discovery

By Hugh Cameron

TL;DR: FineType profiles your CSV and loads it into DuckDB with the right types — no manual casts, no format strings. One command to understand your columns, one command to query them.

The many CSVs on your computer hold many mysteries. Often, though, you're faced with solving the mundane before you unlock the magic.

Is that column of numbers a latitude or a postal code that's lost its leading zero? Is that text field a date, an address, or a free-form note? Is 4.6 a magnitude or a price? You open the file, squint at the first twenty rows, and start guessing. Sometimes those guesses hold. Sometimes they don't — and you find out three hours later when a join produces nonsense or a chart axis makes no sense.

That's three hours spent typing — casts, format strings, fixups — instead of discovering what the data actually has to say. We think the ratio should be the other way around.

Less typing

The tools available to analysts are changing fast. DuckDB has shown that serious analytical power can run on your laptop. The barrier to doing real analysis is lower than it's ever been.

But as the barrier drops, the foundations matter more, not less. When anyone can write a query, the question shifts from "can I write this?" to "can I trust this?" And trust starts with the most basic question of all: what type is this data, really?

This is what FineType does. It reads your data and infers the type of every column — not just VARCHAR or DOUBLE, but the semantic type: is that number a latitude, a magnitude, or a depth measurement? Is that string a timestamp, an address, or a category code?

Let's see what that looks like.

Profiling a real dataset

The USGS publishes a catalogue of every significant earthquake recorded worldwide. It's freely available, well-structured, and exactly the kind of data an analyst might pick up on any given morning. Let's download a year's worth of events at magnitude 4 and above:

curl -s "https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv\
&starttime=2024-01-01&endtime=2024-12-31&minmagnitude=4\
&limit=20000&orderby=time" -o earthquakes_2024.csv
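
Before profiling, it can be worth confirming the shape of what you downloaded. The sketch below is a hypothetical helper, not part of FineType: csv_shape is a name made up for illustration. It assumes the first line is a header with no quoted commas, and that no field contains an embedded newline.

```shell
# Hypothetical helper (not part of FineType): print the number of data rows
# and columns in a CSV file whose first line is a header.
# Assumes no quoted commas in the header and no embedded newlines in fields.
csv_shape() {
  awk -F',' '
    NR == 1 { cols = NF }   # header line: count comma-separated names
    END     { printf "%d rows, %d columns\n", NR - 1, cols }
  ' "$1"
}

# Usage:
#   csv_shape earthquakes_2024.csv
```

Only the header is split on commas, so quoted commas in data fields such as place do not affect the column count.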

That gives us 14,132 earthquakes across 22 columns. Now let's profile it:

finetype profile -f earthquakes_2024.csv
FineType Column Profile — "earthquakes_2024.csv" (14132 rows, 22 columns)
════════════════════════════════════════════════════════════════════════════════

  COLUMN                    TYPE                                      BROAD   CONF
  ──────────────────────────────────────────────────────────────────────────────
  time                      datetime.timestamp.iso_8601_milliseconds TIMESTAMP 100.0%
  latitude                  geography.coordinate.latitude            DOUBLE  90.0%
  longitude                 geography.coordinate.longitude           DOUBLE  50.0%
  depth                     representation.numeric.decimal_number    DOUBLE  54.0%
  mag                       representation.numeric.decimal_number    DOUBLE  93.0%
  magType                   representation.discrete.ordinal         VARCHAR  88.0%
  nst                       representation.numeric.integer_number    BIGINT  99.0%
  gap                       representation.numeric.integer_number    BIGINT  99.0%
  dmin                      representation.numeric.decimal_number    DOUBLE  89.0%
  rms                       representation.numeric.decimal_number    DOUBLE  99.0%
  net                       technology.code.locale_code             VARCHAR  99.0%
  id                        geography.coordinate.geohash            VARCHAR  74.0%
  updated                   datetime.timestamp.iso_8601_milliseconds TIMESTAMP 100.0%
  place                     geography.address.full_address          VARCHAR  43.0%
  type                      representation.discrete.ordinal         VARCHAR 100.0%
  horizontalError           representation.numeric.decimal_number    DOUBLE  98.0%
  depthError                representation.numeric.decimal_number    DOUBLE  92.0%
  magError                  representation.numeric.decimal_number    DOUBLE  87.0%
  magNst                    representation.numeric.integer_number    BIGINT  80.0%
  status                    representation.discrete.ordinal         VARCHAR 100.0%
  locationSource            technology.code.locale_code             VARCHAR  99.0%
  magSource                 technology.code.locale_code             VARCHAR  99.0%

In a few seconds, without writing a single query, you know something meaningful about every column.

The time and updated columns aren't just strings: they're ISO 8601 timestamps with millisecond precision, and FineType is 100% confident. The latitude and longitude columns aren't just decimals: they're geographic coordinates, distinguished from the other numeric columns like depth and mag. The place column ("80 km NW of Kandrian, Papua New Guinea") is recognised as an address, not arbitrary text. And magType, type, and status are identified as discrete categories rather than free-form strings.

None of this required domain knowledge. You didn't need to know what nst means or what magnitude scales exist. The profile gives you a map of the terrain before you start exploring — and that map is built from the data itself, not from assumptions.

More discovery

A profile is useful on its own, but the real payoff is what comes next. FineType can take what it's learned about your columns and load the data directly into DuckDB — with the right types already applied:

finetype load -f earthquakes_2024.csv | duckdb
┌─────────────────────────┬──────────┬───────────┬─────────┬────────┬─────────┬───────┬───┬──────────┬────────┬──────────┬────────────────┬───────────┐
│          time           │ latitude │ longitude │  depth  │  mag   │ magtype │  nst  │ … │ magerror │ magnst │  status  │ locationsource │ magsource │
│        timestamp        │  double  │  double   │ double  │ double │ varchar │ int64 │   │  double  │ int64  │ varchar  │    varchar     │  varchar  │
├─────────────────────────┼──────────┼───────────┼─────────┼────────┼─────────┼───────┼───┼──────────┼────────┼──────────┼────────────────┼───────────┤
│ 2024-12-30 23:56:29.977 │  -5.7603 │  148.9729 │ 127.013 │    4.6 │ mb      │    47 │ … │    0.089 │     38 │ reviewed │ us             │ us        │
│ 2024-12-30 23:40:33.868 │ -17.6089 │ -178.3937 │ 573.817 │    4.6 │ mb      │    69 │ … │    0.087 │     39 │ reviewed │ us             │ us        │
│ 2024-12-30 23:37:34.358 │ -31.5828 │ -179.7246 │ 201.998 │    4.0 │ mb      │    12 │ … │    0.173 │      9 │ reviewed │ us             │ us        │
│ …                       │      …   │       …   │     …   │    …   │ …       │   …   │ … │      …   │    …   │ …        │ …              │ …         │
└─────────────────────────┴──────────┴───────────┴─────────┴────────┴─────────┴───────┴───┴──────────┴────────┴──────────┴────────────────┴───────────┘

That's it. One command. No CREATE TABLE, no CAST(time AS TIMESTAMP), no strptime format strings. FineType writes the SQL so you don't have to — every column cast to the right DuckDB type, 14,132 rows loaded and ready to query.

The time column is a proper TIMESTAMP, not a string you'll need to parse later. The numeric columns are DOUBLE or BIGINT as appropriate (DuckDB displays BIGINT as int64). The text columns stay as VARCHAR. You went from a flat file to a typed, queryable table without writing a line of SQL.

That's what we mean by less typing, more discovery. Less time wrestling with format strings and cast expressions. More time asking the questions you actually came to answer.

Not every semantic label is perfect yet — FineType identifies the id column (us6000pgkh) as a geohash at 74% confidence, when it's actually a USGS event identifier. But the broad type is right (it stays as VARCHAR, not cast to something that would fail), and that's the point: the casts succeed across all 22 columns, even where the finer-grained label has room to improve.
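
One way to decide which labels deserve a second look is to list the columns whose confidence falls below a threshold. The sketch below is a hypothetical helper, not a FineType feature: low_confidence is a name made up for illustration. It assumes the profile is written to standard output in the four-field row layout shown earlier (COLUMN, TYPE, BROAD, CONF).

```shell
# Hypothetical helper (not part of FineType): read profile text on stdin and
# print column name, semantic type, and confidence for every column whose
# confidence is below the given threshold (default 80).
# Assumes the row layout shown in the profile above: COLUMN TYPE BROAD CONF.
low_confidence() {
  threshold="${1:-80}"
  awk -v t="$threshold" '
    # Data rows have exactly four fields and end in a percentage such as 74.0%
    NF == 4 && $4 ~ /^[0-9.]+%$/ {
      conf = $4
      sub(/%$/, "", conf)                      # strip the percent sign
      if (conf + 0 < t + 0) print $1, $2, $4   # numeric comparison
    }
  '
}

# Usage, assuming the profile goes to standard output:
#   finetype profile -f earthquakes_2024.csv | low_confidence 80
```

Applied to the profile shown earlier with a threshold of 80, this would flag longitude, depth, id, and place.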

Try it yourself

Pick a CSV — one of yours, or grab the earthquake data above — and try the pipeline:

curl -fsSL https://install.meridian.online/finetype | bash
finetype profile -f your_data.csv
finetype load -f your_data.csv | duckdb

You might be surprised what you learn. The Quick Start guide walks through installation and your first profile in a few minutes.

This is the first in a series of posts about building analysis on solid ground. Next, we'll look at the 250 semantic types FineType can detect — and why VARCHAR is not a type, it's a surrender.