Validate a Data Delivery

Use FineType's profile and validate commands to build a repeatable quality gate for incoming data.

Goal: Save a schema from a known-good data batch, then validate every subsequent delivery against it, either manually or in CI.

Prerequisites

Tool                     Purpose
FineType                 Schema generation and validation
A known-good CSV file    The "golden" batch that defines the expected shape
New CSV deliveries       Incoming data to validate against the schema

The problem

You receive data from a partner or upstream system. The first batch looks fine — you build a pipeline around it. Then batch 17 arrives with a new column, dates in a different format, and nulls where there shouldn't be any. Your pipeline breaks at 2 AM.

FineType's validate workflow catches these issues at the gate, before the data enters your pipeline.

Steps

1. Profile and save a schema from the known-good batch

Start with a batch you trust. Run profile with -o json-schema to generate a JSON Schema that captures the expected structure:

finetype profile -f good-batch.csv -o json-schema > delivery-schema.json

The schema captures:

  • Column names and their expected order
  • Semantic type for each column (e.g., datetime.date.iso, identity.person.email)
  • Nullability — which columns had null values in the good batch
  • Value constraints derived from the detected type

Commit delivery-schema.json to your repository. This is your contract.

2. Validate a new delivery

When a new batch arrives, run validate to check it against the saved schema. Check-only mode reports the counts and sets an exit code, writing nothing:

finetype validate new-batch.csv delivery-schema.json
Validation Report
════════════════════════════════════════════════════════════
  Input:        new-batch.csv
  Schema:       delivery-schema.json
  Mode:         check-only (no .db written)

  Total rows:         10000
  Valid rows:          9847
  Invalid rows:         153
  Rejects:              153
  Grade:             B
════════════════════════════════════════════════════════════

The report tallies the rows and assigns a letter Grade (A ≥ 95% down to F) from the mean per-column quality. Exit codes:

  • 0 — all rows valid, delivery passes
  • 1 — one or more rows rejected, delivery fails
  • 2 — error
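
A wrapper script can branch on the three exit codes listed above. The following is a minimal sketch, not part of FineType: the function name run_gate is hypothetical, and it assumes only the validate invocation and exit codes described in this guide.

```shell
# Hypothetical wrapper (not part of FineType): run a check-only validation
# and describe the outcome using the exit codes listed above.
run_gate() {
  delivery=$1
  schema=$2

  # Capture the exit code without tripping `set -e` in the calling script.
  rc=0
  finetype validate "$delivery" "$schema" || rc=$?

  case $rc in
    0) echo "pass: $delivery conforms to $schema" ;;
    1) echo "rejects: $delivery has rows that fail $schema" ;;
    *) echo "error: validation did not complete (exit $rc)" ;;
  esac

  # Hand the original code back so CI can still gate on it.
  return "$rc"
}
```

Calling run_gate new-batch.csv delivery-schema.json prints the usual report followed by a one-line verdict, and returns the same code that validate produced.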

3. Capture and inspect the rejects

To keep the rejected rows for review, pass --db/--table. FineType materialises the valid rows into a typed table and records every reject in a finetype_reject_errors sidecar table — in one pass:

finetype validate new-batch.csv delivery-schema.json --db delivery.db --table new_batch

Inspect the rejects with DuckDB — each row tells you the column, the failure type, and the expected type:

duckdb delivery.db -c "SELECT column_name, error_type, constraint_failed, expected_type FROM finetype_reject_errors LIMIT 3;"
┌────────────────┬────────────────┬──────────────────┬───────────────────────────────────────┐
│  column_name   │   error_type   │ constraint_failed│             expected_type             │
├────────────────┼────────────────┼──────────────────┼───────────────────────────────────────┤
│ order_date     │ SEMANTIC_TYPE  │ pattern          │ datetime.date.iso                     │
│ amount         │ SEMANTIC_TYPE  │ type             │ representation.numeric.decimal_number │
│ customer_email │ SEMANTIC_TYPE  │ pattern          │ identity.person.email                 │
└────────────────┴────────────────┴──────────────────┴───────────────────────────────────────┘

No guesswork — the sidecar tells you exactly which column failed and why.
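
With more than a few rejects, a grouped count is usually easier to read than individual rows. The sketch below assumes the finetype_reject_errors table and the column names shown in the query above. The function name summarise_rejects is hypothetical.

```shell
# Hypothetical helper: count rejects per column and failure kind.
# Table and column names are taken from the query shown above.
summarise_rejects() {
  db=$1
  duckdb "$db" -c "
    SELECT column_name, error_type, constraint_failed, COUNT(*) AS rejects
    FROM finetype_reject_errors
    GROUP BY column_name, error_type, constraint_failed
    ORDER BY rejects DESC;"
}
```

Run it as summarise_rejects delivery.db. The columns with the most failures appear first, which tends to point at the systemic problem (for example, a changed date format) before the one-off bad values.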

4. Gate a CI pipeline on the exit code

In CI you just need a pass/fail signal. Check-only validate already gives you one via its exit code — use it directly:

finetype validate new-batch.csv delivery-schema.json \
  && echo "Quality gate passed: loading data" \
  || { echo "Quality gate failed: check rejects"; exit 1; }

Use --lenient if you want a non-zero reject count to still exit 0 (errors still exit 2).
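
When one CI run has to gate several files, the same exit code can be collected per file. This is a sketch under stated assumptions: the function name validate_all and the idea of a directory of incoming CSVs are illustrative, and only the check-only validate call comes from this guide.

```shell
# Hypothetical helper: validate every CSV in a directory against one schema.
# Returns 0 only if every file passes; otherwise the highest exit code seen.
validate_all() {
  dir=$1
  schema=$2
  worst=0

  for file in "$dir"/*.csv; do
    # Skip the unexpanded pattern when the directory has no CSV files.
    [ -e "$file" ] || continue

    rc=0
    finetype validate "$file" "$schema" || rc=$?
    if [ "$rc" -ne 0 ]; then
      echo "FAILED (exit $rc): $file" >&2
      if [ "$rc" -gt "$worst" ]; then worst=$rc; fi
    fi
  done

  return "$worst"
}
```

Every file is checked even after the first failure, so a single CI log lists all failing deliveries. Note that an empty directory returns 0. Add an explicit check if a missing delivery should fail the gate.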

5. Fix and re-validate

When validation fails, the workflow is:

  1. Inspect the finetype_reject_errors sidecar to understand the issues
  2. Fix the source data (or update the schema if the change is intentional)
  3. Re-run validation:
finetype validate fixed-batch.csv delivery-schema.json
Validation Report
════════════════════════════════════════════════════════════
  Input:        fixed-batch.csv
  Schema:       delivery-schema.json
  Mode:         check-only (no .db written)

  Total rows:         10000
  Valid rows:         10000
  Invalid rows:           0
  Rejects:                0
  Grade:             A
════════════════════════════════════════════════════════════

Once the delivery passes, materialise it with --db/--table:

finetype validate fixed-batch.csv delivery-schema.json --db warehouse.db --table deliveries
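
To make sure nothing is written until a delivery is clean, the check-only run and the materialising run can be chained. The helper below is a sketch, and load_if_valid is a hypothetical name. It uses only the two validate forms shown in this guide, at the cost of validating the file twice.

```shell
# Hypothetical helper: load a delivery only after a clean check-only run.
load_if_valid() {
  delivery=$1
  schema=$2
  db=$3
  table=$4

  # First pass: check-only, writes nothing.
  if ! finetype validate "$delivery" "$schema"; then
    echo "not loading $delivery: validation did not pass" >&2
    return 1
  fi

  # Second pass: materialise the typed table.
  finetype validate "$delivery" "$schema" --db "$db" --table "$table"
}
```

For example: load_if_valid fixed-batch.csv delivery-schema.json warehouse.db deliveries.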

6. Update the schema when requirements change

When a legitimate schema change occurs (new column, different format from an upgraded source system), update the contract:

finetype profile -f updated-batch.csv -o json-schema > delivery-schema.json

Commit the updated schema. All future validations will use the new contract.
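
Before overwriting the contract, it can help to see exactly what would change. The following sketch re-profiles the new batch into a temporary file and diffs it against the committed schema. The function name schema_diff is hypothetical, and the approach assumes profile emits stable output for unchanged data, which this guide does not confirm.

```shell
# Hypothetical helper: show how a re-profiled schema differs from the committed one.
# Returns 0 if identical, 1 if different, 2 if profiling failed.
schema_diff() {
  batch=$1
  committed=$2
  candidate=$(mktemp) || return 2

  if ! finetype profile -f "$batch" -o json-schema > "$candidate"; then
    rm -f "$candidate"
    return 2
  fi

  # diff exits 0 when the files match and 1 when they differ.
  rc=0
  diff -u "$committed" "$candidate" || rc=$?
  rm -f "$candidate"
  return "$rc"
}
```

Running schema_diff updated-batch.csv delivery-schema.json prints a unified diff for review. An unexpected change in the diff is a cue to question the batch before updating the contract.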

What you learned

  • finetype profile -o json-schema captures the expected structure of a CSV as a JSON Schema file
  • finetype validate checks incoming data against a saved schema; check-only sets an exit code, --db/--table materialises a typed table plus a finetype_reject_errors sidecar
  • The exit code (0 pass / 1 rejects / 2 error) drives CI integration; --lenient forces 0 on rejects
  • The schema file is your contract: commit it, version it, and update it intentionally
