Validate a Data Delivery
Use FineType's profile and validate commands to build a repeatable quality gate for incoming data.
Goal: Save a schema from a known-good data batch, then validate every subsequent delivery against it — a repeatable quality gate you can run manually or in CI.
Prerequisites
| Tool | Purpose |
|---|---|
| FineType | Schema generation and validation |
| A known-good CSV file | The "golden" batch that defines the expected shape |
| New CSV deliveries | Incoming data to validate against the schema |
The problem
You receive data from a partner or upstream system. The first batch looks fine — you build a pipeline around it. Then batch 17 arrives with a new column, dates in a different format, and nulls where there shouldn't be any. Your pipeline breaks at 2 AM.
FineType's validate workflow catches these issues at the gate, before the data enters your pipeline.
Steps
1. Profile and save a schema from the known-good batch
Start with a batch you trust. Run profile with -o json-schema to generate a JSON Schema that captures the expected structure:
finetype profile -f good-batch.csv -o json-schema > delivery-schema.json
The schema captures:
- Column names and their expected order
- Semantic type for each column (e.g., datetime.date.iso, identity.person.email)
- Nullability: which columns had null values in the good batch
- Value constraints derived from the detected type
Commit delivery-schema.json to your repository. This is your contract.
2. Validate a new delivery
When a new batch arrives, run validate to check it against the saved schema. Check-only mode reports the counts and sets an exit code, writing nothing:
finetype validate new-batch.csv delivery-schema.json
Validation Report
════════════════════════════════════════════════════════════
Input: new-batch.csv
Schema: delivery-schema.json
Mode: check-only (no .db written)
Total rows: 10000
Valid rows: 9847
Invalid rows: 153
Rejects: 153
Grade: B
════════════════════════════════════════════════════════════
The report tallies the rows and assigns a letter Grade (A ≥ 95% down to F) from the mean per-column quality. Exit codes:
- 0 — all rows valid, delivery passes
- 1 — one or more rows rejected, delivery fails
- 2 — error
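A wrapper script can turn these exit codes into readable messages. The sketch below is illustrative only: the function name describe_validate_status is hypothetical, and the only facts it relies on are the three exit code meanings listed above.

```shell
# Hypothetical helper: map a `finetype validate` exit code to a message.
# Only the meanings of 0, 1, and 2 come from the list above.
describe_validate_status() {
  case "$1" in
    0) echo "pass: all rows valid" ;;
    1) echo "fail: one or more rows rejected" ;;
    2) echo "error" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Usage (capture the status without tripping `set -e`):
#   status=0
#   finetype validate new-batch.csv delivery-schema.json || status=$?
#   describe_validate_status "$status"
```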
3. Capture and inspect the rejects
To keep the rejected rows for review, pass --db/--table. FineType materialises the valid rows into a typed table and records every reject in a finetype_reject_errors sidecar table — in one pass:
finetype validate new-batch.csv delivery-schema.json --db delivery.db --table new_batch
Inspect the rejects with DuckDB. Each row tells you the column, the failure type, and the expected type:
duckdb delivery.db -c "SELECT column_name, error_type, constraint_failed, expected_type FROM finetype_reject_errors LIMIT 3;"
┌────────────────┬───────────────┬───────────────────┬───────────────────────────────────────┐
│ column_name    │ error_type    │ constraint_failed │ expected_type                         │
├────────────────┼───────────────┼───────────────────┼───────────────────────────────────────┤
│ order_date     │ SEMANTIC_TYPE │ pattern           │ datetime.date.iso                     │
│ amount         │ SEMANTIC_TYPE │ type              │ representation.numeric.decimal_number │
│ customer_email │ SEMANTIC_TYPE │ pattern           │ identity.person.email                 │
└────────────────┴───────────────┴───────────────────┴───────────────────────────────────────┘
No guesswork: the sidecar tells you exactly which column failed and why.
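When a batch produces many rejects, a grouped count is often easier to scan than individual rows. The sketch below wraps such a query in a hypothetical summarise_rejects function. It assumes only the finetype_reject_errors table and the column_name and error_type columns shown above, plus a duckdb CLI on the PATH; the n_rejects alias is made up for this example.

```shell
# Hypothetical helper: count rejects per column and error type.
# $1 = DuckDB file written by `finetype validate ... --db`
summarise_rejects() {
  duckdb "$1" -c "
    SELECT column_name, error_type, COUNT(*) AS n_rejects
    FROM finetype_reject_errors
    GROUP BY column_name, error_type
    ORDER BY n_rejects DESC;"
}

# Usage:
#   summarise_rejects delivery.db
```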
4. Gate a CI pipeline on the exit code
In CI you just need a pass/fail signal. Check-only validate already gives you one via its exit code — use it directly:
finetype validate new-batch.csv delivery-schema.json \
&& echo "Quality gate passed — loading data" \
|| echo "Quality gate failed — check rejects"
Use --lenient if you want a non-zero reject count to still exit 0 (errors still exit 2).
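One caveat with the && / || one-liner: in a shell, the status of such a chain is that of the last command that ran, so once the failure message has been printed the chain itself exits 0. A CI step that should stop the job needs to pass the validate status through. A minimal sketch, using a hypothetical function name quality_gate:

```shell
# Hypothetical wrapper: print a message, then return validate's own status.
# $1 = batch CSV, $2 = schema file
quality_gate() {
  status=0
  finetype validate "$1" "$2" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "Quality gate passed: loading data"
  else
    echo "Quality gate failed (exit $status): check rejects" >&2
  fi
  return "$status"
}

# Usage in a CI step:
#   quality_gate new-batch.csv delivery-schema.json || exit $?
```

Returning the original status keeps the distinction between rejects (1) and errors (2) available to whatever runs the step.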
5. Fix and re-validate
When validation fails, the workflow is:
- Inspect the finetype_reject_errors sidecar to understand the issues
- Fix the source data (or update the schema if the change is intentional)
- Re-run validation:
finetype validate fixed-batch.csv delivery-schema.json
Validation Report
════════════════════════════════════════════════════════════
Input: fixed-batch.csv
Schema: delivery-schema.json
Mode: check-only (no .db written)
Total rows: 10000
Valid rows: 10000
Invalid rows: 0
Rejects: 0
Grade: A
════════════════════════════════════════════════════════════
Once the delivery passes, materialise it with --db/--table:
finetype validate fixed-batch.csv delivery-schema.json --db warehouse.db --table deliveries
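The check-first, load-on-pass flow can be sketched as a single function. The name load_if_clean and its argument order are hypothetical; the two validate invocations use only the forms shown in this guide.

```shell
# Hypothetical helper: materialise a batch only if check-only validation passes.
# $1 = batch CSV, $2 = schema file, $3 = DuckDB file, $4 = table name
load_if_clean() {
  status=0
  finetype validate "$1" "$2" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "Not loading $1: validate exited $status" >&2
    return "$status"
  fi
  finetype validate "$1" "$2" --db "$3" --table "$4"
}

# Usage:
#   load_if_clean fixed-batch.csv delivery-schema.json warehouse.db deliveries
```

This costs a second pass over the file, but a batch that fails the check never reaches the warehouse database.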
6. Update the schema when requirements change
When a legitimate schema change occurs (new column, different format from an upgraded source system), update the contract:
finetype profile -f updated-batch.csv -o json-schema > delivery-schema.json
Commit the updated schema. All future validations will use the new contract.
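Because the redirect overwrites the contract in place, it can help to preview what would change before replacing the file. The sketch below is one possible approach, not part of FineType: the function name preview_schema_change is hypothetical, and it uses only the profile invocation shown above together with the standard mktemp and diff utilities.

```shell
# Hypothetical helper: show how a regenerated schema differs from the committed one.
# $1 = updated batch CSV, $2 = committed schema file
# Returns 0 if identical, 1 if different, 2 if the schema could not be generated.
preview_schema_change() {
  candidate=$(mktemp) || return 2
  if ! finetype profile -f "$1" -o json-schema > "$candidate"; then
    rm -f "$candidate"
    return 2
  fi
  status=0
  # diff exits 0 when the files match and 1 when they differ
  diff -u "$2" "$candidate" || status=$?
  rm -f "$candidate"
  return "$status"
}

# Usage:
#   preview_schema_change updated-batch.csv delivery-schema.json
```

Reviewing the diff first makes it easier to confirm that only the intended columns or formats changed.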
What you learned
- finetype profile -o json-schema captures the expected structure of a CSV as a JSON Schema file
- finetype validate checks incoming data against a saved schema; check-only sets an exit code, --db/--table materialises a typed table plus a finetype_reject_errors sidecar
- The exit code (0 pass / 1 rejects / 2 error) drives CI integration; --lenient forces 0 on rejects
- The schema file is your contract: commit it, version it, and update it intentionally
See also
- profile command reference: schema export options and --stats
- validate command reference: flags, output formats, and exit codes
- Build a Typed DuckDB Pipeline: validate and materialise data into DuckDB with proper types