Step 3 · Validate and Monitor

Goal

Confirm the data is correct after migration, establish ongoing data quality monitoring, and close the loop on the data change.

Instructions

You are in workflow step 3 of the data-cycle. The migration has been applied. Your job is to verify correctness, catch any data quality issues, and set up signals to detect future drift.


Tasks to Perform

1. Post-Migration Validation

Run these checks immediately after applying the migration:

-- Row counts: do new tables have the expected number of rows?
SELECT COUNT(*) FROM customer_addresses;

-- Did backfills complete? (check for null values that should be populated)
SELECT COUNT(*) FROM orders WHERE customer_id IS NULL;
-- Expected: 0

-- Verify foreign key integrity (no orphaned rows)
SELECT COUNT(*) FROM order_items oi
LEFT JOIN orders o ON o.id = oi.order_id
WHERE o.id IS NULL;
-- Expected: 0

-- Verify index creation succeeded
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'customer_addresses';

-- Check for constraint violations
SELECT conname, contype
FROM pg_constraint
WHERE conrelid = 'customer_addresses'::regclass;
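The assertion-style checks above can be scripted so the whole suite passes or fails in one run. A minimal sketch in Python, using an in-memory SQLite database as a stand-in for the real connection (swap in your actual driver, e.g. psycopg2); the check names and queries mirror the examples above, and the schema here is a simplified assumption:

```python
import sqlite3

# Each check is a SQL statement that should return 0 violations.
CHECKS = {
    "orders_without_customer":
        "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL",
    "orphaned_order_items":
        "SELECT COUNT(*) FROM order_items oi "
        "LEFT JOIN orders o ON o.id = oi.order_id WHERE o.id IS NULL",
}

def failed_checks(conn):
    """Run every assertion; return {check_name: violations} for failures."""
    failures = {}
    for name, sql in CHECKS.items():
        violations = conn.execute(sql).fetchone()[0]
        if violations != 0:
            failures[name] = violations
    return failures

# Demo against a simplified in-memory schema standing in for the real DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.execute("CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, 10)")
conn.execute("INSERT INTO order_items VALUES (1, 1)")   # valid
conn.execute("INSERT INTO order_items VALUES (2, 99)")  # orphaned
print(failed_checks(conn))  # -> {'orphaned_order_items': 1}
```

A runner like this is easy to wire into CI or a deploy pipeline: exit nonzero when the returned dict is non-empty.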

2. Application-Level Smoke Tests

Run these after the migration, against the running application (through the API, not raw SQL):

# Health check
curl -sf https://yourapp.com/health | python3 -m json.tool

# Test the specific feature that touches the new data
# (use Postman, HTTPie, or your existing integration test suite)

# For a new table
curl -X POST https://yourapp.com/api/customers/123/addresses \
  -H "Content-Type: application/json" \
  -d '{"line_1": "1 Test St", "city": "London", "country": "GB", "postcode": "EC1A 1BB"}'

curl https://yourapp.com/api/customers/123/addresses
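Beyond checking the HTTP status, it helps to assert that what the API returns matches what was sent. A small sketch of that round-trip check; the field names follow the curl example above, and the response shape (a flat JSON object echoing the address fields) is an assumption about your API:

```python
import json

def address_roundtrip_ok(response_body: str, expected: dict) -> bool:
    """Return True if every field we POSTed comes back unchanged.

    `response_body` is the raw JSON body from the API; field names here
    mirror the curl example and are assumptions about the response shape.
    """
    body = json.loads(response_body)
    return all(body.get(k) == v for k, v in expected.items())

sent = {"line_1": "1 Test St", "city": "London",
        "country": "GB", "postcode": "EC1A 1BB"}
reply = ('{"id": 42, "line_1": "1 Test St", "city": "London", '
         '"country": "GB", "postcode": "EC1A 1BB"}')
print(address_roundtrip_ok(reply, sent))  # -> True
```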

3. Check for Query Performance Regressions

After schema changes, slow queries sometimes appear immediately:

-- PostgreSQL: check for new slow query patterns
SELECT
  left(query, 100)    AS query,
  calls,
  round(mean_exec_time) AS avg_ms,
  round(total_exec_time / 1000) AS total_sec
FROM pg_stat_statements
WHERE mean_exec_time > 100            -- queries averaging > 100ms
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Check if new indexes are being used
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE relname = 'customer_addresses'
ORDER BY idx_scan DESC;

If a query is unexpectedly slow, run EXPLAIN ANALYZE and check for missing indexes.
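On PostgreSQL, EXPLAIN ANALYZE is the tool for this. As a self-contained illustration of the same idea (does a query hit an index or fall back to a full scan?), here is the equivalent with SQLite's EXPLAIN QUERY PLAN, using a simplified version of the customer_addresses table; the exact plan wording differs between SQLite versions and from Postgres output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_addresses ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, postcode TEXT)"
)

def plan(sql):
    """Join the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " ".join(row[-1] for row in rows)  # last column is the detail text

query = "SELECT * FROM customer_addresses WHERE customer_id = 123"

before = plan(query)  # no index yet: reported as a full-table SCAN
conn.execute(
    "CREATE INDEX idx_addresses_customer ON customer_addresses (customer_id)"
)
after = plan(query)   # now reported as SEARCH ... USING INDEX

print(before)
print(after)
```

The same before/after comparison with EXPLAIN ANALYZE (run once before creating a candidate index, once after) is how you confirm a Postgres index actually changes the plan.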

4. Set Up Data Quality Monitoring

For any critical table, set up ongoing assertion-based monitoring:

-- Example: create a monitoring view or schedule these as health check queries

-- Check: orders without a customer
SELECT 'orders_without_customer' AS check_name, COUNT(*) AS violations
FROM orders WHERE customer_id IS NULL;

-- Check: negative amounts (should never happen)
SELECT 'negative_order_amounts' AS check_name, COUNT(*) AS violations
FROM orders WHERE amount_cents < 0;

-- Check: records created today vs same day last week (anomaly detection)
SELECT
  DATE_TRUNC('day', created_at) AS day,
  COUNT(*) AS record_count
FROM orders
WHERE created_at > NOW() - INTERVAL '14 days'
GROUP BY 1
ORDER BY 1;
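The 14-day query above returns day/count pairs; deciding what counts as an anomaly still needs a rule. One simple sketch: flag any day whose count deviates sharply from the same weekday one week earlier (the 50% threshold is an arbitrary assumption to tune per table):

```python
from datetime import date, timedelta

def anomalies(day_counts, threshold=0.5):
    """Flag days deviating more than `threshold` (fraction) from 7 days prior.

    `day_counts` maps ISO date strings to row counts, as produced by the
    14-day query above. Days with no prior-week baseline are skipped.
    """
    flagged = []
    for day, count in day_counts.items():
        prior_day = (date.fromisoformat(day) - timedelta(days=7)).isoformat()
        prior = day_counts.get(prior_day)
        if prior and abs(count - prior) / prior > threshold:
            flagged.append(day)
    return flagged

counts = {
    "2024-05-01": 100, "2024-05-08": 104,  # stable week over week
    "2024-05-02": 100, "2024-05-09": 20,   # 80% drop: flagged
}
print(anomalies(counts))  # -> ['2024-05-09']
```

Comparing against the same weekday avoids false alarms from ordinary weekday/weekend seasonality.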

Add a monitoring task to the project:

## Data quality checks — run daily via cron or monitoring script

```bash
php artisan data:quality-check   # (create this command if it doesn't exist)
# or schedule the SQL assertions via your APM / alerting tool
```

5. Update Documentation

# Regenerate schema snapshot
pg_dump --schema-only $DATABASE_URL > docs/data-models/schema.sql

# Update the ERD / relationship description in docs/
# docs/data-models/[domain-name].md

Add the new model to the data catalog if the project maintains one.

6. Close the Cycle

Update TODO.md to mark all data tasks complete:

- [x] Model customer_addresses schema _(ref: workflows/data-cycle/01-model-and-design.md)_
- [x] Write and apply migration _(ref: workflows/data-cycle/02-migrate-and-apply.md)_
- [x] Validate data and monitor _(ref: workflows/data-cycle/03-validate-and-monitor.md)_

Exit Criteria

  • [ ] Post-migration SQL validations all pass (row counts, nulls, FK integrity)
  • [ ] Application smoke tests pass against production
  • [ ] No new slow queries introduced (checked via pg_stat_statements or APM)
  • [ ] Data quality assertions documented and scheduled
  • [ ] Schema documentation / ERD updated
  • [ ] All TODO items for this data change marked done

← Return to Step 1 · Model and Design for the next data change.