Expectation Suites as Code: Version-Controlling Your Data Contracts

The last few years of home lab work have reinforced a pattern I already believed in: anything that controls system behavior belongs in source control. Kubernetes manifests, Terraform configs, Ansible playbooks, SSDT database projects with tSQLt tests. The question isn't whether to version-control it — it's how.

Great Expectations expectation suites are the data quality equivalent of these configuration-as-code artifacts. They're JSON files. They're human-readable. They're diffable. They belong in the same Git repository as the pipeline code they protect.

What a Suite File Looks Like

When you save an expectation suite, Great Expectations (GE) writes a JSON file that's more readable than you might expect:

{
  "expectation_suite_name": "storm_events_raw",
  "expectations": [
    {
      "expectation_type": "expect_column_to_exist",
      "kwargs": {"column": "EVENT_ID"},
      "meta": {}
    },
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {"column": "EVENT_ID", "mostly": 1.0},
      "meta": {"notes": "Primary key — must never be null"}
    },
    {
      "expectation_type": "expect_column_values_to_be_between",
      "kwargs": {
        "column": "MAGNITUDE",
        "min_value": 0,
        "max_value": 12.0,
        "mostly": 0.99
      },
      "meta": {}
    },
    {
      "expectation_type": "expect_table_row_count_to_be_between",
      "kwargs": {"min_value": 1000, "max_value": 500000},
      "meta": {"notes": "Annual storm event files range from 50k to 200k rows"}
    }
  ],
  "meta": {
    "great_expectations_version": "0.11.9",
    "created_by": "shannon@toyboxcreations.net"
  }
}

A pull request that changes max_value from 12.0 to 15.0 on the magnitude expectation is a one-line diff that triggers a code review conversation about whether the data contract changed intentionally.
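A line-level Git diff is usually enough. If reviewers want a contract-level summary instead, here's a minimal sketch of a hypothetical helper (not part of Great Expectations) that compares two versions of a suite and reports added, removed, and changed expectations. It assumes the 0.11-era JSON layout shown above and identifies each expectation by its type plus its column. Table-level expectations have no column.

```python
import json
from typing import Dict, List, Tuple

Key = Tuple[str, str]


def index_expectations(suite: dict) -> Dict[Key, dict]:
    """Key each expectation by (expectation_type, column); table-level ones use ''.

    Assumption: a suite never holds two expectations with the same type and
    column. If it does, the last one wins and this sketch misses the duplicate.
    """
    indexed = {}
    for exp in suite.get("expectations", []):
        kwargs = exp.get("kwargs", {})
        key = (exp["expectation_type"], kwargs.get("column", ""))
        indexed[key] = kwargs
    return indexed


def diff_suites(old: dict, new: dict) -> List[str]:
    """Return human-readable lines describing contract changes between two suites."""
    old_idx, new_idx = index_expectations(old), index_expectations(new)
    changes = []
    for key in sorted(old_idx.keys() - new_idx.keys()):
        changes.append(f"removed {key[0]} ({key[1] or 'table'})")
    for key in sorted(new_idx.keys() - old_idx.keys()):
        changes.append(f"added {key[0]} ({key[1] or 'table'})")
    for key in sorted(old_idx.keys() & new_idx.keys()):
        before, after = old_idx[key], new_idx[key]
        # Compare every kwarg that appears in either version.
        for arg in sorted(before.keys() | after.keys()):
            if before.get(arg) != after.get(arg):
                changes.append(
                    f"changed {key[0]} ({key[1] or 'table'}): "
                    f"{arg} {before.get(arg)!r} -> {after.get(arg)!r}"
                )
    return changes


def diff_suite_files(old_path: str, new_path: str) -> List[str]:
    """Load two suite JSON files from disk and diff them."""
    with open(old_path) as f_old, open(new_path) as f_new:
        return diff_suites(json.load(f_old), json.load(f_new))
```

For the pull request described above, diff_suites would report a single line: changed expect_column_values_to_be_between (MAGNITUDE): max_value 12.0 -> 15.0. To compare against the main branch, you could export the old version with git show and pass both paths to diff_suite_files.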

The Repository Structure

The structure I've settled on keeps suites next to the pipeline code they protect:

storm_pipeline/
  great_expectations/
    expectations/
      storm_events_raw.json       # what NOAA delivers
      storm_events_processed.json # what our transforms produce
      storm_features.json         # input to model training
    great_expectations.yml
  src/
    ingest.py
    transform.py
    features.py
  tests/
    test_transforms.py            # unit tests for Python code
  Dockerfile

The expectation suites live in the same repo as the pipeline code. When you update a transform in a way that changes the output schema, the matching suite update belongs in the same commit. Better still, update the suite first. The validation then fails, and that failure reminds you to update the transform.

Using the meta Field for Documentation

Each expectation in a suite has a meta dictionary for arbitrary annotations. Use it for the "why" behind non-obvious expectations:

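import great_expectations as ge

# Legacy (0.11-era) API: wrap a CSV as a PandasDataset. The filename is a placeholder.
df = ge.read_csv('StormEvents_details.csv')

df.expect_column_values_to_be_between(
    'MAGNITUDE',
    min_value=0,
    max_value=12.0,
    mostly=0.99,
    meta={
        # Adjacent string literals are concatenated into one "notes" value.
        "notes": "Hail size in inches. Physical upper bound ~8. "
                 "0.99 mostly allows for 1% of NOAA transcription errors. "
                 "max_value=12 is deliberately generous; flag it if we see values above 8.",
        "last_reviewed": "2020-07-01",
        "reviewer": "shannon"
    }
)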

This is the equivalent of a meaningful comment in code — explaining the why of the constraint, not just restating what the JSON already says. It survives in source control alongside the expectation itself.
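The notes are saved in the suite JSON, so you can query them as well. Here's a minimal sketch of a hypothetical helper that reads a saved suite and lists expectations whose meta.last_reviewed date is missing or older than a cutoff. A scheduled job could use it to enforce the review convention. The last_reviewed and reviewer keys are the conventions from the example above. Great Expectations does not require them. The 180-day default is an arbitrary assumption.

```python
import json
from datetime import date, timedelta
from typing import List, Optional


def stale_expectations(suite: dict, today: date, max_age_days: int = 180) -> List[str]:
    """List expectations whose meta.last_reviewed is missing or older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for exp in suite.get("expectations", []):
        meta = exp.get("meta", {})
        column = exp.get("kwargs", {}).get("column", "table")
        label = f"{exp['expectation_type']}({column})"
        reviewed: Optional[str] = meta.get("last_reviewed")
        if reviewed is None:
            stale.append(f"{label}: never reviewed")
        elif date.fromisoformat(reviewed) < cutoff:  # expects YYYY-MM-DD strings
            reviewer = meta.get("reviewer", "unknown")
            stale.append(f"{label}: last reviewed {reviewed} by {reviewer}")
    return stale


def stale_in_file(path: str, today: Optional[date] = None, max_age_days: int = 180) -> List[str]:
    """Load a suite JSON file and report its stale expectations."""
    with open(path) as f:
        suite = json.load(f)
    return stale_expectations(suite, today or date.today(), max_age_days)
```

This version also flags every expectation that has no last_reviewed at all. On the suite shown earlier, where most meta dictionaries are empty, that would be nearly every entry. If you only annotate the non-obvious expectations, drop that branch.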

Treating Suite Changes Like Schema Changes

A change to an expectation suite is a change to a data contract. It should go through the same review process as a schema change in SSDT or a stored procedure change in your SQL Server project. The commit message should explain what changed and why. The diff should be reviewable. The change should be traceable.

When something goes wrong with data quality, the Git history of your suite files tells you when the contract changed, who changed it, and why. That's the audit trail that makes data quality enforceable rather than aspirational. As always, I'm here to help.
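Retrieving that history takes one git log command. Here's a minimal sketch that wraps git log --follow for a single suite file and parses each commit into a record with its hash, author, date, and subject. It assumes git is on the PATH and that the script runs inside the repository. The unit-separator format string is just one convenient way to make the output easy to split.

```python
import subprocess
from typing import List, NamedTuple

# %x1f makes git emit an ASCII unit separator, which won't appear in normal commit subjects.
LOG_FORMAT = "%H%x1f%an%x1f%aI%x1f%s"


class SuiteChange(NamedTuple):
    commit: str
    author: str
    date: str  # strict ISO 8601 author date (%aI)
    subject: str


def parse_log(output: str) -> List[SuiteChange]:
    """Parse `git log --format=LOG_FORMAT` output into SuiteChange records."""
    changes = []
    for line in output.splitlines():
        if line.strip():
            commit, author, date, subject = line.split("\x1f", 3)
            changes.append(SuiteChange(commit, author, date, subject))
    return changes


def suite_history(suite_path: str) -> List[SuiteChange]:
    """Return the commits that touched one suite file, newest first."""
    result = subprocess.run(
        ["git", "log", "--follow", f"--format={LOG_FORMAT}", "--", suite_path],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```

Calling suite_history("great_expectations/expectations/storm_events_raw.json") returns who changed the contract, when, and (assuming the commit messages follow the review discipline above) why.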

Shannon Lowder
