Writing Custom Expectations in Great Expectations

The built-in expectation library covers the common cases well: nulls, ranges, value sets, row counts, regex patterns. But real data has domain-specific rules that a general-purpose library can't anticipate. A storm event can't end before it starts. A geocoded latitude must correspond to a valid US state. A FIPS county code must exist in a reference table.

Great Expectations lets you write custom expectations for these domain rules, and they integrate cleanly with the rest of the framework — they serialize into your expectation suite, they produce structured results, and they run alongside the built-ins.

The Custom Expectation Interface

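A custom expectation is a Python method you add to a class that subclasses ge.dataset.PandasDataset (part of the Dataset API, which newer Great Expectations releases have since replaced). The expectation decorator handles the plumbing: it records the expectation and its arguments in the suite and standardizes how results are returned, so you just write the logic. For row-level checks that should support the mostly parameter, the column_map_expectation decorator on MetaPandasDataset adds that handling. The examples here use the generic decorator and compute their own counts: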

import re

import great_expectations as ge
from great_expectations.dataset import PandasDataset

class StormDataset(PandasDataset):

    # The names passed to the decorator must match the method's positional
    # arguments so they are recorded correctly in the suite.
    @PandasDataset.expectation(["column_end", "column_start"])
    def expect_event_end_not_before_start(self, column_end, column_start):
        """Storm end time must be >= storm start time."""
        violations = self[self[column_end] < self[column_start]]
        total = len(self)
        return {
            "success": len(violations) == 0,
            "result": {
                "unexpected_count": len(violations),
                # Guard against division by zero on an empty dataset
                "unexpected_percent": len(violations) / total * 100 if total else 0.0,
                "unexpected_values": violations[[column_start, column_end]].head(5).to_dict('records')
            }
        }

    @PandasDataset.expectation(["column"])
    def expect_column_values_to_be_valid_fips(self, column):
        """Values must be valid 5-digit US FIPS county codes."""
        fips_pattern = re.compile(r'^\d{5}$')
        # Nulls are skipped here; pair this with a not-null expectation if needed
        valid = self[column].dropna().apply(lambda v: bool(fips_pattern.match(str(v))))
        # Count non-matching values and cast to a plain int for serialization
        invalid_count = int(len(valid) - valid.sum())
        total = len(self)
        return {
            "success": invalid_count == 0,
            "result": {
                "unexpected_count": invalid_count,
                "unexpected_percent": invalid_count / total * 100 if total else 0.0
            }
        }
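
The FIPS rule is sensitive to how the column was read. The regex is applied to str(v), so a code that pandas parsed as a number has already lost its leading zero (or gained a trailing .0) before the check runs. The snippet below is a standalone sketch of that matching step, assuming the intended pattern is exactly five digits; is_valid_fips is a hypothetical helper, not part of the class above.

```python
import re

# Assumed rule: a county FIPS code is exactly five digits.
FIPS_PATTERN = re.compile(r"^\d{5}$")


def is_valid_fips(value):
    """Return True when str(value) consists of exactly five digits."""
    return bool(FIPS_PATTERN.match(str(value)))


# The same kind of code arriving as text, as an int, and as a float.
samples = ["01001", 1001, "37183", 37183.0]
for sample in samples:
    print(repr(sample), is_valid_fips(sample))
```

Only the two string values pass. If the column arrives as a number, reading it as text (for example with dtype={'CZ_FIPS': str} in pd.read_csv) keeps the codes intact for the check.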

Using Custom Expectations in a Suite

import pandas as pd

raw = pd.read_csv('storm_events_2018.csv', parse_dates=['BEGIN_DATE_TIME', 'END_DATE_TIME'])
df = StormDataset(raw)

# Built-in and custom expectations side by side
df.expect_column_values_to_not_be_null('BEGIN_DATE_TIME')
df.expect_column_values_to_not_be_null('END_DATE_TIME')
df.expect_event_end_not_before_start('END_DATE_TIME', 'BEGIN_DATE_TIME')
df.expect_column_values_to_be_valid_fips('CZ_FIPS')

result = df.validate()
print(result['success'])
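
One way to act on that result is to turn a failed validation into an exception, so a pipeline stops instead of printing False and carrying on. The sketch below is not part of Great Expectations: ValidationFailed, failed_expectations, and require_valid are hypothetical helpers, and the example dict is a hand-built stand-in. It assumes the result exposes a top-level success flag plus a results list whose entries carry success and expectation_config['expectation_type']; check that shape against the version you have installed.

```python
class ValidationFailed(Exception):
    """Raised when a validation result reports at least one failed expectation."""


def failed_expectations(result):
    """Return the expectation_type of every entry in result['results'] that failed."""
    failed = []
    for entry in result["results"]:
        if not entry["success"]:
            failed.append(entry["expectation_config"]["expectation_type"])
    return failed


def require_valid(result):
    """Raise ValidationFailed unless the overall result succeeded."""
    if not result["success"]:
        names = ", ".join(failed_expectations(result))
        raise ValidationFailed(f"failed expectations: {names}")


# Hand-built stand-in for a validation result (assumed shape; real results carry more keys).
example = {
    "success": False,
    "results": [
        {
            "success": True,
            "expectation_config": {"expectation_type": "expect_column_values_to_not_be_null"},
        },
        {
            "success": False,
            "expectation_config": {"expectation_type": "expect_event_end_not_before_start"},
        },
    ],
}

try:
    require_valid(example)
except ValidationFailed as exc:
    print(exc)
```

In the suite above, calling require_valid(df.validate()) in place of the print would halt ingestion on a failing run, provided the result shape matches the assumption.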

Domain Logic Belongs in Expectations, Not Transformation Code

The pattern I've found most useful: any time I write a defensive check in transformation code — if start > end: raise ValueError — I ask whether that check belongs in a custom expectation instead. The answer is usually yes, if:

The rule applies to the source data, not to logic I'm computing
A violation should halt ingestion, not be silently handled
I want the rule to be explicitly documented and visible in the suite

Inline defensive checks are invisible. Custom expectations are documented, tested, and produce structured output when they fail. The first time a custom expectation fires and you can see exactly which rows violated the business rule — and how many — you'll stop putting domain logic in transformation code.

As with any framework extension: keep custom expectations focused on a single rule, name them clearly (expect_event_end_not_before_start is self-documenting), and test them like any other code, including against data you know violates the rule. A buggy custom expectation can pass when it should fail, which is worse than having no expectation at all, because it reports confidence you haven't earned. As always, I'm here to help.

Shannon Lowder
