One of the complaints I hear consistently from teams that evaluated Great Expectations a few years ago and haven't revisited it: "the API keeps changing." It's a fair criticism. GE has gone through significant interface changes since the early pandas-wrapper days — the DataContext model, the Batch/BatchRequest pattern, Checkpoints replacing Validation Operators, the shift from implicit to explicit configuration.
The changes have generally been improvements, but they've required real migration work. Here's where things stand and how to navigate it.
The Major Inflection Points
Early versions (pre-DataContext): GE was essentially a pandas wrapper. You called ge.from_pandas(df) and ran expectations directly on the result. Simple, but no project structure, no suite persistence, no integration surface for CI.
The DataContext era: GE introduced a project model — a great_expectations/ directory with great_expectations.yml, datasource configs, expectation suite storage, and validation result storage. This made GE a proper project-level artifact rather than a script-level library. The migration from script-style to DataContext-style was the biggest breaking change for early adopters.
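Because the project is now a directory on disk, tooling can reason about it. As a small illustration, here is a hypothetical helper. `find_ge_project_root` is not a GE function. It searches a starting directory and a few of its parents for `great_expectations/great_expectations.yml`. GE's `get_context` does its own discovery, and this sketch only approximates the idea. The default of four parent levels is an arbitrary assumption.

```python
from pathlib import Path
from typing import Optional


def find_ge_project_root(start: Path, max_levels: int = 4) -> Optional[Path]:
    """Return the great_expectations/ project directory, or None if not found.

    Hypothetical helper: checks `start` and up to `max_levels` parent
    directories for great_expectations/great_expectations.yml.
    """
    current = start.resolve()
    for _ in range(max_levels + 1):
        candidate = current / "great_expectations"
        if (candidate / "great_expectations.yml").is_file():
            return candidate
        if current.parent == current:  # reached the filesystem root
            break
        current = current.parent
    return None
```

In a CI job, call `find_ge_project_root(Path.cwd())` and fail if it returns `None`. That catches a job started outside the repository before any validation runs.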
The Batch/BatchRequest model: How you tell GE "this is the data to validate" went through several iterations. The current stable model uses BatchRequest for data reached through configured data connectors and RuntimeBatchRequest for data you pass in at runtime, such as an in-memory DataFrame. If your code still builds batch_kwargs dictionaries, migrate: that API is deprecated.
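For in-memory, file, and query batches, translating legacy batch_kwargs into RuntimeBatchRequest arguments is mostly mechanical. The sketch below is a hypothetical helper; `batch_kwargs_to_runtime_request_kwargs` is not part of GE. It rests on three assumptions:

- The legacy dict carries a `"datasource"` key plus exactly one of `"dataset"`, `"path"`, or `"query"`.
- The target datasource has a runtime data connector named `"runtime_connector"`.
- That connector declares a `"run_id"` batch identifier, matching the names in the recommended pattern below.

Any other legacy key raises an error, so those cases get migrated by hand.

```python
from typing import Any, Dict

# Assumed mapping: legacy batch_kwargs key -> RuntimeBatchRequest runtime_parameters key
_RUNTIME_KEYS = {"dataset": "batch_data", "path": "path", "query": "query"}


def batch_kwargs_to_runtime_request_kwargs(
    batch_kwargs: Dict[str, Any],
    data_asset_name: str,
    run_id: str,
    data_connector_name: str = "runtime_connector",  # assumed connector name
) -> Dict[str, Any]:
    """Build keyword arguments for RuntimeBatchRequest from legacy batch_kwargs.

    Hypothetical migration helper: handles only in-memory, path, and query
    batches and raises ValueError on anything it cannot translate safely.
    """
    sources = [key for key in _RUNTIME_KEYS if key in batch_kwargs]
    if len(sources) != 1:
        raise ValueError(f"expected exactly one of {sorted(_RUNTIME_KEYS)}, got {sources}")
    source = sources[0]
    if "datasource" not in batch_kwargs:
        raise ValueError("batch_kwargs has no 'datasource' key")
    extra = set(batch_kwargs) - {"datasource", source}
    if extra:
        # e.g. reader options: do not silently drop them
        raise ValueError(f"unhandled legacy keys, migrate by hand: {sorted(extra)}")
    return {
        "datasource_name": batch_kwargs["datasource"],
        "data_connector_name": data_connector_name,
        "data_asset_name": data_asset_name,
        "runtime_parameters": {_RUNTIME_KEYS[source]: batch_kwargs[source]},
        "batch_identifiers": {"run_id": run_id},
    }
```

With great_expectations installed, the result can be splatted into the real class: `RuntimeBatchRequest(**batch_kwargs_to_runtime_request_kwargs(legacy, "my_asset", "run_001"))`.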
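Checkpoints replacing Validation Operators: If your code calls run_validation_operator, replace it with run_checkpoint. A Checkpoint bundles what to validate (batch requests paired with expectation suites) with the actions to run afterward, which makes it more configurable. Checkpoints are also the long-term path.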
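Before migrating, it helps to know how much legacy usage you have. The following is a hypothetical audit script, not a GE tool. It scans Python files under a directory for the two deprecated names discussed above, batch_kwargs and run_validation_operator. Each hit is reported as file, line number, and pattern. It is a plain text search, so expect false positives in comments, strings, and migration helpers.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Legacy API names discussed above; extend for your own codebase.
DEPRECATED_PATTERNS = {
    "batch_kwargs": re.compile(r"\bbatch_kwargs\b"),
    "run_validation_operator": re.compile(r"\brun_validation_operator\b"),
}


def find_legacy_ge_usage(root: Path) -> List[Tuple[Path, int, str]]:
    """Return (file, line_number, pattern_name) for each legacy API hit under root."""
    hits: List[Tuple[Path, int, str]] = []
    for path in sorted(root.rglob("*.py")):  # sorted for stable, reviewable output
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern in DEPRECATED_PATTERNS.items():
                if pattern.search(line):
                    hits.append((path, lineno, name))
    return hits
```

Running `find_legacy_ge_usage(Path("."))` from the repository root gives a rough inventory of the migration work. Failing a CI step on a non-empty result keeps new legacy calls from creeping back in.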
The Current Recommended Pattern
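```python
import great_expectations as ge
import pandas as pd
from great_expectations.core.batch import RuntimeBatchRequest

# Modern entry point: loads the project from the great_expectations/ directory
context = ge.get_context()

# Illustrative data; in a real pipeline df comes from an upstream step
df = pd.DataFrame({"event_id": [1, 2, 3], "magnitude": [2.5, 4.1, 7.0]})

# Describe the data to validate. Assumes "my_datasource" has a runtime data
# connector named "runtime_connector" that declares "run_id" as a batch identifier.
batch_request = RuntimeBatchRequest(
    datasource_name="my_datasource",
    data_connector_name="runtime_connector",
    data_asset_name="my_asset",
    runtime_parameters={"batch_data": df},
    batch_identifiers={"run_id": "run_001"},
)

# Define expectations against a validator (assumes the suite "my_suite" exists)
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="my_suite",
)
validator.expect_column_values_to_not_be_null("event_id")
validator.expect_column_values_to_be_between("magnitude", 0, 12)
validator.save_expectation_suite()

# Run via Checkpoint (assumes "my_checkpoint" is already configured),
# reusing the same batch request so the checkpoint validates the same data
result = context.run_checkpoint(
    checkpoint_name="my_checkpoint",
    validations=[{
        "batch_request": batch_request,
        "expectation_suite_name": "my_suite",
    }],
)
print(result.success)
```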
Pinning Your Version
If you're running GE in production, pin the version. Throughout the period covered here, GE has been on 0.x version numbers. Under semantic versioning, a 0.x release carries no compatibility guarantee, and in practice even a minor version bump has occasionally required migration work. Pin the exact version in your requirements.txt and treat upgrades as deliberate decisions:
```text
great_expectations==0.16.14  # pinned: upgrade requires migration review
```
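A pin in requirements.txt only helps if the running environment actually matches it. One option is a startup guard that compares the installed version against the pin and fails fast on drift. The sketch below is hypothetical: `check_pinned_version` and `PINNED_GE_VERSION` are illustrative names. It relies only on the standard library's `importlib.metadata`. It also takes a `lookup` function so it can be tested without GE installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable

PINNED_GE_VERSION = "0.16.14"  # keep in sync with requirements.txt


def check_pinned_version(
    package: str,
    pinned: str,
    lookup: Callable[[str], str] = version,
) -> None:
    """Raise RuntimeError unless the installed `package` is exactly `pinned`."""
    try:
        installed = lookup(package)
    except PackageNotFoundError:
        raise RuntimeError(f"{package} is not installed (pinned: {pinned})") from None
    if installed != pinned:
        raise RuntimeError(
            f"{package} {installed} is installed but {pinned} is pinned; "
            "treat the upgrade as a migration and review the changelog first"
        )


# At service startup, for example:
# check_pinned_version("great_expectations", PINNED_GE_VERSION)
```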
When you do upgrade, read the changelog. GE publishes migration guides for breaking changes, and they are thorough. The guide for moving from Validation Operators to Checkpoints walked through the change step by step. In my experience it covered about 95% of cases.
The Stability Trajectory
The good news: the interface has been stabilizing. The DataContext + BatchRequest + Checkpoint model is clearly the long-term path, and the team has been investing in making it more stable rather than continuing to iterate on the architecture. The turbulent period is behind it. Teams that evaluated GE in 2019 and found the API confusing should give it another look: the current version is more coherent than it was.