Great Expectations, the framework (now branded GX), has always been open source and self-hosted: you manage the project directory, the expectation suite files, the result stores, and the Data Docs infrastructure. For teams that want to own that infrastructure, that's fine. For teams that want data quality enforcement without managing a GX project alongside every pipeline, GX Cloud is the answer.
I've been evaluating it over the last few months. Here's what it changes and what it doesn't.
What GX Cloud Provides
GX Cloud is a managed backend for your GX configuration. Instead of living in a great_expectations/ directory in your repo, your configuration lives in GX Cloud's hosted service:
- Hosted expectation suite storage: suites are defined and versioned in the cloud UI, not JSON files in your repo
- Validation result history: all run results stored and queryable — no DIY result store to maintain
- Data Docs equivalent: a web UI showing suite definitions, validation history, pass/fail trends over time
- Alerting: configure notifications when validation fails, without building the integration yourself
- Team collaboration: multiple users can view and edit suites through the UI
The Agent Model
GX Cloud doesn't run your validation jobs for you — you still need compute (your Databricks cluster, your Python environment) to execute the validation. What changes is where the configuration and results live.
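The GX Agent runs in your environment and communicates with GX Cloud to fetch suite definitions and post results. The same pattern applies when you drive validation from your own Python code through a cloud-backed context: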
import os

import great_expectations as gx

# With GX Cloud, get_context() returns a CloudDataContext
# configured via environment variables. The values are set inline here
# only for illustration; in practice, supply them from your deployment
# environment or secret manager instead of hardcoding them.
os.environ["GX_CLOUD_ACCESS_TOKEN"] = "your_token"
os.environ["GX_CLOUD_ORGANIZATION_ID"] = "your_org_id"

context = gx.get_context(mode="cloud")

# The rest of the API is the same
validator = context.get_validator(
    batch_request=...,
    expectation_suite_name="storm_events_silver",  # fetched from cloud
)
validator.expect_column_values_to_not_be_null("event_id")
result = validator.validate()
# Result is posted to GX Cloud automatically
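In a scheduled job, it can help to wrap the snippet above with two guards: a check that the Cloud credentials are present before get_context() is called, and a hard failure when validation does not pass. What follows is a minimal sketch, not something GX ships. The helper names missing_cloud_env and require_success are made up for illustration, and the only assumption about the result object is that it exposes a boolean success attribute.

```python
import os
from typing import List, Mapping

# Environment variable names taken from the snippet above.
REQUIRED_ENV = ("GX_CLOUD_ACCESS_TOKEN", "GX_CLOUD_ORGANIZATION_ID")


def missing_cloud_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


def require_success(result) -> None:
    """Raise if a validation result reports failure.

    Assumes only that `result` has a boolean `success` attribute.
    """
    if not result.success:
        raise RuntimeError("Data validation failed; stopping the pipeline step.")
```

Calling missing_cloud_env() before gx.get_context(mode="cloud") gives a clear error when credentials are absent, and calling require_success(result) after validator.validate() turns a failed suite into an exception that the surrounding job can surface as a failed task.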
The Trade-offs
What you gain: Zero infrastructure to manage. Validation history out of the box. A UI that non-engineers can read. Alerting without custom integration work. Collaboration without Git-as-the-UI.
What you give up: Suites in source control (they live in the cloud UI, not your repo — though you can export them). Local reproducibility without cloud access. The ability to run fully offline.
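If losing suites from source control is the main worry, one mitigation is to commit periodic snapshots of the exported suites. Here is a minimal sketch, assuming you already have the exported suite as a plain Python dict (how you obtain it depends on your GX version and is not shown). The function name snapshot_suite and the suite_snapshots directory are hypothetical. Sorting the keys keeps the JSON stable, so Git diffs show only real changes.

```python
import json
from pathlib import Path


def snapshot_suite(suite: dict, name: str, out_dir: str = "suite_snapshots") -> Path:
    """Write an exported suite dict to <out_dir>/<name>.json in a diff-friendly form."""
    target = Path(out_dir) / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys plus a fixed indent gives identical output for identical suites
    text = json.dumps(suite, indent=2, sort_keys=True) + "\n"
    target.write_text(text, encoding="utf-8")
    return target
```

The snapshot is a read-only copy for review and history; the hosted suite stays the source of truth, so this narrows the gap described above without closing it.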
The pricing question: GX Cloud has a paid tier for teams. The open-source self-hosted path remains available. For a solo practitioner or small team already managing infrastructure, the self-hosted path is still reasonable. For a larger team where the data quality infrastructure was becoming a maintenance burden, the managed path is worth evaluating.
My Current Assessment
GX Cloud makes sense for teams where the data quality infrastructure was the bottleneck: where getting engineers to adopt GX was hard because of setup complexity, or where maintaining result stores and Data Docs was overhead nobody wanted to own. If you're already running a healthy self-hosted GX setup, the maintenance savings need to outweigh the migration cost before the move makes sense. Evaluate based on your actual friction points, not on the feature list. As always, I'm here to help.