Data Docs: Auto-Generated Data Quality Reports from Your Expectation Suite

One of the quieter but more useful features in Great Expectations is Data Docs — automatically generated HTML documentation that shows you, in a human-readable format, what your expectations are and whether your data passes them. It's not something you build; it's something GE generates from the artifacts you already have: your expectation suites and your validation results.

The value proposition: your data quality checks are already encoded in expectation suites. Data Docs makes those checks visible to people who don't want to read JSON files — data consumers, analysts, stakeholders who need to understand what "this data is validated" actually means.

Setting Up Data Docs

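Data Docs sites are configured in your Great Expectations project's great_expectations.yml file, under the data_docs_sites key. A newly created project comes with a local site already configured, so initializing a project and building the docs takes only a few lines: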

import great_expectations as ge

# Initialize a GE project in the current directory
# (creates great_expectations/ folder with config)
context = ge.data_context.DataContext.create(".")

# After validation runs, build Data Docs
context.build_data_docs()

The output is a static HTML site under great_expectations/uncommitted/data_docs/ (with the default configuration, in the local_site/ subfolder). Open index.html and you get a site with three main sections:

  • Expectation Suites — each suite renders as a human-readable table of expectations: what column, what rule, what parameters
  • Validation Results — each validation run shows which expectations passed, which failed, sample failing values, and counts
  • Profiling Results — if you've run the data profiler, distribution charts and summary stats appear here
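
In a CI job it can help to confirm that the build actually produced a site before trying to publish it. Here is a minimal sketch using only the standard library. It assumes the default output location described above, with index.html either directly in data_docs/ or one level down in a per-site subfolder. The helper name find_site_indexes is hypothetical and not part of Great Expectations.

```python
from pathlib import Path


def find_site_indexes(project_root="."):
    """Return the index.html of every built Data Docs site, sorted by path.

    Assumption: sites live under great_expectations/uncommitted/data_docs/,
    either directly in that folder or in one subfolder per site.
    """
    docs_root = Path(project_root) / "great_expectations" / "uncommitted" / "data_docs"
    if not docs_root.is_dir():
        return []
    candidates = [docs_root / "index.html", *docs_root.glob("*/index.html")]
    return sorted(p for p in candidates if p.is_file())
```

An empty list means nothing has been built yet, which is a reasonable condition for failing a publish step early.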

The Workflow With Data Docs

import great_expectations as ge

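# With no argument, DataContext searches for the great_expectations/ folder
# starting from the current directory. If you pass a path, it must be that
# folder itself, not the project root that contains it.
context = ge.data_context.DataContext()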

# Run validation and save the result to the context
batch_kwargs = {
    "path": "storm_events_2019_q2.csv",
    "datasource": "local_files"
}
batch = context.get_batch(batch_kwargs, "storm_events_suite")
result = context.run_validation_operator(
    "action_list_operator",
    assets_to_validate=[batch]
)

# Rebuild Data Docs to include the new result
context.build_data_docs()

# Open the local site in a browser (in CI, skip this and publish the folder instead, e.g. to S3)
context.open_data_docs()
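
If the pipeline should fail on bad data, it is usually worth rebuilding Data Docs before raising, so the failing run is still browsable. Here is a minimal sketch, assuming the operator result exposes an overall flag as result["success"] (check this against your Great Expectations version). The helper validate_and_publish is hypothetical. It uses only the two context methods shown above.

```python
def validate_and_publish(context, batch, operator_name="action_list_operator"):
    """Validate one batch, rebuild Data Docs, then fail loudly on bad data."""
    result = context.run_validation_operator(
        operator_name,
        assets_to_validate=[batch],
    )
    # Rebuild first so a failed run still shows up in the HTML report.
    context.build_data_docs()
    # Assumption: the result supports result["success"]; verify for your version.
    if not result["success"]:
        raise RuntimeError("Validation failed; see Data Docs for details")
    return result
```

Raising after the rebuild rather than before is the design choice that matters here: the report for the bad run is the one people will want to read.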

Using Data Docs as a Data Contract Artifact

The most practical use I've found: after each pipeline run, rebuild Data Docs and archive the HTML output as a build artifact. The validation results become a permanent, readable record of what the data looked like at that moment — what passed, what failed, how many rows violated each rule.

When a downstream consumer asks "was the Q2 data clean when it was loaded?" you have a browsable HTML report that answers the question without anyone having to interpret JSON or query a log table. That's the difference between data quality that's enforced and data quality that's auditable.
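
Archiving can be as simple as zipping the output folder under a timestamped name. Here is a minimal sketch using only the standard library. The helper archive_data_docs and its naming scheme are illustrative assumptions, not Great Expectations features.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def archive_data_docs(docs_dir, archive_dir, label="data_docs"):
    """Zip docs_dir into archive_dir/<label>_<UTC timestamp>.zip and return the path."""
    docs_dir = Path(docs_dir)
    if not docs_dir.is_dir():
        raise FileNotFoundError(f"No Data Docs output at {docs_dir}")
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # make_archive appends ".zip" to the base name and returns the full path.
    archive_path = shutil.make_archive(
        str(archive_dir / f"{label}_{stamp}"), "zip", root_dir=str(docs_dir)
    )
    return Path(archive_path)
```

Called right after context.build_data_docs() with the data_docs output folder as docs_dir, this leaves one zip per pipeline run that a CI system can pick up as an artifact. Keep archive_dir outside docs_dir so the archive does not end up inside itself.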

Sharing Data Docs With Non-Engineers

Data Docs generates plain HTML — no server required, no dependencies. You can:

  • Push the output to S3 and share a public link
  • Attach the generated folder as a CI build artifact
  • Commit a snapshot to a documentation repo
  • Email a short summary built from the JSON validation result, with a link to the full HTML report
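
For the email option, a short plain-text summary can be built from a saved validation result. Here is a minimal sketch, assuming the JSON has top-level "success", "statistics", and "results" keys, and that each entry in "results" carries "expectation_config" and "result". The helper summarize_validation is hypothetical, and the sample data, including its column names, is made up for illustration.

```python
def summarize_validation(result, docs_url=None):
    """Build a short plain-text summary from a validation result dict."""
    stats = result.get("statistics", {})
    status = "PASSED" if result.get("success") else "FAILED"
    lines = [
        f"Validation {status}: "
        f"{stats.get('successful_expectations', 0)} of "
        f"{stats.get('evaluated_expectations', 0)} expectations met"
    ]
    # List only the failed expectations; the full detail lives in Data Docs.
    for item in result.get("results", []):
        if item.get("success"):
            continue
        config = item.get("expectation_config", {})
        column = config.get("kwargs", {}).get("column", "(table)")
        unexpected = item.get("result", {}).get("unexpected_count", "n/a")
        lines.append(
            f"- {config.get('expectation_type', 'unknown')} on {column}: "
            f"{unexpected} unexpected values"
        )
    if docs_url:
        lines.append(f"Full report: {docs_url}")
    return "\n".join(lines)


# Made-up result for illustration only.
sample = {
    "success": False,
    "statistics": {"evaluated_expectations": 2, "successful_expectations": 1},
    "results": [
        {
            "success": True,
            "expectation_config": {
                "expectation_type": "expect_column_to_exist",
                "kwargs": {"column": "event_id"},
            },
            "result": {},
        },
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": "state"},
            },
            "result": {"unexpected_count": 3},
        },
    ],
}
print(summarize_validation(sample))
```

The returned string can go straight into an email body. Passing docs_url adds a final line pointing readers at the full HTML report.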

An analyst who wants to understand what validations run against their data source gets an HTML page, not a Python file. That's the right interface for the right audience.
