Great Expectations vs. tSQLt: Two Frameworks, One Principle

I've been writing about tSQLt since 2012 and Great Expectations since 2017. I've used both across different clients and projects, and I get asked regularly: which one should I use? The answer is almost always: both, depending on where in your stack you are.

This post is the explicit comparison I've been building toward for three years.

The Same Principle

Both frameworks exist to answer the same question: does my data behave the way I expect it to? Both use an assertion model — you define a claim about your data, the framework checks it, and you get a structured pass/fail result. Both integrate with CI pipelines. Both produce output that tells you exactly which assertion failed and why.

The principle is identical. The implementation is different because the runtime is different.

What's Actually Different

DimensiontSQLtGreat Expectations
LanguageT-SQLPython
RuntimeSQL Server (CLR)Python process; pushes to DB via SQLAlchemy
What it testsStored procedure / trigger / view logicData in any datasource (files, DataFrames, SQL tables)
Setup costCLR enabled, TRUSTWORTHY ON, install assemblypip install, configure datasource
Test isolationTransaction rollback (automatic)No built-in isolation — you manage test data
Suite formatSQL stored procedures in schemasJSON files
CI integrationsqlcmd + JUnit XMLPython script + exit code
Failure outputTest runner results tableValidationResult JSON + Data Docs HTML

Where Each One Belongs

Use tSQLt when the logic lives in SQL Server: stored procedures with branching logic, triggers, functions. If you're asking "does my GetPendingOrders stored procedure correctly filter by status?" — that's tSQLt territory. The logic is in the database; the test should be too.

Use Great Expectations when you're asserting properties of data at rest or in motion, regardless of where it lives: a CSV file arriving from a vendor, the output of a Python transformation, the contents of a SQL table after a load. If you're asking "does this data look like what I expected?" — that's GE territory. The question is about the data, not about the logic that produced it.

The practical overlap in a SQL Server data engineering shop:

# Great Expectations: assert the incoming CSV matches the contract
validate_raw_file('storm_events_2020_q3.csv', 'suites/storm_raw.json')

# Transform the data (Python)
clean_df = transform_storm_data(raw_df)

# Great Expectations: assert the transformation output is valid
validate_output(clean_df, 'suites/storm_clean.json')

# Load to SQL Server (via SSIS or sqlalchemy)
load_to_sql_server(clean_df, 'dbo.StormEvents')

# tSQLt: assert the stored procedures that process the loaded data work correctly
# (runs in CI, validates dbo.GetStormSummary, dbo.CalculateRisk, etc.)

The Shared Lesson

The reason I kept writing about tSQLt after 2012, and started writing about Great Expectations in 2017, isn't that these specific tools are irreplaceable. It's that the practice of encoding data contracts as executable assertions — and running those assertions automatically — is genuinely different from hoping data is correct. The tool is secondary. The habit is primary.

Whether you're asserting in T-SQL inside a transaction rollback, or in Python against a JSON expectation suite, you're doing the same thing: making your expectations about your data explicit and verifiable rather than implicit and discovered when production breaks.

Pick the right tool for the layer you're working in. Use both if your stack spans both. As always, I'm here to help.

Read more