Why CPU Threads Beat SQL Server Parallelism for This Workload
The benchmark numbers are in the previous post. This one explains the mechanism — specifically why a C# Parallel.ForEach beat SQL Server's MAXDOP parallelism for geocoding, and how to think about when that outcome should make you reach for a C# parallel program instead of a SQL query.
SQL Server Parallelism: How It Actually Works
SQL Server's parallel query execution uses a model called "intra-query parallelism." When the query optimizer determines that parallelism would help, it inserts Exchange operators into the query plan — Repartition Streams, Gather Streams, Distribute Streams. These operators are the coordination points where parallel worker threads divide up work and synchronize results.
For a SELECT or aggregate query that reads from a table and produces a result set, this model is efficient. The exchange operators distribute rows across threads based on a partition key (for joins) or hash (for aggregation), each thread processes its portion independently, and the gather operator assembles the final result. The overhead is modest because the query plan is designed to minimize synchronization.
For an UPDATE query that modifies rows in place, the model gets more complex. SQL Server has to: identify which rows need updating (potentially with parallel reads), acquire locks on those rows, perform the update, write the modifications to the transaction log (which is serial at the page level), and release the locks. The lock acquisition and transaction log serialization become bottlenecks as parallelism increases.
Where the C# Version Avoided That Overhead
The C# version didn't update rows in place. It read source rows, computed the result in memory, and bulk-inserted into a separate destination table. No row-level locking. No UPDATE log records for 400 million rows. The write-ahead logging for the bulk insert was minimal compared to 400 million individual row updates.
More importantly: the spatial computation in the C# version ran in pure user-mode code, in the calling thread's address space, against data structures directly addressable by the thread's memory pointers. SQL Server's spatial operations ran through the database engine's execution framework — query plan operators, buffer pool access, memory grant management — which added layers of abstraction that a direct memory access pattern doesn't have.
The Mental Model for When to Use Which
SQL Server's optimizer excels when:
- The computation involves joins, aggregations, or set operations over multiple tables
- Index seeks can dramatically reduce the amount of data processed
- The result set is much smaller than the input (high selectivity)
- The data is already in SQL Server and moving it out would cost more than the computation gain
A C# parallel program is worth considering when:
- The computation is CPU-bound and embarrassingly parallel (each row processed independently)
- The operation is a transformation or computation on each row, not a set operation across rows
- You need custom algorithms or third-party libraries that SQL Server can't access (a geographic library, a machine learning model, an external API call)
- The data volume is large enough that SQL Server's execution overhead becomes measurable
The Parallel.ForEach Pattern for Data Processing
// The pattern: read in batches, process each batch in parallel, write results
public static async Task ProcessInParallelAsync<TIn, TOut>(
IEnumerable<IEnumerable<TIn>> batches,
Func<TIn, TOut> transform,
Action<IList<TOut>> writeBatch,
int maxDegreeOfParallelism = -1) // -1 = use processor count
{
var options = new ParallelOptions
{
MaxDegreeOfParallelism = maxDegreeOfParallelism > 0
? maxDegreeOfParallelism
: Environment.ProcessorCount
};
Parallel.ForEach(batches, options, batch =>
{
var results = batch.Select(transform).ToList();
writeBatch(results); // caller handles thread-safe write
});
}
// Usage: geocoding
ProcessInParallelAsync(
radarData.ReadBatches(10000),
row => new GeocodedRow(row.Id, countyBoundaries.FindFips(row.Lat, row.Lon)),
results => BulkInsert(results, stagingConnection),
maxDegreeOfParallelism: 8);The critical thread-safety concern: the writeBatch action must handle its own thread safety. If multiple threads write to the same database connection simultaneously, you'll get connection pool contention. The solution: each thread gets its own connection (the batch lambda captures its own SqlConnection), or writes are queued to a BlockingCollection and a single writer thread drains it.
The Lesson That Transferred
After the geocoding benchmark, I stopped assuming SQL Server was the right execution engine for every data transformation. For set-based operations: SQL Server. For row-by-row CPU-intensive computation: C# parallel code. For very large scale: ADLA/U-SQL. The right tool for the computation, not the familiar tool. As always, I'm here to help.