The ADLA Job That Did Nothing for Six Hours: Understanding U-SQL Execution Plans
I need to tell you about the geocoding job. This is the story of a U-SQL job that ran all its vertices, consumed 200 Analytics Units for six hours, reported 100% completion, and produced a output file with zero rows. Not an error. Not a failure status. Success — and nothing in the output.
Understanding how this happened is one of the most useful things I can tell you about ADLA, because it reveals exactly how U-SQL execution works under the hood and why the monitoring story was inadequate in 2015.
The Job: Geocoding Storm Event Locations
The NOAA storm event data had county and state information but no lat/long coordinates. The goal was to geocode each storm event to a geographic point — merge the storm events with a county-level geocoding reference table to get coordinates for spatial analysis.
The U-SQL script looked straightforward:
@stormEvents =
EXTRACT EventId int, State string, CountyFips string, ...
FROM "/raw/noaa/storm-events/*.csv"
USING Extractors.Csv(skipFirstNRows: 1, quoting: true);
@geocodes =
EXTRACT CountyFips string, Latitude decimal, Longitude decimal
FROM "/raw/reference/county-geocodes.csv"
USING Extractors.Csv(skipFirstNRows: 1);
@geocoded =
SELECT e.EventId,
e.State,
g.Latitude,
g.Longitude
FROM @stormEvents AS e
JOIN @geocodes AS g
ON e.CountyFips == g.CountyFips;
OUTPUT @geocoded
TO "/curated/analytics/storm-events-geocoded.csv"
USING Outputters.Csv(outputHeader: true);Submitted. Ran for six hours. Output file: 0 bytes (just the header row).
The Actual Problem: JOIN Type Mismatch
The storm event data had CountyFips stored as a string. The geocode reference had CountyFips stored as a string. They looked like "37063" in the storm events and... also like "37063" in the geocodes? Run the job. Zero rows.
After hours of debugging, the issue: the storm events file had FIPS codes with leading zeros (01001). The geocode reference had them without leading zeros (1001). The join matched nothing. U-SQL's inner join produced zero rows, and since U-SQL's default join is inner, nothing was flagged as an error or warning. The job completed successfully because it did exactly what I told it to do.
The Monitoring Gap
This is the ADLA monitoring problem. A zero-row output on an inner join is a valid result — maybe every row legitimately had no match. ADLA had no way to know that was unexpected. The job graph showed all vertices green. Input bytes per vertex looked correct (the data was being read). Output bytes were zero — but zero is a valid output size if no rows match.
What was missing: row counts visible in the job monitoring UI. ADLA showed byte counts at each vertex but not row counts. A vertex that read 2GB of input and produced 0 bytes of output was invisible as anomalous until you already knew the output was wrong.
How to Catch It
Add explicit diagnostics to U-SQL scripts that might have join cardinality problems:
// Validate join cardinality before expensive downstream processing
@matchCount =
SELECT COUNT(*) AS Matched
FROM @stormEvents AS e
JOIN @geocodes AS g
ON e.CountyFips == g.CountyFips;
// Write diagnostic to a separate file — check this before proceeding
OUTPUT @matchCount
TO "/staging/diagnostics/geocode-match-count.txt"
USING Outputters.Tsv();
// Also sample unmatched to diagnose the mismatch
@unmatched =
SELECT TOP 100 e.EventId, e.State, e.CountyFips
FROM @stormEvents AS e
LEFT OUTER JOIN @geocodes AS g
ON e.CountyFips == g.CountyFips
WHERE g.CountyFips == null;
OUTPUT @unmatched
TO "/staging/diagnostics/geocode-unmatched-sample.csv"
USING Outputters.Csv(outputHeader: true);Run this diagnostic version first. Check the match count. If it's zero or unexpectedly low, look at the unmatched sample to see what the actual FIPS codes look like. Fix the normalization issue (CountyFips.PadLeft(5, '0') in this case), then run the full job.
The Lesson: Validate Before Scale
The expensive mistake was running the full six-hour job with bad join logic. The cheap version — the diagnostic script above — would have taken 20 minutes and told me the join was producing nothing before I committed to the full execution.
For any large U-SQL job with a JOIN, run the diagnostic pass first. Write the row counts and a sample of unmatched records before the main output. It adds 20 minutes to the development cycle and saves hours of waiting on broken jobs. That tradeoff is easy math. As always, I'm here to help.