A pipeline failed at 3 AM. The client calls at 9 AM asking why their morning report is empty. I open the ADF monitor. I see a red slice. The error message says "SqlException: Timeout expired." That's it. No row counts, no which activity failed first, no timeline showing when each step started and stopped, no stack trace that tells me whether this is a query timeout or a connection timeout or a lock timeout. Welcome to ADF monitoring in 2015.
What ADF Monitor Actually Shows
The monitoring view in the portal shows slice state per dataset. Click on a failed slice and you get the activity run details: activity name, status, start time, end time, duration, and an error message. For Copy Activity that completed successfully, you also get rows read and rows written after the fact. That's the full picture ADF gives you natively.
Compare this to SSIS's SSISDB execution reports: you can query catalog.executions for every package execution, catalog.executable_statistics for duration per task and per data flow component, catalog.event_messages for warnings and errors with full event data, and catalog.data_statistics for row counts at each data flow path. SSISDB is a rich execution audit trail. ADF's monitoring is a status board.
Debugging a Failed Activity
For failures on cloud-to-cloud activities (Blob to Azure SQL, for example), the error message in the portal is usually sufficient to diagnose the problem. SqlException timeout messages, schema mismatch errors, truncation errors — these surface in the portal error message.
For failures on activities that go through the Data Management Gateway, you need the gateway logs. Open Event Viewer on the gateway machine. Look in Application logs for entries from the Data Management Gateway service. The detailed error — the full exception with stack trace, the specific Oracle ORA- error code, the network timeout with the actual endpoint — lives here, not in the portal. This is not discoverable without knowing to look there.
My standard debugging checklist for a failed ADF activity:
- Open the failed slice in the portal, read the error message
- If the error is a gateway connectivity issue (error code 2200), RDP to the gateway machine and check the DMG service status and Event Log
- If the error is a SQL error, connect to the target database and check for blocking queries, long-running transactions, or failed transactions in the error log
- If the error is ambiguous, enable verbose logging on the gateway machine (DMG configuration manager) and re-run the slice
Building Supplemental Monitoring
The gap between what ADF provides and what production operations require drove me to build a supplemental monitoring layer. Here's what I built and how:
Execution Log Table
CREATE TABLE dbo.ADF_ExecutionLog (
LogID INT IDENTITY(1,1) PRIMARY KEY,
PipelineName NVARCHAR(200) NOT NULL,
SliceStart DATETIME NOT NULL,
SliceEnd DATETIME NOT NULL,
ActivityName NVARCHAR(200) NOT NULL,
RowsProcessed INT NULL,
ExecutionStartUTC DATETIME NOT NULL,
ExecutionEndUTC DATETIME NULL,
StatusCode NVARCHAR(20) NOT NULL DEFAULT 'Running',
ErrorMessage NVARCHAR(MAX) NULL,
Notes NVARCHAR(MAX) NULL
);
CREATE PROCEDURE dbo.usp_LogADFExecution
@PipelineName NVARCHAR(200),
@SliceStart DATETIME,
@SliceEnd DATETIME,
@ActivityName NVARCHAR(200),
@RowsProcessed INT = NULL,
@StatusCode NVARCHAR(20) = 'Success',
@ErrorMessage NVARCHAR(MAX) = NULL
AS
INSERT INTO dbo.ADF_ExecutionLog
(PipelineName, SliceStart, SliceEnd, ActivityName, RowsProcessed, ExecutionStartUTC, ExecutionEndUTC, StatusCode, ErrorMessage)
VALUES
(@PipelineName, @SliceStart, @SliceEnd, @ActivityName, @RowsProcessed, GETUTCDATE(), GETUTCDATE(), @StatusCode, @ErrorMessage);
Each pipeline's final stored procedure calls usp_LogADFExecution with the execution metadata. For row counts, the stored procedure uses @@ROWCOUNT after the MERGE or INSERT. Slice start and end are passed from ADF via stored procedure parameters using $$Text.Format expressions in the activity definition.
Azure Alert Rules
In the Azure portal, navigate to your data factory and configure Alert Rules on the Failed Runs metric. Set a threshold of 0 (alert if failed runs > 0), action group pointing to an email distribution list or a webhook. This gives you near-real-time notification of slice failures without polling the portal manually.
The alert fires at the metric aggregation interval (typically 5 minutes). Not instant, but close enough for most production scenarios. For pipelines with strict SLAs, combine this with the execution log table for historical trending.
Power BI Dashboard
A Power BI report over the execution log table gives you pipeline health at a glance: success rate by pipeline by day, average row counts trending over time (useful for detecting upstream data volume anomalies), failure frequency heat map by hour of day (tells you when your pipelines are most fragile). This takes a day to build and makes the operations conversation with clients dramatically easier.
What Microsoft Needs to Improve
In-progress monitoring. SSISDB shows you a running package's current task. ADF shows you nothing for a running slice — you know it started and you can calculate that it should have finished by now, but you have no visibility into where it is within the pipeline. For long-running pipelines (2+ hour loads), this is a significant gap.
Native row counts in the monitoring view, not just post-completion. One-click slice rerun from the monitoring interface without navigating to the dataset view. Longer retention of execution history (currently limited). These are table stakes for production ETL monitoring.
Until these ship, the supplemental layer is what we have. It works, it scales, and it gives you more useful information than the portal does. Build it early — retrofitting monitoring onto pipelines that are already in production is harder than building it in from the start. If you need help setting up the execution log pattern, I'm here to help.