Monitoring SQL Azure: What the DMVs Don't Tell You Anymore
The first thing a DBA does when something feels wrong is open SSMS and start querying DMVs. sys.dm_exec_requests, sys.dm_os_wait_stats, sys.dm_exec_sessions — the standard toolkit. In 2011, when clients started moving databases to SQL Azure, I kept hearing the same complaint: "I can't see what's happening." They were right, but the problem was more specific than they realized. The tools still worked. The surface area had just changed.
What Survived the Move to SQL Azure
Most of the query-level DMVs carried over. You could still query sys.dm_exec_requests to see active queries, sys.dm_exec_query_stats for execution statistics, and sys.dm_exec_sql_text to get the actual query text. The basics of "what is running and how expensive is it" were intact.
-- Still works in SQL Azure 2011 — find expensive queries
SELECT TOP 20
qs.total_elapsed_time / qs.execution_count AS avg_elapsed_us,
qs.execution_count,
qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
SUBSTRING(st.text, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)+1) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_elapsed_us DESC;This still runs. This still gives you useful information. Start here when you're diagnosing slow queries on SQL Azure, same as on-prem.
What Changed or Disappeared
The OS-level DMVs were gone. sys.dm_os_ring_buffers, sys.dm_os_memory_clerks, sys.dm_os_sys_info — anything that peered into the operating system or the SQL Server process itself was either unavailable or returned limited data. You don't own the OS; Microsoft does. The information that belongs to the OS layer isn't yours to query.
sys.dm_os_wait_stats returned data, but it reflected cumulative waits since the last replica failover or service restart — which on a managed service could happen at any time without your knowledge. The baseline you'd use for comparison on-prem (reset at the last restart, which you controlled) didn't translate cleanly to a service that could restart underneath you.
SQL Server Agent, and therefore anything you'd normally monitor through Agent job history, was simply not present. If you were used to running health-check queries as scheduled jobs, you had to move those to an external scheduler or Azure Scheduled tasks (which were primitive in 2011).
What Azure Added Instead
The Azure Management Portal gave you database-level metrics: CPU percentage, data I/O, log I/O, and connections. These were coarse — rolling averages over 5-minute windows — but they gave you the high-level signal. If CPU was pegged at 100% for 30 minutes straight, you knew you had a problem even without a fine-grained execution plan.
The more useful addition was the Azure SQL Database auditing and event log, which captured login failures, permission errors, and destructive operations. Not performance monitoring, but useful for security and compliance work that on-prem clients often had no visibility into at all.
The Practical Monitoring Stack in 2011
What I ended up running for SQL Azure clients was a lightweight monitoring job that hit the database from an external machine — an Azure Worker Role or a VM — on a schedule and logged the DMV output it could access to an Azure Table Storage account. This gave me a time-series record of query performance, blocking, and connection counts that I could query and alert on.
The queries were simpler than what I'd run on-prem, but the discipline of sampling and storing was more important in a managed environment where I couldn't just walk up to a server and run a trace. You had to build the observability layer you needed rather than assume it was there.
The Broader Shift
What SQL Azure forced in 2011 was a shift from reactive to proactive monitoring. On-prem, you could always get the information if you needed it — run a profiler trace, query a DMV, check perfmon. In a managed environment, some of that information isn't available, and the information that is available requires you to have been collecting it before the problem started. That shift in mindset — instrument first, investigate second — turns out to be the right pattern for cloud infrastructure in general. SQL Azure just made it non-optional earlier than most people expected.
If you're running SQL Azure workloads and finding the monitoring story frustrating, let me know what gap you're hitting. The ecosystem improved significantly through 2012 and 2013, and there were workarounds for most of the rough edges. As always, I'm here to help.