Microsoft Abandons Azure Data Lake Analytics: Why I'm Now Looking at Databricks

Microsoft announced in September 2018 that Azure Data Lake Analytics is entering a maintenance-only phase, with no major feature work planned. No new regions. No investment in U-SQL tooling. Just: it works, we're not growing it. If you've been following along, you know I've spent real time learning U-SQL and building pipelines on ADLA. So this announcement hit differently than a routine deprecation notice.

To understand why I'm annoyed — and also why I'm not that surprised — you need to understand what ADLA was supposed to be.

What ADLA Was Trying to Do

Azure Data Lake Analytics launched in 2015 as Microsoft's cloud-native MPP analytics engine. The pitch: write U-SQL (a hybrid of SQL's declarative SELECT/FROM/WHERE and C# expressions for complex transformations), point it at Azure Data Lake Store, and pay only for the AU-seconds your jobs consume. No cluster to manage. True serverless scale-out analytics.

For anyone who'd worked with SQL Server PDW — the Parallel Data Warehouse appliance — this was familiar architecture. PDW used a control node to parse and distribute queries, and compute nodes to execute in parallel. ADLA was doing the same thing, just cloud-native and job-scoped instead of requiring a dedicated $400k appliance in a data center you had to maintain yourself.

I liked the vision. What I didn't love was U-SQL. It's a genuinely weird language: SQL syntax for the SELECT clause, C# for anything dynamic, a compilation model that required Visual Studio and didn't play well with anything that wasn't Microsoft-native. The ecosystem never grew up around it the way Python and Scala grew up around Spark.

Why the Market Went Elsewhere

While Microsoft was building ADLA, Databricks (founded in 2013 by the original creators of Apache Spark at UC Berkeley) was going after the same problem with a different approach. Apache Spark is open source. The tooling ecosystem around it includes Python, Scala, R, Java, and SQL. The community is enormous. Notebooks, libraries, MLflow for experiment tracking: none of it required a Microsoft account or Visual Studio.

By 2017 it was clear which horse the market was betting on. Databricks closed a $140M Series D that year. AWS had been offering Spark on EMR (Elastic MapReduce) for years. Azure Databricks launched in 2018 as a joint product between Microsoft and Databricks, which means Microsoft itself was investing in the competing platform while ADLA sat in maintenance mode.

The writing was on the wall before the formal announcement.

The Databricks Pivot

I've been doing my due diligence on Databricks for the last few months. The core architecture is the same distributed pattern: a driver program coordinates work, executor JVMs on worker nodes execute the actual computation in parallel. Multiple nodes, data partitioned across them, results aggregated. Scale out by adding nodes. Pay by the minute instead of buying hardware.
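To make that driver/executor pattern concrete, here is a minimal sketch in plain Python rather than Spark itself. A thread pool stands in for the executors, and the names `partition`, `executor_task`, and `driver` are made up for illustration. Real Spark also handles scheduling, shuffles, and fault tolerance, none of which this toy attempts.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_partitions):
    """Split rows into roughly equal chunks, one per simulated executor."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def executor_task(chunk):
    """Work done on one partition: a partial SUM(amount) GROUP BY region."""
    partial = Counter()
    for region, amount in chunk:
        partial[region] += amount
    return partial


def driver(rows, num_partitions=4):
    """Coordinate the job: hand out partitions, then merge partial results."""
    chunks = partition(rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(executor_task, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


# Hypothetical sample data: (region, amount) pairs.
sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 8)]
totals = driver(sales, num_partitions=2)
```

The point of the sketch is that the answer does not depend on how many partitions you use. Adding "nodes" changes how the work is divided, not the result.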

This is what I always wanted PDW to be. Not a $400k appliance. Cloud-native, elastic, billed per compute-minute, with an ecosystem that isn't locked to one vendor's tooling.

The language shift is real: you're writing PySpark (Python) or Scala or Spark SQL instead of U-SQL. That's a bigger jump than switching between SQL dialects. But the payoff is access to the entire Python data ecosystem: pandas for small data manipulation, scikit-learn for modeling, MLflow for experiment tracking, Great Expectations for data quality, all on the same platform where your distributed transformations run.

What's Next

I'm starting a deeper series on Databricks and PySpark. The target audience is SQL Server developers who have spent their careers in the relational world — because that's where I came from, and the translation isn't obvious. Some concepts carry over directly (GROUP BY, window functions, JOINs). Others require a fundamental rethink of how computation works (lazy evaluation, the driver/executor split, partitions as the unit of parallelism).
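Lazy evaluation is probably the least intuitive of those for a SQL Server developer, so here is a rough sketch of the idea in plain Python. `LazyDataset` is a toy class invented for this post, not a Spark API. It only mimics the split between transformations, which build up a plan, and actions, which actually run it.

```python
class LazyDataset:
    """Toy stand-in for a Spark DataFrame: records steps, runs them on demand."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def filter(self, predicate):
        # Transformation: returns a new plan and touches no data.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def map(self, fn):
        # Transformation: same idea, just a different kind of step.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def collect(self):
        # Action: only now is the recorded plan executed against the source.
        rows = iter(self._source)
        for kind, fn in self._steps:
            rows = filter(fn, rows) if kind == "filter" else map(fn, rows)
        return list(rows)


calls = []


def double(x):
    calls.append(x)  # side effect so we can observe when work happens
    return x * 2


plan = LazyDataset(range(10)).filter(lambda x: x % 2 == 0).map(double)
work_before_action = len(calls)  # nothing has executed yet
result = plan.collect()
```

Building `plan` runs nothing. `double` is only called once `collect()` asks for results, which is roughly how a chain of PySpark transformations behaves until you call an action.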

If you've been watching Microsoft's cloud analytics stack evolve and wondering whether to bet on Azure Synapse, ADLA, or something else entirely — I'm going to make the case that the Databricks ecosystem is where the serious money is going. Come along for the evaluation.
