ADF Wrangling Data Flows: Power Query Inside Your Pipelines

ADF now has two types of data flows, and most people only know about one of them. Mapping Data Flows — the code-free Spark transformation engine — got all the attention in 2018. Wrangling Data Flows shipped quietly and haven't gotten nearly as much coverage. They're worth understanding, because they're aimed at a different audience and reveal something interesting about where Microsoft is taking this product.

What Wrangling Data Flows Are

Wrangling Data Flows use the Power Query M language and the same mashup engine behind Power Query in Power BI and Get & Transform in Excel. You get the same editor that Power BI analysts have been using for years, now running inside ADF as a pipeline activity.

If you've used Power Query in Power BI, you know the experience: load a source table, apply transformation steps in a point-and-click editor, and each step generates an M formula behind the scenes. Rename columns, filter rows, merge queries, pivot/unpivot, split columns by delimiter: all of it happens through a visual UI that records the steps as M code.

That same experience now lives in ADF.
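
For readers who haven't watched the editor record steps, here is a rough Python sketch of the idea: every UI action becomes a named step that takes the previous step's output as its input. This is only a model of the concept, not how the Power Query engine is implemented. The helper names (rename_column, filter_rows, split_column, run_steps) and the sample rows are made up for illustration; the M functions mentioned in the comments are the ones the editor would typically generate for comparable actions.

```python
from typing import Any, Callable

Row = dict[str, Any]
Table = list[Row]
Transform = Callable[[Table], Table]


def rename_column(old: str, new: str) -> Transform:
    """Step factory: rename one column (comparable to M's Table.RenameColumns)."""
    def apply(table: Table) -> Table:
        return [{(new if key == old else key): value for key, value in row.items()}
                for row in table]
    return apply


def filter_rows(keep: Callable[[Row], bool]) -> Transform:
    """Step factory: keep rows matching a predicate (comparable to Table.SelectRows)."""
    def apply(table: Table) -> Table:
        return [row for row in table if keep(row)]
    return apply


def split_column(column: str, delimiter: str, new_names: list[str]) -> Transform:
    """Step factory: split a text column by delimiter (comparable to Table.SplitColumn)."""
    def apply(table: Table) -> Table:
        result = []
        for row in table:
            parts = str(row[column]).split(delimiter)
            new_row = {key: value for key, value in row.items() if key != column}
            new_row.update(zip(new_names, parts))
            result.append(new_row)
        return result
    return apply


def run_steps(source: Table, steps: list[tuple[str, Transform]]) -> Table:
    """Apply each recorded step to the output of the previous step, in order."""
    current = source
    for _step_name, transform in steps:
        current = transform(current)
    return current


# Hypothetical source table standing in for a real dataset.
source_table = [
    {"order_id": 1, "cust": "Contoso", "amount": 120.5, "region_city": "West-Seattle"},
    {"order_id": 2, "cust": "Fabrikam", "amount": 0, "region_city": "East-Boston"},
    {"order_id": 3, "cust": "Contoso", "amount": 75, "region_city": "West-Portland"},
]

# Each entry mirrors what one click in the editor records: a step name plus a transformation.
applied_steps = [
    ("Renamed Columns", rename_column("cust", "customer")),
    ("Filtered Rows", filter_rows(lambda row: row["amount"] > 0)),
    ("Split Column by Delimiter", split_column("region_city", "-", ["region", "city"])),
]

result = run_steps(source_table, applied_steps)
```

Running this leaves two rows (orders 1 and 3), each with customer, region, and city columns. The point is the shape: an ordered list of small, named transformations, which is what the analyst ends up owning.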

Who This Is For

Wrangling Data Flows are explicitly designed for non-developer data analysts. The target user is someone who knows Power BI, understands their data model, and has been building Power Query transformations in Power BI datasets or Excel for years. They know what they want the data to look like; they don't know Python or SQL.

For that user, this is a significant capability: they can build server-side transformation logic in a tool they already know, without writing code and without depending on a developer to implement transformations on their behalf.

The self-service analytics narrative has always had a disconnect: analysts use Power BI for reporting but depend on IT/engineering for the data preparation pipelines that feed the reports. Wrangling Data Flows close that gap, at least for transformation scenarios that fit within Power Query's capability set.

Wrangling vs. Mapping: When to Use Which

The two data flow types are not interchangeable. They run on different engines and have different capability profiles.

Characteristic   | Wrangling Data Flows       | Mapping Data Flows
Engine           | Power Query (Mashup)       | Apache Spark
Language         | Power Query M              | ADF expression language
Authoring        | Power Query editor         | ADF canvas (visual)
Scale            | Moderate (single-machine)  | Large (distributed Spark)
Cold start       | Faster                     | 3-5 min Spark startup
Target user      | Analyst / Power BI user    | Data engineer
Complex joins    | Limited                    | Full support
Streaming        | No                         | No
ML integration   | No                         | No

For production ETL workloads with large data volumes, complex transformation logic, or strict SLAs, use Mapping Data Flows. For transformation logic the analyst should own, or for moving existing Power Query M out of Power BI into server-side execution, use Wrangling Data Flows.

Migrating Power BI Transformations to Server-Side

This is probably the most immediately practical use case. Many Power BI reports have significant transformation logic in the Power BI dataset — M queries that clean, reshape, and merge data from multiple sources. This logic runs in the Power BI service when the dataset refreshes, consuming Power BI Premium capacity.

For transformations that should happen in the data pipeline — not in the BI layer — you can move that M logic to a Wrangling Data Flow. The same M formula runs in ADF, writing cleaned data to your data lake or data warehouse. The Power BI dataset then reads from the pre-transformed output, reducing the transformation burden on Premium capacity.

The M portability isn't always one-to-one — some Power Query functions and connectors available in Power BI aren't available in the ADF Wrangling Data Flow context — but the core transformation logic typically transfers cleanly.
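
Before moving a query, it can help to inventory which M library functions it calls and compare them against whatever the ADF documentation currently lists as unsupported. The sketch below is one crude way to do that with a regular expression. The function names (find_m_functions, flag_unsupported), the sample query, and especially the ASSUMED_UNSUPPORTED set are illustrative assumptions, not an official list; fill that set in from the current documentation. A regex is not an M parser, so it will also pick up dotted names inside string literals and type names such as Int64.Type.

```python
import re

# Matches dotted M library names such as Table.SelectRows or Csv.Document.
M_FUNCTION_PATTERN = re.compile(r"\b([A-Z][A-Za-z0-9]*\.[A-Z][A-Za-z0-9]*)\b")


def find_m_functions(m_query: str) -> set[str]:
    """Return every dotted library name that appears in the M query text."""
    return set(M_FUNCTION_PATTERN.findall(m_query))


def flag_unsupported(m_query: str, unsupported: set[str]) -> list[str]:
    """Return the sorted names used by the query that are in the unsupported set."""
    return sorted(find_m_functions(m_query) & unsupported)


# Placeholder set: an ASSUMPTION for illustration only, to be replaced with
# the list from the current ADF documentation.
ASSUMED_UNSUPPORTED = {"Table.PromoteHeaders", "Table.Transpose"}

# Hypothetical query as it might be copied from the Power BI advanced editor.
SAMPLE_QUERY = """
let
    Source = Csv.Document(File.Contents("sales.csv")),
    Promoted = Table.PromoteHeaders(Source),
    Filtered = Table.SelectRows(Promoted, each [amount] <> "0")
in
    Filtered
"""

needs_rework = flag_unsupported(SAMPLE_QUERY, ASSUMED_UNSUPPORTED)
```

With the placeholder set above, needs_rework contains only Table.PromoteHeaders, which tells you which step to rework before the rest of the query can move across.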

The Product Convergence Signal

Wrangling Data Flows are the clearest signal yet that Microsoft sees ADF and Power BI as part of the same product family. Sharing the Power Query engine isn't an accident — it's an architectural decision to unify the data preparation story across the analytics platform.

This matters for how you think about the Microsoft analytics stack:

  • ADF — pipeline orchestration and data movement
  • Wrangling Data Flows — analyst-owned transformation (Power Query M)
  • Mapping Data Flows — engineer-owned transformation (Spark)
  • Power BI — reporting and self-service analytics

The boundaries between ADF and Power BI are getting softer. Shared engine, shared UX patterns, shared positioning in Microsoft's "modern analytics platform" narrative. I'll have more to say about where this is heading in a later post — the Synapse announcement changes the picture significantly.

Practical Limitations

Wrangling Data Flows don't scale the way Mapping Data Flows do. The Power Query engine runs on a single-machine mashup process, not distributed Spark. For datasets that require Spark's distributed execution (hundreds of millions of rows), Wrangling Data Flows aren't the right choice.

The transformation coverage is narrower than in Mapping Data Flows: some transformations available there (Window, Rank, Surrogate Key) aren't in the Wrangling Data Flow catalog. For complex ETL logic, use Mapping Data Flows.

Wrangling Data Flows work best as a self-service on-ramp: the analyst handles the transformation logic they understand, in a tool they know, without blocking on engineering. For anything production-critical with large data volume or complex logic, hand it to Mapping Data Flows or Databricks.

This is a useful tool in the right context. It's not a replacement for engineering-owned ETL. Know the difference. As always, I'm here to help if you're evaluating which data flow type fits a specific scenario.
