ADF Git Integration One Year Later: The Good, The Gotcha, and What's Still Missing

It's been a year since ADF git integration shipped. My team has been living with it — daily authoring, PRs, deployments — long enough that the novelty has worn off and the real picture is clear. Here's the honest 12-month assessment.

What's Genuinely Good

No More Portal Drift

This was the thing I wanted most, and it delivered. The collaboration branch is the source of truth. When I pull up the ADF instance in the portal, I know I'm looking at exactly what's in git. The scenario that used to happen regularly — someone makes an emergency fix in the portal, doesn't tell anyone, and now your deployed factory and your repository are different — is gone.

The first time a new team member asked "what's actually in prod?" and I could answer "pull the main branch; once it's published, that's exactly what's running," it felt like progress.

PR Workflow Works in Practice

The review workflow for pipeline changes works in practice: create a feature branch, save changes in ADF (each save commits to that branch), open a PR to main, review the JSON diffs, merge. The JSON diffs in Azure Repos or GitHub are readable. You can tell what changed: a new activity was added, a parameter was renamed, a linked service was updated.

Junior engineers submitting pipeline changes go through the same code review process as application code changes. The team reviews. The pattern is familiar. That standardization matters operationally.
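On a large pipeline the raw JSON diff can still be noisy. A small helper can reduce two versions of a pipeline file to the changes a reviewer actually asks about. This is a sketch that assumes ADF's pipeline JSON shape (a top-level properties object holding an activities list and a parameters map); the function name is hypothetical.

```python
import json


def summarize_pipeline_change(old_json: str, new_json: str) -> dict:
    """Reduce two versions of an ADF pipeline JSON file to reviewer-level changes."""
    old = json.loads(old_json).get("properties", {})
    new = json.loads(new_json).get("properties", {})

    # Activities are matched by name, so a rename shows up as a removal plus an addition.
    old_acts = {a["name"]: a for a in old.get("activities", [])}
    new_acts = {a["name"]: a for a in new.get("activities", [])}
    old_params = set(old.get("parameters", {}))
    new_params = set(new.get("parameters", {}))

    return {
        "activities_added": sorted(new_acts.keys() - old_acts.keys()),
        "activities_removed": sorted(old_acts.keys() - new_acts.keys()),
        # Same name on both sides but any property differs (type, inputs, settings).
        "activities_modified": sorted(
            name for name in old_acts.keys() & new_acts.keys()
            if old_acts[name] != new_acts[name]
        ),
        "parameters_added": sorted(new_params - old_params),
        "parameters_removed": sorted(old_params - new_params),
    }
```

In a PR build you might feed it the output of `git show origin/main:pipeline/CopySales.json` and the branch's copy of the same file (the file name is hypothetical). Matching by name keeps the sketch simple at the cost of reporting renames as two changes.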

History Is Searchable

When production fails at 2 AM and you need to know what changed in the last week, git log gives you the answer. Commits are timestamped, attributed, and linked to PR descriptions. Pre-git-integration, this investigation involved cross-referencing the ADF Monitor run history with Slack messages about who changed what. Now it's git log --since="1 week ago" -- pipeline/.
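If you run that query often, a small wrapper can turn it into structured data for a chat message or an incident note. This sketch assumes pipeline files live under pipeline/ (the path from the command above) and uses a custom COMMIT| header line so commit headers and file names can be told apart; the function names are hypothetical.

```python
import subprocess

MARKER = "COMMIT|"


def parse_git_log(log_text: str) -> list:
    """Parse `git log --name-only` output produced with the MARKER format below."""
    commits = []
    for line in log_text.splitlines():
        if line.startswith(MARKER):
            # Subject is last, so maxsplit=3 keeps any '|' inside it intact.
            short_hash, author, date, subject = line[len(MARKER):].split("|", 3)
            commits.append({"hash": short_hash, "author": author, "date": date,
                            "subject": subject, "files": []})
        elif line.strip() and commits:
            # Non-header, non-blank lines are the files changed by the last commit.
            commits[-1]["files"].append(line.strip())
    return commits


def recent_pipeline_changes(repo: str = ".", since: str = "1 week ago",
                            path: str = "pipeline/") -> list:
    """Run git log for one folder of the ADF repo and return parsed commits."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--date=iso",
         "--name-only", f"--pretty=format:{MARKER}%h|%an|%ad|%s", "--", path],
        capture_output=True, text=True, check=True,
    )
    return parse_git_log(result.stdout)
```

Usage: `for c in recent_pipeline_changes(): print(c["date"], c["author"], c["subject"], c["files"])`. Because the pathspec limits the diff output too, each commit lists only the pipeline files it touched.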

The Gotcha (Still)

The adf_publish Model

A year in, the adf_publish branch is still the thing that confuses every new person on the team. The mental model requires three distinct concepts:

  1. Save — commits human-readable pipeline JSON to the collaboration branch or feature branch. This is your source of truth.
  2. Publish — generates ARM templates from the collaboration branch and writes them to the adf_publish branch. This is the deployment artifact.
  3. Deploy — your CI/CD pipeline (Azure DevOps) detects changes to adf_publish and ARM-deploys to test, then prod.

The confusion pattern I see repeatedly: someone merges a PR to main, sees that the ADF UI reflects their changes in the portal, and assumes production is now updated. It's not. They merged to main (step 1). Nobody clicked Publish (step 2). The adf_publish branch hasn't changed. The ADO pipeline hasn't run. Production is still running whatever was last published.

The "Publish" button in the ADF UI is not intuitively named. It sounds like "publish to production," but it actually means "generate deployment artifacts to the adf_publish branch." Production deployment is a separate action that your CI/CD pipeline handles.

We've added this to our team onboarding checklist. It's the single most important thing to understand about ADF git integration.
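The checklist helps, but a CI check can also catch the "merged but never published" state directly. The sketch below is a heuristic under assumptions: it compares the newest commit on origin/main that touches factory resource folders against the newest commit on origin/adf_publish. The folder names (pipeline, dataset, linkedService, trigger) are what ADF's git integration typically writes; adjust them to your repo. The function names are hypothetical.

```python
import subprocess

# Folders ADF's git integration writes factory resources to; adjust to your repo.
RESOURCE_DIRS = ["pipeline", "dataset", "linkedService", "trigger"]


def last_commit_epoch(ref, paths=(), repo="."):
    """Unix time of the newest commit on `ref` touching `paths` (0 if none)."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%ct", ref, "--", *paths],
        capture_output=True, text=True, check=True,
    )
    out = result.stdout.strip()
    return int(out) if out else 0


def has_unpublished_changes(main_epoch, publish_epoch):
    """True when factory resources changed on main after the last publish commit."""
    return main_epoch > publish_epoch


def main_is_ahead_of_publish(repo="."):
    # Both remote branches must be fetched first (git fetch origin main adf_publish).
    main_epoch = last_commit_epoch("origin/main", RESOURCE_DIRS, repo)
    publish_epoch = last_commit_epoch("origin/adf_publish", (), repo)
    return has_unpublished_changes(main_epoch, publish_epoch)
```

Failing a scheduled ADO job when `main_is_ahead_of_publish()` returns True turns a silent lag into a visible one. Commit timestamps come from whoever made the commit, so clock skew or a rebased branch can fool it.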

The adf_publish ARM Templates Are Generated Artifacts

The main file in the adf_publish branch, ARMTemplateForFactory.json, looks like this (abridged):

{
  "$schema": "...",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "factoryName": { "type": "string" },
    "AzureSqlLinkedService_connectionString": { "type": "secureString" },
    "AzureBlobLinkedService_sasUri": { "type": "secureString" }
    /* ... 40 more parameters ... */
  },
  "resources": [
    /* ... 200+ resource blocks for every pipeline, dataset, linked service ... */
  ]
}

This file is auto-generated by ADF every time you click Publish. It's hundreds to thousands of lines for a real factory. Do not edit it manually. Do not try to review it in PRs. The human review happens in the main branch PR, where the pipeline JSON is readable.

Some engineers make the mistake of trying to understand the deployment by reading the adf_publish ARM templates. Don't. Read the main branch pipeline JSON for understanding; use the adf_publish branch only as a CI/CD trigger.
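When you do need to know what a generated template contains, for example to see which resource types are making it grow, summarizing it beats reading it. A minimal sketch, assuming ADF's usual resource naming shape `[concat(parameters('factoryName'), '/Name')]`; names in any other shape are reported raw. The function name is hypothetical.

```python
import json
import re
from collections import Counter

# Assumed ADF naming shape: "[concat(parameters('factoryName'), '/CopySales')]".
NAME_PATTERN = re.compile(r"'/([^']*)'\)\]$")


def summarize_arm_template(text: str) -> dict:
    """Count resources by type, list their names, and measure template size."""
    template = json.loads(text)
    resources = template.get("resources", [])
    counts = Counter()
    names = {}
    for r in resources:
        kind = r["type"].rsplit("/", 1)[-1]  # e.g. 'pipelines', 'datasets'
        counts[kind] += 1
        match = NAME_PATTERN.search(r.get("name", ""))
        names.setdefault(kind, []).append(match.group(1) if match else r.get("name", ""))
    return {
        "size_bytes": len(text.encode("utf-8")),
        "parameters": len(template.get("parameters", {})),
        "resource_counts": dict(counts),
        "names": names,
    }
```

Run it against ARMTemplateForFactory.json from a checkout of adf_publish, e.g. `summarize_arm_template(open("ARMTemplateForFactory.json", encoding="utf-8").read())`.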

What's Still Missing

No Automated Publish on PR Merge

After a year, this is still a manual step. Someone has to click Publish in the ADF UI after every PR merge that's ready for deployment. If the team is disciplined, this works. If the team forgets, production lags behind main until someone notices.

My workaround: a Python script in the ADO pipeline that calls the ADF REST API to trigger a publish before the ARM deployment stage. When a CI trigger on main starts the ADO pipeline, the script calls the publish endpoint and waits for completion, and then the pipeline deploys from the fresh adf_publish artifacts.

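import time

import requests

API_VERSION = "2018-06-01"


def trigger_adf_publish(subscription_id, resource_group, factory_name, token,
                        poll_seconds=10, timeout_seconds=600):
    """Trigger an ADF publish of the collaboration branch and wait for it to finish."""
    factory_url = (
        f"https://management.azure.com/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
    )
    # publishToGitHub is the endpoint this workaround calls; it is not among the
    # documented Factories REST operations, so confirm it against your factory.
    publish_url = f"{factory_url}/publishToGitHub?api-version={API_VERSION}"
    headers = {"Authorization": f"Bearer {token}"}
    response = requests.post(publish_url, headers=headers, timeout=60)
    response.raise_for_status()

    # ARM long-running operations usually answer 202 Accepted with a Location
    # header; poll it until it stops returning 202.
    status_url = response.headers.get("Location")
    if response.status_code != 202 or not status_url:
        return response.status_code
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        time.sleep(poll_seconds)
        poll = requests.get(status_url, headers=headers, timeout=60)
        poll.raise_for_status()
        if poll.status_code != 202:
            return poll.status_code
    raise TimeoutError("ADF publish did not complete before the timeout")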

Not elegant. The publish should be automated on collaboration branch change, without a workaround script.

Large Factory ARM Template Verbosity

One client's factory now has 65 pipelines, 120 datasets, and 30 linked services. Its generated ARM template is 6.8 MB, and Azure Resource Manager limits a template to 4 MB. We work around this with linked templates: a post-processing Python script splits the generated ADF ARM template by resource type into child templates, and a parent template references them.
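The splitting step can be sketched roughly as follows. This is an illustration under assumptions, not the script described above: child templates are uploaded to a storage location ARM can reach (base_uri, plus an optional SAS token that starts with "?"), one child is deployed per resource type in a fixed order, and dependsOn entries pointing at other resource types are dropped because ARM cannot resolve dependencies across templates. All function names are hypothetical.

```python
import copy
import json
import os

# Assumed deployment order: resources that others reference go first.
ORDER = ["integrationRuntimes", "linkedServices", "datasets", "dataflows",
         "pipelines", "triggers"]
MAX_TEMPLATE_BYTES = 4 * 1024 * 1024  # ARM's 4 MB template size limit


def short_type(resource):
    """'Microsoft.DataFactory/factories/pipelines' -> 'pipelines'."""
    return resource["type"].rsplit("/", 1)[-1]


def split_template(template):
    """Split one ADF ARM template into child templates, one per resource type."""
    children = {}
    for resource in template.get("resources", []):
        kind = short_type(resource)
        res = copy.deepcopy(resource)
        # Keep only same-type dependencies (e.g. Execute Pipeline);
        # the ordering of child deployments covers the rest.
        res["dependsOn"] = [d for d in res.get("dependsOn", []) if f"/{kind}/" in d]
        if kind not in children:
            children[kind] = {
                "$schema": template.get("$schema"),
                "contentVersion": template.get("contentVersion", "1.0.0.0"),
                "parameters": copy.deepcopy(template.get("parameters", {})),
                "variables": copy.deepcopy(template.get("variables", {})),
                "resources": [],
            }
        children[kind]["resources"].append(res)
    # Unknown types sort after the known ones.
    ranked = sorted(children, key=lambda k: ORDER.index(k) if k in ORDER else len(ORDER))
    return {kind: children[kind] for kind in ranked}


def master_template(parent, child_names, base_uri, sas_token=""):
    """Parent template that deploys each child in sequence via templateLink."""
    params = parent.get("parameters", {})
    resources, previous = [], None
    for name in child_names:
        deployment = {
            "type": "Microsoft.Resources/deployments",
            "apiVersion": "2018-05-01",
            "name": f"adf-{name}",
            "properties": {
                "mode": "Incremental",
                "templateLink": {"uri": f"{base_uri}/{name}.json{sas_token}"},
                # Pass every parent parameter straight through to the child.
                "parameters": {p: {"value": f"[parameters('{p}')]"} for p in params},
            },
        }
        if previous:
            deployment["dependsOn"] = [previous]
        resources.append(deployment)
        previous = f"[resourceId('Microsoft.Resources/deployments', 'adf-{name}')]"
    return {
        "$schema": parent.get("$schema"),
        "contentVersion": parent.get("contentVersion", "1.0.0.0"),
        "parameters": copy.deepcopy(params),
        "resources": resources,
    }


def write_linked_templates(template_path, out_dir, base_uri, sas_token=""):
    """Read the generated template; write one file per child plus master.json."""
    with open(template_path, encoding="utf-8") as f:
        template = json.load(f)
    children = split_template(template)
    os.makedirs(out_dir, exist_ok=True)
    for name, child in children.items():
        body = json.dumps(child, indent=2)
        if len(body.encode("utf-8")) > MAX_TEMPLATE_BYTES:
            raise ValueError(f"{name} alone exceeds 4 MB; it needs a finer split")
        with open(os.path.join(out_dir, f"{name}.json"), "w", encoding="utf-8") as f:
            f.write(body)
    master = master_template(template, list(children), base_uri, sas_token)
    with open(os.path.join(out_dir, "master.json"), "w", encoding="utf-8") as f:
        json.dump(master, f, indent=2)
```

Chaining the child deployments with dependsOn stands in for the dropped cross-type dependencies: linked services exist before datasets reference them, datasets before pipelines, and so on.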

This is a solvable problem but it requires engineering effort that shouldn't be necessary. ADF should either generate linked templates natively or support a factory size that doesn't hit ARM limits.

The 12-Month Verdict

Still 7 out of 10. Same score as 2018. The direction is right and the core feature works. The adf_publish model confusion is a fixable UX problem. The missing automated publish is a fixable product gap. The ARM template size issue is a fixable tooling problem.

None of these are fundamental flaws. They're rough edges on a feature that fundamentally does what it's supposed to do. The question is whether Microsoft will smooth those edges in 2019 or let them sit.

As always, if you're setting up git integration for the first time and want to talk through the workflow, I'm here to help.
