Four years. That's how long I've been asking for git integration in Azure Data Factory. Since 2014, every ADF project I've touched has had the same operational hazard: someone makes an emergency fix in the portal, doesn't document it, and now your git repo and your production factory are different. You find out during an incident, at the worst possible moment.
ADF git integration is here. Let me walk you through what it actually does, because the model has a gotcha that surprised my entire team.
How It Works
You link your ADF instance to either an Azure Repos repository or a GitHub repository. Once linked, every save you make in the ADF authoring UI creates a commit on the branch you're currently working in: a feature branch, or the configured collaboration branch (typically main or master) if you edit there directly. Your pipeline definitions, datasets, and linked services live as JSON files in the repository.
The file structure looks like this:
factory/
├── pipeline/
│   ├── IngestSalesData.json
│   └── TransformCustomer.json
├── dataset/
│   ├── SalesSourceDataset.json
│   └── CustomerTargetDataset.json
├── linkedService/
│   ├── AzureSqlLinkedService.json
│   └── AzureBlobLinkedService.json
└── trigger/
    └── DailyScheduleTrigger.json
These are human-readable JSON files. You can review them in a PR. You can search them with grep. You can understand what changed between commits. This is already a massive improvement over the v1 world where your pipeline definitions lived only in the Azure portal database.
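Because the definitions are plain JSON, you can go a step beyond grep. Here's a minimal sketch that answers "which pipelines touch this dataset?" by walking the pipeline folder and collecting every referenceName it finds. The function names (find_references, pipelines_using) are mine, and I'm assuming the folder layout shown above, so treat it as a starting point rather than a finished tool.

```python
import json
from pathlib import Path


def find_references(node):
    """Yield every string stored under a "referenceName" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "referenceName" and isinstance(value, str):
                yield value
            else:
                yield from find_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_references(item)


def pipelines_using(factory_dir, resource_name):
    """Return the pipeline file names (without .json) that reference resource_name."""
    hits = []
    # Assumes pipelines live in <factory_dir>/pipeline/*.json as in the tree above.
    for path in sorted(Path(factory_dir, "pipeline").glob("*.json")):
        document = json.loads(path.read_text(encoding="utf-8"))
        if resource_name in find_references(document):
            hits.append(path.stem)
    return hits
```

Calling pipelines_using("factory", "SalesSourceDataset") from the repository root lists the pipelines that would be affected if you changed that dataset. It matches on the name only, so a dataset and a linked service that share a name would both count.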
The Workflow
Here's the intended development workflow:
1. Developer creates a feature branch from main in the ADF UI (or via git)
2. Developer makes pipeline changes in ADF, saving to the feature branch
3. Developer submits a PR to merge the feature branch into main
4. Team reviews the PR (the JSON diffs are readable)
5. PR is merged to main
6. Someone clicks Publish in the ADF UI
Step 6 is where the model gets interesting.
The adf_publish Branch: The Gotcha
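When you click Publish in the ADF UI, it does not deploy your pipeline to test or production. It pushes the current state of your collaboration branch to the live service of the git-linked factory (your dev factory), generates ARM templates from that state, and writes them to a separate branch called adf_publish.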
The adf_publish branch contains ARM template JSON — not the human-readable pipeline JSON from your main branch. It looks like this:
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": { ... },
  "resources": [
    {
      "type": "Microsoft.DataFactory/factories/pipelines",
      "apiVersion": "2018-06-01",
      "name": "[concat(parameters('factoryName'), '/IngestSalesData')]",
      ...
    }
  ]
}
Your CI/CD pipeline (Azure DevOps, GitHub Actions) deploys from adf_publish, not from main.
So the mental model is:
- Save → commits to feature branch / main (human-readable JSON)
- Publish → generates ARM templates to adf_publish (deployment artifacts)
- Deploy → your CI/CD pipeline reads from adf_publish and ARM-deploys to test/prod
Three distinct concepts. My team, all experienced Azure engineers, spent two weeks confused about why their main branch changes weren't showing up in production. They hadn't realized that merging to main wasn't enough: someone has to click Publish. Nobody did, so production kept running the previous week's state.
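One cheap guard against the forgotten Publish is to compare the newest commit on main with the newest commit on adf_publish. The sketch below shells out to git log for both branches and flags the case where main is newer. The ref names (origin/main, origin/adf_publish) and the function names are assumptions for illustration, and the comparison is only a heuristic: a commit to main that doesn't touch any factory resource will also trip it.

```python
import subprocess


def last_commit_time(ref, repo_dir="."):
    """Return the committer timestamp (Unix seconds) of the newest commit on ref."""
    result = subprocess.run(
        ["git", "log", "-1", "--format=%ct", ref],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def publish_is_stale(main_time, publish_time):
    """True when the collaboration branch has a commit newer than the last publish."""
    return main_time > publish_time


def check(repo_dir=".", main_ref="origin/main", publish_ref="origin/adf_publish"):
    """Compare the two branches and return a one-line status message."""
    main_time = last_commit_time(main_ref, repo_dir)
    publish_time = last_commit_time(publish_ref, repo_dir)
    if publish_is_stale(main_time, publish_time):
        return "main is newer than adf_publish: someone probably needs to click Publish"
    return "adf_publish is at least as new as main"
```

Run check() after a git fetch, for example as a scheduled job that posts the message to your team channel. It won't click Publish for you, but it turns a two-week mystery into a same-day reminder.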
The ADF UI Editor
The ADF authoring experience is better with git integration. The branch selector is in the UI. You can create branches, switch branches, and see what branch you're on. The save experience is cleaner — it's clearly "committing to git" not "saving to Microsoft's database."
The editor is still not VS Code. Complex expression editing is better done in the expression editor than in raw JSON, but the expression editor is modal and doesn't have autocomplete for all functions. You can edit the JSON files directly in VS Code and commit them — ADF will pick up the changes. This is my preferred workflow for complex changes.
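If you do hand-edit the JSON, it's worth linting before you commit, because a typo that the ADF UI would never let you make is easy to introduce in a text editor. This is a minimal sketch that checks two things: every file parses as JSON, and the top-level name matches the file name. That second rule is my assumption about the convention the repository follows (it holds in the repos I've seen), and lint_factory is a name I made up.

```python
import json
from pathlib import Path


def lint_factory(factory_dir):
    """Return a list of human-readable problems found in the factory JSON files."""
    problems = []
    # Looks one level down: pipeline/*.json, dataset/*.json, and so on.
    for path in sorted(Path(factory_dir).glob("*/*.json")):
        try:
            document = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg}, line {exc.lineno})")
            continue
        name = document.get("name") if isinstance(document, dict) else None
        # Assumption: the resource name is expected to equal the file name.
        if name != path.stem:
            problems.append(f"{path.name}: name is {name!r}, expected {path.stem!r}")
    return problems
```

An empty list means the files passed both checks. It says nothing about whether ADF will accept the definitions, so you still want to open the branch in the ADF UI and validate there.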
What's Still Missing
The biggest gap: there is no automated Publish on PR merge. You click Publish manually in the ADF UI. This means a merge alone never triggers the CI/CD pipeline: someone has to remember to click Publish first, and only then can the Azure DevOps pipeline proceed.
There's a workaround using the ADF REST API to trigger a publish programmatically, which I'll cover in the CI/CD post. But it requires some choreography.
The adf_publish ARM templates are verbose. A factory with 30 pipelines generates an ARM template with hundreds of resources in a single file. Reviewing diffs in the adf_publish branch is not practical — those files are generated artifacts, not human-authored. Your human review happens in the main branch PR, not in adf_publish.
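Since nobody is going to read that file line by line, a summary is more useful than a diff. Here's a small sketch that counts the resources in a generated template by type, so you can at least sanity-check that a publish contains roughly what you expect. The function names are mine, and the path you pass to summarize is whatever your adf_publish branch actually contains; in the factories I've worked with the file is called ARMTemplateForFactory.json.

```python
import json
from collections import Counter


def resource_counts(template):
    """Count ARM resources by the last segment of their type."""
    counts = Counter()
    for resource in template.get("resources", []):
        resource_type = resource.get("type", "unknown")
        # "Microsoft.DataFactory/factories/pipelines" -> "pipelines"
        counts[resource_type.rsplit("/", 1)[-1]] += 1
    return counts


def summarize(template_path):
    """Load a generated ARM template from disk and count its resources."""
    with open(template_path, encoding="utf-8") as handle:
        return resource_counts(json.load(handle))
```

Comparing the counts before and after a publish (30 pipelines yesterday, 29 today) catches an accidental deletion far faster than scrolling through the generated JSON.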
Verdict
7 out of 10. The direction is right. Author in git, review in git, deploy from git artifacts — this is the correct model. The adf_publish two-step is confusing but manageable once the team understands it. The lack of automated publish-on-merge is the thing I'd fix first if I were on the ADF product team.
But no more portal drift. No more emergency fixes that never get committed. No more "what's actually running in production?" That alone is worth it.
Next: the full CI/CD workflow with Azure DevOps — pipelines, trigger management scripts, and the parameter file pattern for multi-environment deployment. As always, I'm here to help.