The ADF portal makes setup look deceptively simple. Create a data factory, click around, start building pipelines. What it doesn't tell you upfront: there are prerequisites, gotchas with on-premises connectivity, and a browser-based editor that will cost you work if you don't build a local JSON workflow immediately. Let's do this right the first time.
Azure Prerequisites
Before you touch the portal, make sure you have these:
- Azure subscription with at least Contributor role on the resource group where you'll create the factory. Reader won't cut it: Reader can only view resources, and you need write access to create the factory and deploy anything into it.
- Storage account for staging and script storage: Standard LRS, in the same region as your factory. Put the factory and the storage account in different regions and you pay cross-region data transfer costs.
- Resource provider registration: In your subscription, navigate to Resource Providers and confirm Microsoft.DataFactory is registered. New subscriptions may not have it pre-registered. This is the first thing to check when the portal gives you a cryptic "resource not found" error on factory creation.
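If you'd rather script that check than click through the portal, here is a minimal Python sketch that shells out to the Azure CLI. It assumes `az` is installed and you've already run `az login`; `registration_state` and `data_factory_provider_registered` are hypothetical helper names, not part of any Azure SDK.

```python
import json
import subprocess


def registration_state(provider_json: str) -> str:
    """Pull registrationState out of `az provider show` JSON output."""
    return json.loads(provider_json).get("registrationState", "Unknown")


def data_factory_provider_registered() -> bool:
    """Return True if Microsoft.DataFactory is registered in the current subscription.

    Assumes the Azure CLI is on PATH and an `az login` session is active.
    """
    result = subprocess.run(
        ["az", "provider", "show",
         "--namespace", "Microsoft.DataFactory",
         "--output", "json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if az itself fails
    )
    state = registration_state(result.stdout)
    print(f"Microsoft.DataFactory: {state}")
    # If this reports NotRegistered, register it with:
    #   az provider register --namespace Microsoft.DataFactory
    return state == "Registered"
```

Registration is not instant; expect to see an intermediate state such as Registering for a few minutes before the factory creation succeeds.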
Creating the Data Factory
Portal path: New > Data + Analytics > Data Factory. Name it something environment-specific — myproject-dev-adf rather than myproject-adf. You cannot rename a data factory after creation, and you will want separate dev/prod factories. Region matters for data residency and latency — pick the region where your data sources live.
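Since the name is permanent, it's worth generating it rather than typing it. The sketch below builds names in the `myproject-dev-adf` pattern and checks them against the naming rules as I understand them (3 to 63 characters; letters, digits, and hyphens; start and end with a letter or digit; no consecutive hyphens). Treat those rules and the `ENVIRONMENTS` tuple as assumptions to verify against the current ADF docs. Factory names must also be globally unique, which no local check can confirm.

```python
import re

# Assumed naming rules for a data factory (verify against current ADF docs):
# 3-63 characters; letters, digits, and hyphens only; must start and end
# with a letter or digit; no consecutive hyphens.
_ALLOWED = re.compile(r"[A-Za-z0-9-]+")

# Hypothetical environment list; adjust to your own promotion path.
ENVIRONMENTS = ("dev", "test", "prod")


def is_valid_factory_name(name: str) -> bool:
    return (
        3 <= len(name) <= 63
        and _ALLOWED.fullmatch(name) is not None
        and not name.startswith("-")
        and not name.endswith("-")
        and "--" not in name
    )


def factory_name(project: str, env: str) -> str:
    """Build an environment-specific name such as 'myproject-dev-adf'."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment {env!r}; expected one of {ENVIRONMENTS}")
    name = f"{project}-{env}-adf".lower()
    if not is_valid_factory_name(name):
        raise ValueError(f"{name!r} breaks the assumed ADF naming rules")
    return name
```

For example, `factory_name("myproject", "dev")` gives `myproject-dev-adf`, and the same call with `"prod"` gives the matching production name.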
After creation, you land on the factory's Overview blade. The Author and Deploy section is where all the JSON editing happens. Resist the urge to start clicking — read the rest of this post first.
Your First Linked Service
Azure Blob Storage is the easiest starting point. In Author and Deploy, click New data store > Azure Storage. The portal generates a JSON template. Fill in your account name and key:
{
  "name": "AzureStorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "description": "Primary storage account for pipeline staging",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=mystorageacct;AccountKey=YOUR_KEY_HERE"
    }
  }
}
Click Deploy. The linked service is now live in your factory, with no confirmation step and no staging environment in between. This is the first sign that the portal's authoring model is not designed for teams.
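With no safety net on Deploy, a cheap local check before you paste or push a storage linked service catches the most common mistake: shipping the template with the placeholder key still in it. This is a sketch, not an ADF validator; `check_storage_linked_service` is a hypothetical helper, and it only looks at the fields used in the AzureStorage example.

```python
from __future__ import annotations

import json

# Values treated as "not filled in yet". YOUR_KEY_HERE matches the template above.
PLACEHOLDER_KEYS = {"", "YOUR_KEY_HERE"}


def parse_connection_string(conn: str) -> dict[str, str]:
    """Split 'Key=Value;Key=Value' into a dict.

    partition() splits on the first '=' only, so base64 account keys
    ending in '=' padding survive intact.
    """
    parts: dict[str, str] = {}
    for segment in conn.split(";"):
        if segment.strip():
            key, _, value = segment.partition("=")
            parts[key.strip()] = value.strip()
    return parts


def check_storage_linked_service(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass."""
    problems: list[str] = []
    props = doc.get("properties", {})
    if props.get("type") != "AzureStorage":
        problems.append(f"unexpected type {props.get('type')!r}")
    conn = props.get("typeProperties", {}).get("connectionString", "")
    parts = parse_connection_string(conn)
    if parts.get("DefaultEndpointsProtocol") != "https":
        problems.append("connection string does not use https")
    if not parts.get("AccountName"):
        problems.append("AccountName is missing")
    if parts.get("AccountKey", "") in PLACEHOLDER_KEYS:
        problems.append("AccountKey is missing or still a placeholder")
    return problems


def check_file(path: str) -> list[str]:
    """Load a linked service JSON file and run the checks on it."""
    with open(path, encoding="utf-8") as f:
        return check_storage_linked_service(json.load(f))
```

Run against the template exactly as shown, it reports one problem: the key is still `YOUR_KEY_HERE`.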
Data Management Gateway: On-Premises Connectivity
If any of your sources are on-premises (SQL Server, Oracle, file shares), you need the Data Management Gateway. This is a lightweight Windows agent that installs on a machine in your network and creates an outbound HTTPS tunnel to ADF. No inbound firewall rules required — the gateway initiates the connection.
Installation steps:
- In the portal, go to Author and Deploy > New data store > select an on-premises type (e.g., On-Premises SQL Server)
- ADF prompts you to create a gateway. Name it something meaningful: prod-gateway-01
- Copy the registration key from the portal
- Download the gateway installer from Microsoft's download center and run it on your gateway machine
- During installation, paste the registration key when prompted
- The gateway registers with ADF and shows as Connected in the portal within a minute or two
Gateway machine requirements: Windows Server 2008 R2 or later, .NET 4.5+, 2 GB RAM minimum (4 GB recommended for production), outbound HTTPS (port 443) to *.servicebus.windows.net and *.core.windows.net. If your network has a proxy, configure it in the gateway manager before registering.
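Before installing, it can save a round of firewall tickets to confirm the gateway machine can open outbound connections on 443 at all. A minimal Python sketch, assuming Python is available on the box: it does a plain TCP connect only, so it says nothing about TLS, and it bypasses any proxy. If your network forces traffic through a proxy, a failure here doesn't mean the gateway will fail. Wildcards like *.servicebus.windows.net can't be probed, so you need concrete hostnames; `mystorageacct.blob.core.windows.net` below is just the example account from earlier.

```python
from __future__ import annotations

import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port opens within timeout.

    Direct route only: no proxy, no TLS handshake, no authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refusal, and timeout are all OSError subclasses
        return False


def preflight(hosts: list[str], port: int = 443) -> dict[str, bool]:
    """Probe each host and report which ones are reachable."""
    return {host: can_reach(host, port) for host in hosts}
```

Usage: `preflight(["mystorageacct.blob.core.windows.net"])` returns a dict of host to reachability; add the Service Bus hostname your gateway actually uses once you know it.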
The Browser Editor Problem
Here is the honest assessment of ADF's browser-based JSON editor: it is a trap for anyone planning to run ADF in production beyond a proof of concept.
There is no autosave. Close the tab while editing and your changes are gone. There is no version history — deploy a change and the previous version is overwritten silently. There is no diff — you cannot see what changed between two deploys. There is no environment promotion — changes made in the portal go directly to the factory, dev or prod.
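Once your JSON lives in git, `git diff` gives you the missing diff for free. When you need to compare what's deployed against a local file (say, a definition copied out of the portal editor), a small normalizing diff keeps whitespace and key-order noise out of the result. A sketch; `json_diff` is a hypothetical helper name.

```python
from __future__ import annotations

import difflib
import json


def normalized_lines(text: str) -> list[str]:
    """Re-serialize JSON with sorted keys and fixed indentation so that
    whitespace or key-order changes don't show up as differences."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True).splitlines()


def json_diff(old: str, new: str, old_label: str = "deployed", new_label: str = "local") -> str:
    """Unified diff of two JSON documents after normalization ('' if equivalent)."""
    return "\n".join(
        difflib.unified_diff(
            normalized_lines(old),
            normalized_lines(new),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )
```

An empty string means the two definitions are semantically identical, even if one was reformatted.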
My recommendation from day one: treat the portal as a read-only monitoring interface. Do all JSON authoring locally.
The workflow that actually works:
- Create a git repository (local or hosted) for your factory's JSON files
- Write and edit JSON in VS Code or any text editor with JSON schema support
- Deploy via PowerShell using the AzureRM.DataFactory module (or Azure CLI)
- Use the portal only for monitoring pipeline runs and diagnosing failures
# Deploy a linked service via PowerShell (AzureRM.DataFactory module).
# Assumes you have already signed in with Login-AzureRmAccount
# and selected the right subscription.
$ResourceGroup = "myproject-dev-rg"
$DataFactoryName = "myproject-dev-adf"
New-AzureRmDataFactoryLinkedService `
    -ResourceGroupName $ResourceGroup `
    -DataFactoryName $DataFactoryName `
    -File ".\linkedservices\AzureStorageLinkedService.json" `
    -Force
The -Force flag replaces an existing linked service with the same name without prompting. Without it, the cmdlet asks for confirmation before overwriting, which an unattended deploy script can't answer. Either way the previous definition is gone, so use -Force deliberately.
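Because a forced deploy replaces whatever is there, it pays to validate the whole folder before running a deploy like the one above. This Python sketch walks a repo folder and checks that every file parses, has a top-level `name`, and declares `properties.type`. The rule that `name` must match the file name (AzureStorageLinkedService.json containing "AzureStorageLinkedService") is my own convention, not an ADF requirement; `validate_factory_files` is a hypothetical helper.

```python
from __future__ import annotations

import json
from pathlib import Path


def validate_factory_files(root: str) -> list[str]:
    """Check every *.json file under root before deploying.

    Checks (assumed conventions, not ADF requirements):
      - the file parses as a JSON object
      - it has a top-level 'name' matching the file name
      - it has properties.type
    """
    errors: list[str] = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors.append(f"{path}: invalid JSON ({exc.msg}, line {exc.lineno})")
            continue
        if not isinstance(doc, dict):
            errors.append(f"{path}: top level is not a JSON object")
            continue
        name = doc.get("name")
        if not name:
            errors.append(f"{path}: missing top-level 'name'")
        elif name != path.stem:
            errors.append(f"{path}: name {name!r} does not match file name {path.stem!r}")
        props = doc.get("properties")
        if not isinstance(props, dict) or "type" not in props:
            errors.append(f"{path}: missing properties.type")
    return errors
```

Run it first and only call the PowerShell deploy when it returns an empty list.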
Environment Separation
Create separate data factories for dev and prod. The JSON files are identical except for the linked service definitions (dev storage account vs. prod storage account, dev database vs. prod database). Keep linked service JSON files in environment-specific folders and parameterize the account names and keys at deploy time via PowerShell variables or a config file that is not committed to git.
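One way to do that injection is to commit linked service templates with `${NAME}` placeholders and render them into a git-ignored output folder just before deploying, then point `-File` at the rendered copy. The Python sketch below assumes that convention; the placeholder names `STORAGE_ACCOUNT_NAME` and `STORAGE_ACCOUNT_KEY` and the helpers `render_template` and `render_file` are hypothetical. It parses the JSON first and substitutes only inside string values, so a secret never needs JSON escaping.

```python
from __future__ import annotations

import json
import os
from pathlib import Path
from string import Template

# Example of a committed template (hypothetical placeholder names).
EXAMPLE_TEMPLATE = """{
  "name": "AzureStorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=${STORAGE_ACCOUNT_NAME};AccountKey=${STORAGE_ACCOUNT_KEY}"
    }
  }
}"""


def _fill(node, values):
    """Recursively substitute ${NAME} placeholders inside JSON string values."""
    if isinstance(node, str):
        return Template(node).substitute(values)  # KeyError if a value is missing
    if isinstance(node, dict):
        return {key: _fill(value, values) for key, value in node.items()}
    if isinstance(node, list):
        return [_fill(item, values) for item in node]
    return node


def render_template(template_text: str, values) -> dict:
    """Parse the template first, then fill values, so secrets need no escaping."""
    return _fill(json.loads(template_text), values)


def render_file(template_path: str, out_dir: str, values=os.environ) -> Path:
    """Render one template into out_dir (keep it git-ignored) and return the path."""
    rendered = render_template(Path(template_path).read_text(encoding="utf-8"), values)
    out = Path(out_dir) / Path(template_path).name
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(rendered, indent=2), encoding="utf-8")
    return out
```

By default values come from environment variables, so the deploy machine (or a config file you load into a dict) supplies them and nothing secret touches the repo. A missing value raises KeyError instead of deploying a half-filled connection string. One caveat of `string.Template`: a literal `$` in any string value must be written as `$$`.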
This pattern — JSON in git, secrets injected at deploy time — is the foundation you need before ADF gets its own secrets management story. Right now there is no Key Vault integration. Connection strings go in the JSON. Keep them out of source control.
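A guard that keeps them out is easy to wire into a pre-commit hook. This Python sketch flags any `AccountKey=` or `Password=` value in a JSON file that isn't a `${NAME}` placeholder. The `${NAME}` convention and the `Password=` field are assumptions about how your templates look; it is deliberately strict, so it also flags placeholders like `YOUR_KEY_HERE`. It reports counts only, so the report never echoes a secret.

```python
from __future__ import annotations

import re
import sys
from pathlib import Path

# Secret-bearing connection string fields. AccountKey comes from the storage
# example; Password= is an assumption for SQL-style connection strings.
_SECRET_RE = re.compile(r'(?:AccountKey|Password)=([^;"]*)')
# Assumed convention: committed files carry ${NAME} placeholders, never real values.
_PLACEHOLDER_RE = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*\}")


def suspicious_values(text: str) -> list[str]:
    """Return non-empty secret-field values that are not ${NAME} placeholders."""
    return [v for v in _SECRET_RE.findall(text) if v and not _PLACEHOLDER_RE.fullmatch(v)]


def scan(root: str) -> dict[str, int]:
    """Map each offending *.json file under root to its number of hits."""
    findings: dict[str, int] = {}
    for path in sorted(Path(root).rglob("*.json")):
        hits = suspicious_values(path.read_text(encoding="utf-8"))
        if hits:
            findings[str(path)] = len(hits)
    return findings


def main(root: str = ".") -> int:
    """Print findings to stderr; return 1 if anything was found, else 0."""
    findings = scan(root)
    for path, count in findings.items():
        print(f"{path}: {count} possible secret(s)", file=sys.stderr)
    return 1 if findings else 0
```

From a hook script, call `sys.exit(main())` so a non-zero return blocks the commit.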
Next post: a deep dive on linked services and the connector landscape. If you hit a wall during setup, I'm here to help.