The Integration Runtime is ADF v2's answer to a question v1 never asked explicitly: where does this activity actually execute? In v1, cloud activities ran on Microsoft-managed compute, on-premises activities ran on the DMG server, and nobody made you think about it much. In v2, the IR is a first-class concept that you configure and assign deliberately.
There are three IR types. Understanding when to use each one is fundamental to designing ADF v2 solutions that perform well and don't surprise you in production.
Azure Integration Runtime
The Azure IR is fully managed Microsoft compute. You don't provision it, you don't maintain it, you don't worry about it going down. It handles cloud-to-cloud activities: Copy Activity between Azure services, Data Flow execution (when Data Flows ship), Lookup Activity against Azure SQL, and similar cloud-native operations.
For most copy workloads, the default Azure IR is sufficient; Microsoft resolves it to an appropriate region automatically. For performance-sensitive workloads, you can create a custom Azure IR and pin its region (put it in the same region as your data stores to reduce latency and egress costs) and, for Data Flows, its compute type and core count. Copy throughput is tuned separately: the data integration unit count is set on each Copy Activity, not on the IR (more DIUs = more parallelism = more throughput = higher cost).
{
    "name": "AzureIR-EastUS",
    "type": "Managed",
    "typeProperties": {
        "computeProperties": {
            "location": "East US",
            "dataFlowProperties": {
                "computeType": "General",
                "coreCount": 8
            }
        }
    }
}
A practical note: if your source and sink are in different regions, the data has to cross a region boundary no matter where the Azure IR runs, and that transfer incurs egress charges. Pay attention to where your IR is relative to your data stores when you're optimizing costs on high-volume pipelines.
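To make the placement question concrete, here is a minimal sketch that lists the hops on which a copy leaves a region. The `CopyPlan` class and `cross_region_hops` function are hypothetical names for illustration, not part of any ADF SDK, and the sketch only flags region crossings; it says nothing about what they cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyPlan:
    """Hypothetical description of one copy: where each piece lives."""
    source_region: str
    sink_region: str
    ir_region: str


def cross_region_hops(plan: CopyPlan) -> list[str]:
    """Return the hops on which data moves from one region to another."""
    hops = []
    if plan.source_region != plan.ir_region:
        hops.append(f"source ({plan.source_region}) -> IR ({plan.ir_region})")
    if plan.ir_region != plan.sink_region:
        hops.append(f"IR ({plan.ir_region}) -> sink ({plan.sink_region})")
    return hops


same_region = CopyPlan("East US", "East US", "East US")
split_region = CopyPlan("East US", "West Europe", "East US")

print(cross_region_hops(same_region))   # []
print(cross_region_hops(split_region))  # ['IR (East US) -> sink (West Europe)']
```

An empty list is what you want on a high-volume pipeline; anything else is a hop worth pricing before you commit to the design.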
Self-Hosted Integration Runtime
The Self-Hosted IR is the DMG's successor. You install it on a machine inside your network — Windows Server, typically — and it provides ADF with connectivity to data sources that aren't reachable from the public internet. SQL Server on-premises, Oracle, IBM DB2, file shares, network-accessible REST APIs behind your firewall.
The key improvements over the v1 DMG:
High availability: you can install the Self-Hosted IR on multiple nodes and register them all with ADF. ADF distributes activity execution across healthy nodes. If a node goes down, activities continue on the remaining nodes. In v1, a DMG node failure meant a maintenance window. In v2, you configure redundancy.
# Install on first node - get the key from ADF portal
.\IntegrationRuntime.exe /install /key "IR@your-key-here"
# Install on second node - same key, same IR
.\IntegrationRuntime.exe /install /key "IR@your-key-here"
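The failover behavior can be pictured with a small simulation. This is only a sketch: the round-robin policy, the `dispatch` function, and the node and activity names are assumptions for illustration, and ADF's real scheduling logic is not described in this post.

```python
from itertools import cycle


def dispatch(activities, nodes):
    """Assign each activity to a healthy node, round-robin.

    `nodes` maps node name -> True if healthy. Round-robin is an
    illustrative policy, not ADF's actual scheduling algorithm.
    """
    healthy = [name for name, ok in nodes.items() if ok]
    if not healthy:
        raise RuntimeError("no healthy Self-Hosted IR nodes available")
    return dict(zip(activities, cycle(healthy)))


activities = ["copy_orders", "copy_customers", "copy_invoices"]

# Both nodes healthy: work is spread across them.
both_up = dispatch(activities, {"node1": True, "node2": True})

# node1 is down: everything lands on node2 and the pipeline keeps running.
one_down = dispatch(activities, {"node1": False, "node2": True})

print(both_up)
print(one_down)
```

The point of the second call is that losing a node reduces capacity but does not stop execution, which is the difference from a single-node v1 gateway.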
Better diagnostics: the Self-Hosted IR configuration manager shows connectivity test results, proxy configuration, and node status with actually useful error messages. The DMG diagnostics were frustrating to use. The IR diagnostics are noticeably better.
Cleaner upgrade path: Self-Hosted IR nodes can be configured to auto-update. You can also update them manually on a per-node basis: take one node out of the pool, upgrade it, verify it, then move to the next. Zero-downtime upgrades are achievable in a multi-node configuration.
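The manual per-node procedure is easy to get wrong under pressure, so it helps to write the order down. A minimal sketch, assuming you supply your own `upgrade` and `verify` callables (both hypothetical hooks; ADF does not provide functions under these names):

```python
def rolling_upgrade(nodes, upgrade, verify):
    """Upgrade nodes one at a time, stopping at the first failed check.

    Returns the list of nodes that were upgraded and verified.
    """
    done = []
    for node in nodes:
        # At least one other node must stay in service while this one is out.
        in_service = [n for n in nodes if n != node]
        if not in_service:
            raise RuntimeError("need at least two nodes for a zero-downtime upgrade")
        upgrade(node)
        if not verify(node):
            break  # leave the remaining nodes on the old version
        done.append(node)
    return done


# Usage with stand-in hooks that just record what happened.
log = []
result = rolling_upgrade(["node1", "node2"], log.append, lambda node: True)
print(result)  # ['node1', 'node2']
```

Stopping at the first failed verification is the design choice that matters: a bad build takes out one node, not the whole pool.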
Performance considerations: the Self-Hosted IR runs copy activities on the node where it's installed. More cores on the node means more parallel streams. For high-throughput on-premises extracts, a beefy Self-Hosted IR node (16+ cores, fast local storage for staging) makes a meaningful difference.
Azure-SSIS Integration Runtime
The Azure-SSIS IR is the most interesting new addition, and I'll write a full post about it separately. The concept: Microsoft provisions a managed cluster of Azure VMs running SQL Server Integration Services. Your existing .ispac packages deploy to this cluster's SSISDB. ADF executes them via the Execute SSIS Package Activity.
From a configuration standpoint, the Azure-SSIS IR requires:
- An Azure SQL Database or Managed Instance to host SSISDB (the SSIS catalog)
- The IR node size and count (these set the VM specs and the scale-out capacity available to SSIS)
- Optional: custom setup scripts for installing third-party components your packages depend on
{
    "name": "AzureSSISIR",
    "type": "Managed",
    "typeProperties": {
        "computeProperties": {
            "location": "East US",
            "nodeSize": "Standard_D4_v3",
            "numberOfNodes": 2,
            "maxParallelExecutionsPerNode": 4
        },
        "ssisProperties": {
            "catalogInfo": {
                "catalogServerEndpoint": "yourserver.database.windows.net",
                "catalogAdminUserName": "ssisadmin",
                "catalogAdminPassword": { "type": "SecureString", "value": "..." },
                "catalogPricingTier": "S1"
            }
        }
    }
}
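A quick way to sanity-check a definition like this is to work out how many packages it can run at once. The sketch below assumes the ceiling is simply nodes times parallel executions per node; the `max_concurrent_packages` helper is a hypothetical name, and the JSON is trimmed to the two fields it reads.

```python
import json

# Trimmed copy of the definition above: only the fields used here.
definition = json.loads("""
{
  "typeProperties": {
    "computeProperties": {
      "numberOfNodes": 2,
      "maxParallelExecutionsPerNode": 4
    }
  }
}
""")


def max_concurrent_packages(ir_definition: dict) -> int:
    """Upper bound on simultaneous package executions across the cluster."""
    compute = ir_definition["typeProperties"]["computeProperties"]
    return compute["numberOfNodes"] * compute["maxParallelExecutionsPerNode"]


print(max_concurrent_packages(definition))  # 8
```

If your nightly load fans out to more packages than that number, the extras queue, so the choice is between more nodes, more parallelism per node, or a longer batch window.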
Provisioning takes 20-30 minutes: it's literally spinning up VMs and setting up the SSIS runtime on them. This is not a fast operation. Plan for it in your deployment pipelines.
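In a deployment pipeline, planning for it usually means a polling step with a generous timeout. A minimal sketch: `get_state` is a hypothetical callable you would wire to whatever SDK or REST call your pipeline uses, and the "Started" state name and the 45-minute default are assumptions to adjust for your environment.

```python
import time


def wait_until_started(get_state, timeout_s=45 * 60, poll_s=60, sleep=time.sleep):
    """Poll `get_state()` until it returns "Started" or the timeout passes.

    Returns the number of seconds waited. `sleep` is injectable so the
    function can be exercised without real delays.
    """
    waited = 0
    while True:
        state = get_state()
        if state == "Started":
            return waited
        if waited >= timeout_s:
            raise TimeoutError(f"IR still in state {state!r} after {waited}s")
        sleep(poll_s)
        waited += poll_s


# Usage with a fake state source and no real sleeping.
states = iter(["Starting", "Starting", "Started"])
seconds = wait_until_started(lambda: next(states), sleep=lambda s: None)
print(seconds)  # 120
```

Setting the timeout comfortably above the expected 20-30 minutes keeps a slow but healthy provisioning run from failing the whole deployment.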
Choosing the Right IR
The decision tree is simple:
- Cloud-to-cloud operation, both source and sink publicly accessible: Azure IR
- Source or sink is on-premises or behind a firewall: Self-Hosted IR
- You need to run an existing SSIS package: Azure-SSIS IR
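The same tree as a function, in case you want to encode it in a design checklist or a validation script. The function name and the two boolean inputs are hypothetical, and checking the SSIS case first is my assumption about precedence when more than one condition applies.

```python
def choose_ir(runs_ssis_package: bool, behind_firewall: bool) -> str:
    """Pick an Integration Runtime type for one activity."""
    if runs_ssis_package:
        return "Azure-SSIS IR"
    if behind_firewall:
        return "Self-Hosted IR"
    return "Azure IR"


print(choose_ir(runs_ssis_package=False, behind_firewall=False))  # Azure IR
print(choose_ir(runs_ssis_package=False, behind_firewall=True))   # Self-Hosted IR
print(choose_ir(runs_ssis_package=True, behind_firewall=False))   # Azure-SSIS IR
```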
One IR can serve multiple linked services. You don't need a separate IR per data source — you need one per connectivity zone. One Azure IR for all cloud operations, one Self-Hosted IR cluster per network zone that needs connectivity.
The explicit IR concept is an improvement over v1. In v1, the concept was implicit and partially hidden. In v2, you know exactly where your activity executes, which means you can reason about performance, cost, and network topology. That's the right tradeoff. If you're sizing your Self-Hosted IR nodes or planning an Azure-SSIS IR deployment, I'm here to help.