Unity Catalog One Month In: The Migrations That Were Easy and the Ones That Weren't
Three weeks into Unity Catalog GA and I've completed two production migrations. Neither was as smooth as the quickstart documentation implies. Here's the honest breakdown of where things went cleanly and where they took longer than expected.
The Easy Migrations
Tables that were already external — pointing to ADLS or S3 paths — were straightforward. Create the external location in UC, re-register the table pointing at the same path, rebuild the grants. Most of this was mechanical once the external locations were set up correctly. For a workspace with 40 external tables, the actual migration work was about a day.
The external location setup was actually the most time-consuming part, not the table migration. Two things ate the time: figuring out the right managed identity permissions on Azure (Storage Blob Data Contributor is necessary, and some recursive operations also needed Storage Blob Data Owner), and validating that each external location could read and write before creating tables against it. Plan for half a day on this per storage account.
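The three steps for an already-external table (create the external location, re-register the table at the same path, rebuild the grants) can be sketched as a small Python helper that only builds the SQL strings. This is a minimal sketch, not the exact commands from either migration: the function name, the location name, the storage credential, the catalog and the group are all hypothetical, and it assumes the tables are Delta.

```python
from __future__ import annotations


def external_table_migration_sql(
    location_name: str,
    storage_url: str,
    credential: str,
    target_table: str,
    table_path: str,
    grants: dict[str, list[str]],
) -> list[str]:
    """Build the SQL for one already-external table: location, table, grants.

    All argument values are supplied by the caller; nothing here talks to a
    workspace, so the output can be reviewed before anything is executed.
    """
    statements = [
        # 1. Register the storage path as a Unity Catalog external location.
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {location_name} "
        f"URL '{storage_url}' WITH (STORAGE CREDENTIAL {credential})",
        # 2. Re-register the table against the same path (no data movement).
        #    Assumes a Delta table; adjust USING for other formats.
        f"CREATE TABLE IF NOT EXISTS {target_table} "
        f"USING DELTA LOCATION '{table_path}'",
    ]
    # 3. Rebuild the grants, one statement per principal and privilege.
    for principal, privileges in sorted(grants.items()):
        for privilege in privileges:
            statements.append(
                f"GRANT {privilege} ON TABLE {target_table} TO `{principal}`"
            )
    return statements
```

Calling it with, say, a hypothetical location `sales_ext`, credential `uc_cred`, target `main.sales.orders` and `{"analysts": ["SELECT"]}` yields three statements to review and run in order. Generating the statements first and running them second makes the grant rebuild easy to diff against the old workspace's grants.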
The Harder Migrations
DBFS managed tables. These are tables whose data lives in dbfs:/user/hive/warehouse/ — the default when you created a table without specifying a LOCATION. In Unity Catalog, DBFS managed storage isn't supported. You have to move the data to external storage before registering it in UC.
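-- Identify DBFS managed tables before you start.
-- hive_metastore does not expose an information_schema (that exists only in
-- Unity Catalog catalogs), so inspect the legacy metastore directly.
-- <schema> and <table> are placeholders; repeat for each schema and table.
SHOW TABLES IN hive_metastore.<schema>;
DESCRIBE TABLE EXTENDED hive_metastore.<schema>.<table>;
-- In the DESCRIBE output, flag every table where Type is MANAGED and
-- Location starts with dbfs:/user/hive/warehouse/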
On one migration, this audit turned up 23 tables that needed physical data movement before they could be registered in UC. Each one required a write to external storage, a row count validation, and then a UC table registration. Budget 15-20 minutes per table for the read-write-validate cycle, longer for large tables.
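The per-table cycle and the time budget can be sketched in plain Python. This is a sketch under assumptions, not the exact procedure used: the helper names, the target catalog and the external root path are hypothetical, and the write and the registration are collapsed into a single CREATE TABLE ... AS SELECT, which simplifies the three-step cycle described above.

```python
from __future__ import annotations

DBFS_WAREHOUSE_PREFIX = "dbfs:/user/hive/warehouse/"


def is_dbfs_managed(location: str | None) -> bool:
    """True when a table's data sits under the default Hive warehouse path."""
    return bool(location) and location.startswith(DBFS_WAREHOUSE_PREFIX)


def migration_sql(schema: str, table: str, catalog: str, external_root: str) -> list[str]:
    """Build the write-and-register statement plus a row count check."""
    source = f"hive_metastore.{schema}.{table}"
    target = f"{catalog}.{schema}.{table}"
    path = f"{external_root.rstrip('/')}/{schema}/{table}"
    return [
        # Copy the data to external storage and register it in UC in one step.
        f"CREATE TABLE {target} USING DELTA LOCATION '{path}' "
        f"AS SELECT * FROM {source}",
        # Row count validation: the single result row should be true.
        f"SELECT (SELECT COUNT(*) FROM {source}) = "
        f"(SELECT COUNT(*) FROM {target}) AS counts_match",
    ]


def budget_minutes(table_count: int, low: int = 15, high: int = 20) -> tuple[int, int]:
    """Low and high time estimates, using the 15-20 minutes per table figure."""
    return table_count * low, table_count * high


print(budget_minutes(23))  # (345, 460)
```

For the 23 tables above, that works out to 345-460 minutes, or roughly six to eight hours of read-write-validate time before large tables are accounted for.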
The Downstream Code Change Scope
On the first migration, I underestimated how many notebooks referenced tables by two-tier name. A grep across all notebooks before starting would have given me a realistic count. By the time I found all the references, I'd already had two post-migration failures in pipelines I didn't know were still using the old two-tier references.
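# Before you migrate: audit your namespace references.
# Run this in a notebook. It only lists the top of the Repos tree so you know what to search;
# the file:/Workspace prefix assumes a runtime that exposes workspace files on the driver.
dbutils.fs.ls("file:/Workspace/Repos")  # Check all repo-linked notebooks
# Also search job definitions for notebook paths that might embed SQL with table references.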
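The grep itself can be sketched as a small Python scanner that flags schema.table references in notebook source once you have it as text (for example, from an export). This is a rough heuristic, not a SQL parser: the function name and the keyword list are my own choices, and it only looks at identifiers that follow FROM, JOIN, INTO or TABLE.

```python
import re

# A dotted identifier following one of a few SQL keywords.
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN|INTO|TABLE)\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)",
    re.IGNORECASE,
)


def find_two_tier_refs(text: str) -> list:
    """Return (line_number, reference) for each schema.table reference found."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in _TABLE_REF.finditer(line):
            ref = match.group(1)
            # Exactly one dot means schema.table; catalog.schema.table has two.
            if ref.count(".") == 1:
                hits.append((line_no, ref))
    return hits


sample = (
    'spark.sql("SELECT * FROM sales.orders o JOIN main.sales.customers c ON o.id = c.id")\n'
    'spark.sql("INSERT INTO staging.events SELECT * FROM raw_events")'
)
print(find_two_tier_refs(sample))  # [(1, 'sales.orders'), (2, 'staging.events')]
```

Expect false positives and misses: a Python import such as `from os.path import join` matches the pattern, while backticked identifiers and calls like `spark.table("sales.orders")` do not. Treat the output as a review list rather than a definitive count.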
What I'd Do Differently
Start the namespace audit before you touch anything. Every table reference in every notebook, every job definition, every dbt model, every ADF dataset — get the full list before the first migration command. Run the migration in a dev environment first with the actual production table names, find all the broken references, fix them, then do production.
The migration itself is not the risky part. The risky part is discovering on Monday morning that the overnight job you forgot about is still using two-tier table references and has been failing silently for six hours.