The Rise of Natural Language and No-Code Data Workflows

Databricks introduced natural language interfaces for data — type a question, get an answer, no SQL required. AI/BI Genie was the showstopper demo: connect it to your lakehouse, ask "what were our top-performing products last quarter," and it generates the query, runs it, and returns a visualization. The crowd loved it. I've seen this demo pattern before, and I have thoughts.

What's Actually New Here

Let's be precise about what's different from the natural language data query tools of five years ago. Earlier NL-to-SQL tools had a fixed schema mapping problem: they worked on pre-defined question templates and fell apart on anything off the happy path. Modern LLM-backed approaches can actually reason about schema, handle ambiguous questions, and generate non-trivial SQL. The quality gap between 2019 NL-to-SQL and 2024 NL-to-SQL is significant.

Genie specifically adds Unity Catalog context — it knows your column names, it knows your data types, it can use your table comments as documentation to improve query generation. If you've been investing in metadata quality (descriptions, comments, column lineage), that investment now directly improves what these tools can do. That's a real and underappreciated benefit of the catalog work that most teams have been deferring.
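To make the metadata point concrete, here is a minimal sketch of how table and column comments might be rendered into schema context for a query-generating model. It is an illustration, not a description of how Genie works internally: the `Table` and `Column` classes, the `schema_context` function, and the `sales.orders` example are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str
    comment: str = ""  # empty when nobody documented the column


@dataclass
class Table:
    name: str
    comment: str = ""
    columns: list[Column] = field(default_factory=list)


def schema_context(table: Table) -> str:
    """Render one table's metadata as plain text for a model prompt."""
    header = f"Table {table.name}"
    if table.comment:
        header += f": {table.comment}"
    lines = [header]
    for col in table.columns:
        line = f"  - {col.name} ({col.data_type})"
        if col.comment:
            line += f": {col.comment}"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical table: two documented columns and one undocumented one.
orders = Table(
    name="sales.orders",
    comment="One row per customer order, net of cancellations",
    columns=[
        Column("order_id", "BIGINT", "Primary key"),
        Column("net_revenue", "DECIMAL(18,2)", "Revenue after discounts and refunds, in USD"),
        Column("rgn_cd", "STRING"),  # no comment: the model has to guess what this means
    ],
)

print(schema_context(orders))
```

The documented columns carry their business meaning into the prompt, while `rgn_cd` arrives as a bare name and type. That gap is what the deferred catalog work would close.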

The Governance Headache Nobody Is Talking About

Here's my concern. Natural language interfaces lower the barrier to querying data — which is good. They also make it much easier for someone to accidentally query data they shouldn't be querying, because the access check happens after the LLM generates the query, not when the user forms the intent.

In a well-governed environment with Unity Catalog and proper row/column-level security, this is manageable: the generated query gets blocked if the user doesn't have access to the underlying tables, and they get a sensible error. In the environments I actually see in the field (partial UC adoption, inconsistent permissions, silver tables that technically anyone with workspace access can read), natural language interfaces are an access incident waiting to happen.

If you're planning to roll out Genie or any NL data interface to non-technical users, do your permission audit first. Find out who can actually read what. Fix the obviously wrong ACLs. Then enable the NL layer. The reverse order creates problems that are hard to claw back once users have expectations about what they can access.
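As a rough sketch of the "find out who can actually read what" step, the snippet below flags read grants that give broad groups access to tables you consider sensitive. It assumes you have already exported your grants into simple records by whatever means your platform offers. The `Grant` record, the `BROAD_PRINCIPALS` set, the `flag_broad_reads` function, and all table and group names are hypothetical and would need to match your own environment.

```python
from dataclasses import dataclass

# Assumption: principals that cover most or all workspace users.
# Replace with the group names that play this role in your environment.
BROAD_PRINCIPALS = {"account users", "all_users"}


@dataclass(frozen=True)
class Grant:
    principal: str
    privilege: str
    table: str


def flag_broad_reads(grants, sensitive_tables):
    """Return SELECT grants that expose sensitive tables to broad groups."""
    return [
        g
        for g in grants
        if g.privilege.upper() == "SELECT"
        and g.principal.lower() in BROAD_PRINCIPALS
        and g.table in sensitive_tables
    ]


# Hypothetical export of current grants.
grants = [
    Grant("finance_analysts", "SELECT", "silver.payroll"),
    Grant("account users", "SELECT", "silver.payroll"),
    Grant("account users", "SELECT", "gold.product_sales"),
]
sensitive = {"silver.payroll"}

for g in flag_broad_reads(grants, sensitive):
    print(f"Review: {g.principal} can read {g.table}")
```

A check like this only catches the obviously wrong ACLs. It says nothing about whether narrower grants are appropriate, which still needs a human review before the NL layer goes live.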

Where the "Citizen Developer" Narrative Is Overhyped

The pitch is that natural language tools will let business users answer their own data questions without needing a data analyst. I've heard this pitch — in various forms — since the dawn of self-service BI in the early 2010s. Tableau said it. Power BI said it. Now AI is saying it.

The reality is that the bottleneck was never query authorship. It was question formulation. Most business users can't describe the question they need answered precisely enough to produce the correct query, because they don't know which tables encode which business concepts, what the edge cases in the data are, or how "revenue," "gross revenue," and "net revenue" differ in their specific system.

What NL interfaces actually enable is faster iteration for people who already understand the data domain. A data analyst who knows the schema can explore faster with a natural language interface than with a query editor. That's real value. The "replace the analyst" version of the story is still wrong.
