The New Data Workforce: Skills That Matter in 2025

DAIS 2024 had a lot of sessions on training programs, certifications, and emerging job functions. The subtext of most of them was the same: the skills floor for data practitioners is rising, and it's rising fast. If you're not building that floor under yourself right now, you're going to find the ground has shifted.

Why Upskilling Is Non-Negotiable Right Now

The skills that were sufficient to be a productive data engineer in 2022 are not sufficient in 2025. That's not hyperbole — the tooling has changed, the expectations have changed, and most significantly, the presence of AI in the stack has gone from optional to assumed.

The good news: the foundation skills are holding. Understanding data, writing reliable pipelines, governing access, and building for scale are still the core. What has changed is the layer on top of them. If you're interviewing for data roles today and you don't have demonstrable AI experience, you're competing against candidates who do. The bar isn't about to rise; it has already moved.

The Roles That Are Emerging

AI/ML Engineer — not new, but the scope has expanded. Now includes prompt engineering, evaluation pipelines, model fine-tuning, and deployment to production inference endpoints. The word "production" matters here. I'll come back to it.

Prompt Engineer — and here's the one that's doing something interesting. It's becoming a named role at a lot of organizations, specifically for practitioners who specialize in systematic prompt design, evaluation frameworks, and optimization across large workflows. But simultaneously, it's becoming a universal expectation. The organizations hiring a dedicated Prompt Engineer are the same ones expecting every engineer on the team to prompt competently. Knowing how to write a good prompt is becoming as expected as knowing how to write a SQL query. The named role exists for the practitioners doing it at depth; the skill is something you either have or you're behind.

AI Governance Architect — the intersection of data governance expertise and AI system requirements. Audit trails, lineage through inference, bias evaluation, compliance across regulatory frameworks. This role has been emerging for a while; it's more concrete now because there are actual regulatory frameworks to govern against.

A note on what you'll notice is missing: retrieval engineer was on earlier versions of this list. I'm taking it off. The skills involved — vector search, embedding pipeline construction, RAG architecture, reranking — are real and in demand. But they've been absorbed. Data engineers are expected to know them. Data scientists are expected to know them. The industry did what it always does: identified a full team's worth of skills and decided one person should just have all of them. If you're a data engineer or data scientist, those retrieval skills are now part of your job description, not a separate role you hand off to.

What Skills Will Matter Most

Production deployment — not demos. This is the thing I want to land hardest. The window for impressing employers with a proof of concept closed a while ago. What employers are asking for now is: can you get an LLM-leveraged tool into production? Can you stand up inference endpoints, build evaluation pipelines, instrument for drift, wire in observability, and handle the operational reality of a model in a live system? Demo-level RAG is table stakes. Production RAG — with evaluation, monitoring, and the ability to detect when the system is degrading — is the differentiator.

If your portfolio is full of Jupyter notebooks that show a working prototype, that's fine as a starting point. But the candidates getting offers have something running in prod. Ship something. Instrument it. Prove you can operate it after it's live.
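To make "instrument for drift" a little more concrete, here is a minimal sketch of one piece of it: a rolling-window check that flags when a quality score falls well below a baseline. The class name, the window size, and the threshold are all hypothetical choices for illustration, and the score itself is assumed to come from whatever evaluation you already run on live traffic.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Flags degradation when the rolling mean score drops below a baseline.

    Hypothetical helper for illustration; the defaults are assumptions,
    not recommended values.
    """

    def __init__(self, baseline: float, window: int = 50, max_drop: float = 0.1):
        self.baseline = baseline
        self.max_drop = max_drop
        # deque with maxlen discards the oldest score once the window is full
        self.scores: deque[float] = deque(maxlen=window)

    def record(self, score: float) -> None:
        """Store one quality score from a live request."""
        self.scores.append(score)

    def is_degraded(self) -> bool:
        """True only when the window is full and its mean has dropped too far."""
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data yet to judge
        return mean(self.scores) < self.baseline - self.max_drop


monitor = DriftMonitor(baseline=0.9, window=3, max_drop=0.1)
for score in (0.9, 0.9, 0.9):
    monitor.record(score)
healthy = monitor.is_degraded()

for score in (0.5, 0.5):
    monitor.record(score)
degraded = monitor.is_degraded()
```

In a real system the result of `is_degraded()` would feed an alert or a dashboard rather than a local variable, but the shape of the check is the same.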

Evaluation pipelines. Knowing how to build a model is less valuable than knowing how to measure whether it's working. Automated evaluation frameworks, human-in-the-loop review, and regression testing whenever you update a prompt or swap a model: these are the engineering disciplines that separate practitioners who can demo from practitioners who can operate.
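Regression testing a prompt or model change can start very small. The sketch below assumes a substring check is a good enough pass criterion, which it often is not in practice; the two `generate` functions are stand-ins for real model calls, and every name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt_input: str
    must_contain: str  # substring the output is expected to include


def pass_rate(generate: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases whose output contains the expected substring."""
    if not cases:
        raise ValueError("evaluation set is empty")
    passed = sum(
        1
        for case in cases
        if case.must_contain.lower() in generate(case.prompt_input).lower()
    )
    return passed / len(cases)


def is_regression(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """True when the candidate scores worse than the baseline by more than tolerance."""
    return candidate < baseline - tolerance


# Stand-ins for the current prompt and a proposed change (assumptions).
def baseline_generate(text: str) -> str:
    return f"Refund policy: {text} within 30 days."


def candidate_generate(text: str) -> str:
    return f"Policy: {text}."


CASES = [
    EvalCase("returns are accepted", "30 days"),
    EvalCase("exchanges are accepted", "30 days"),
    EvalCase("returns are accepted", "returns"),
    EvalCase("exchanges are accepted", "policy"),
]

baseline_score = pass_rate(baseline_generate, CASES)
candidate_score = pass_rate(candidate_generate, CASES)
blocked = is_regression(baseline_score, candidate_score)
```

Run in CI, a check like `blocked` is what stops a prompt edit that quietly drops required information from reaching production.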

RAG architecture and hybrid retrieval. You don't need a dedicated retrieval engineer title on your business card. You need to understand how vector search and full-text search complement each other, what a reranker does and when it earns its latency cost, and how to build a retrieval pipeline that degrades gracefully. This is data engineering with a different shape, and it belongs in your toolkit.
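One common way to combine vector and full-text results is reciprocal rank fusion, which merges ranked lists using only each document's position in each list. The sketch below is an illustration under assumptions, not a reference implementation: the document IDs are made up, `k = 60` is a conventional default rather than a tuned value, and "degrades gracefully" is reduced to the simplest case, where one retriever returns nothing.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs; higher fused score ranks first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hybrid_search(
    vector_hits: list[str], keyword_hits: list[str], top_n: int = 3
) -> list[str]:
    """Fuse whichever retrievers returned results.

    If one source is empty (for example, it timed out upstream), the
    other source's ranking is used on its own.
    """
    sources = [hits for hits in (vector_hits, keyword_hits) if hits]
    return reciprocal_rank_fusion(sources)[:top_n]


vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = hybrid_search(vector_hits, keyword_hits)
vector_only = hybrid_search(vector_hits, [])
```

Here `doc_b` comes out on top because it ranks well in both lists, while `vector_only` simply preserves the vector ordering. A reranker, if it earns its latency cost, would sit after this step and reorder the fused candidates.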

Prompt engineering — systematically. Not prompt craft as an art form; prompt engineering as a repeatable discipline. Version control for prompts. Evaluation sets to measure whether changes improve or regress outputs. Understanding of how context window usage affects output quality. The ability to diagnose a bad model output and identify whether the problem is in the prompt, the retrieval, the model, or the data. This applies to you whether or not "Prompt Engineer" is anywhere in your job title.
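Version control for prompts can be as plain as keeping templates in the repository and tagging every logged output with a content hash of the template that produced it. The sketch below assumes that approach; `PromptVersion` and its fields are hypothetical names, and truncating the hash to 12 characters is an arbitrary choice for readability.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    template: str

    @property
    def version_id(self) -> str:
        """Short, deterministic ID derived from the template text."""
        digest = hashlib.sha256(self.template.encode("utf-8")).hexdigest()
        return digest[:12]

    def render(self, **fields: str) -> str:
        """Fill the template's named placeholders."""
        return self.template.format(**fields)


v1 = PromptVersion("summarize", "Summarize the following text:\n{text}")
v2 = PromptVersion("summarize", "Summarize the following text in one sentence:\n{text}")

rendered = v1.render(text="Quarterly revenue rose.")
# Log this alongside each model output so eval results map to an exact prompt.
log_record = {"prompt": v1.name, "version": v1.version_id}
```

Because the ID changes whenever the template text changes, a shift in evaluation results can be traced back to a specific prompt edit rather than guessed at.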

The theme across all of these is the same: employers have moved past wanting to see that you understand the concepts. They want to see that you've shipped the thing and kept it running. That's what matters in 2025. As always, I'm here to help.
