Integrating Copilot Into a Data Engineering Workflow

A month into working with GitHub Copilot, the novelty wore off. What replaced it was more useful: a clear-eyed picture of exactly where this tool belongs in a data engineering workflow and where it gets in the way.

The first few weeks were about calibration — learning when to trust the suggestion and when to delete it immediately. The second month was about integrating it deliberately rather than just having it running in the background. That distinction matters more than it sounds.

The Settings That Actually Change Behavior

Out of the box, Copilot suggests on every keystroke and displays completions inline. For some developers that works immediately. For me, it was too aggressive — I kept accepting suggestions reflexively before I had read them. The fix was simple: dial back the auto-trigger settings so completions appear when I ask for them, not on every pause.

In VS Code, the relevant settings (names have shifted between Copilot extension versions, so check them against yours):

{
    "editor.inlineSuggest.enabled": true,
    "github.copilot.inlineSuggest.enable": true,
    "editor.suggest.preview": false,
    "github.copilot.editor.enableAutoCompletions": false
}

Disabling auto-completions and using the manual trigger (Alt+\ on Windows and Linux, Option+\ on macOS) shifted the dynamic from "Copilot runs constantly" to "I ask Copilot when I want input." That's a better mental model. You're in control of the interaction, not passively accepting or rejecting a stream of guesses.

Where Data Engineering Work Specifically Benefits

Not all programming tasks are equal from Copilot's perspective. Code that follows well-established patterns — the kind that appears thousands of times in public repositories — gets better completions than novel or domain-specific logic. In data engineering, this maps cleanly to a distinction you probably already feel: infrastructure code vs. business logic.

Infrastructure code is where Copilot shines. Setting up a SQLAlchemy connection with retry logic. Writing a standard Airflow sensor. Scaffolding a Pytest fixture that mocks a database cursor. These patterns are stable, well-documented, and Copilot has seen enough of them to complete them accurately.
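
As one illustration of the retry pattern mentioned above, here is a minimal sketch of the kind of helper I mean. It is not Copilot's literal output. The name connect_with_retry and its parameters are hypothetical, and the connect argument stands in for whatever opens the connection (with SQLAlchemy that would typically be an engine's connect method). Real code would likely catch a narrower exception such as sqlalchemy.exc.OperationalError rather than Exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 3,
    base_delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or the attempts run out.

    Hypothetical helper for illustration; `connect` is any zero-argument
    callable that returns a connection or raises on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception:
            # On the final attempt, re-raise so the caller sees the real error.
            if attempt == attempts:
                raise
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing sleep as a parameter is a design choice for testability: a test can record the delays instead of actually waiting.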

Here's an example that illustrates the gap. If I type this signature:

def wait_for_upstream_table(
    hook: PostgresHook,
    table_name: str,
    timeout_minutes: int = 30,
) -> bool:

Copilot generates a reasonable polling loop with sleep intervals and a timeout check. Not perfect — it might use time.sleep where I'd prefer an async pattern — but structurally correct and worth adapting rather than writing from scratch.
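
For reference, the shape of that polling loop looks roughly like the sketch below. This is my own reconstruction, not a verbatim Copilot suggestion. It assumes "upstream table is ready" simply means the table exists, checked with PostgreSQL's to_regclass, and it relies only on the hook's get_first method. The poll_seconds, sleep, and clock parameters are additions of mine so the loop can be exercised without a database or real waiting.

```python
import time
from typing import Callable, Protocol


class SupportsGetFirst(Protocol):
    """The one hook method this sketch needs (PostgresHook provides it)."""

    def get_first(self, sql: str, parameters=None): ...


def wait_for_upstream_table(
    hook: SupportsGetFirst,
    table_name: str,
    timeout_minutes: int = 30,
    poll_seconds: float = 60.0,  # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once the table exists, False if the timeout passes first."""
    deadline = clock() + timeout_minutes * 60
    while True:
        # to_regclass returns NULL when the relation does not exist.
        row = hook.get_first("SELECT to_regclass(%s)", parameters=(table_name,))
        if row is not None and row[0] is not None:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

If readiness means something stricter in your pipeline, such as a row count or a freshness timestamp, only the query and the check on row need to change.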

Business logic is where Copilot earns less trust. If I'm writing a function that encodes a domain-specific rule — "this source uses fiscal quarters, not calendar quarters, and fiscal year starts in October" — Copilot has no idea. It will generate generic date logic that is plausible-looking and wrong for this specific domain. You still write that part yourself.
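
To make the fiscal-quarter rule concrete, here is a small sketch of the logic you would have to write by hand. The function name and the year-labeling convention are assumptions for illustration: I label a fiscal year by the calendar year in which it ends, so October 2023 falls in FY2024 Q1. Your source system may label it differently, which is exactly the kind of detail Copilot cannot know.

```python
from datetime import date
from typing import Tuple

FISCAL_YEAR_START_MONTH = 10  # October, per the rule described above


def fiscal_quarter(
    day: date, start_month: int = FISCAL_YEAR_START_MONTH
) -> Tuple[int, int]:
    """Return (fiscal_year, fiscal_quarter) for a calendar date."""
    # Months elapsed since the fiscal year began: 0 for the start month.
    months_into_fiscal_year = (day.month - start_month) % 12
    quarter = months_into_fiscal_year // 3 + 1
    # Assumption: the fiscal year is named for the calendar year it ends in.
    if start_month > 1 and day.month >= start_month:
        fiscal_year = day.year + 1
    else:
        fiscal_year = day.year
    return fiscal_year, quarter
```

For example, fiscal_quarter(date(2024, 1, 15)) falls in the second fiscal quarter of FY2024 under these assumptions, where generic calendar logic would call it Q1.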

Airflow DAG Scaffolding Is the Best Use Case

Of everything I tested in the first two months, Airflow DAG generation was where Copilot delivered the most consistent value. DAG structure is extremely formulaic: imports, default args, DAG definition, task definitions, dependency chain. Copilot knows this pattern deeply.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "email_on_failure": True,
    "email_on_retry": False,
}

with DAG(
    dag_id="ingest_crm_accounts",
    default_args=default_args,
    schedule_interval="0 6 * * *",  # daily at 06:00; Airflow 2.4+ prefers schedule=
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["crm", "ingestion"],
) as dag:

Type that header and Copilot will suggest task definitions with reasonable operator choices, dependency syntax, and even sensible task IDs based on the dag_id context. The suggestions aren't always exactly right, but they're close enough that I'm editing rather than writing from scratch — and editing is faster.

The Convention File Becomes Part of the Workflow

The markdown convention file I started in the first month evolved into something I'd open deliberately at the start of each session. Not a huge document — a page, maybe two. Project naming conventions, the shape of the config objects, which hooks and operators were approved for this client's environment, which patterns were explicitly forbidden.

Keeping it open in the editor is a hack. Copilot draws much of its context from the files open in the editor, so having that file open biases suggestions toward project conventions. It's not elegant. It's also effective, which is the only evaluation criterion I care about in production.

I started to wonder whether this file needed to be more structured — whether there was a way to make it more machine-readable without making it less useful for me as a human reference. That question would sit in the back of my head for a while before I did anything about it.

The Habit That Forms

By the end of the second month, Copilot had become part of the muscle memory. The Tab key took on a new meaning: not just "accept completion" but "is this what I meant?" The review step started happening automatically, without deliberate effort. That's the sign that a tool has genuinely integrated — when the quality check becomes instinctive rather than a separate mental step.

The productivity gain was real, even if my only measure of it was how I felt at the end of a day. Less of that tired-fingers, low-level-task fatigue. More time spent on the genuinely hard parts of the work: the schema design decisions, the orchestration logic, the tradeoffs that required actual judgment. The scaffolding just appeared.

The question I was starting to sit with: if context is the key variable in how useful this tool is, what would it look like to manage that context deliberately, across sessions, rather than just keeping a markdown file open? That's the thread I started pulling on next.

If you've integrated Copilot into your own data engineering workflow and found patterns I haven't mentioned here, I'd like to hear them. As always, I'm here to help.
