ChatGPT Changes the Rubber Duck Forever

OpenAI just released ChatGPT and I've spent the last week doing nothing but using it. Not to generate blog posts or marketing copy — to work. Architecture reviews, pipeline design sessions, debugging conversations, schema design tradeoffs. Real data engineering work, with an LLM as a participant rather than a one-shot code generator.

The rubber duck sessions I described earlier this year — single-prompt, one-shot, carefully worded to fit the context window — are obsolete. ChatGPT's multi-turn conversation model changes the dynamic fundamentally.

What Multi-Turn Actually Changes

The GPT-3 Playground sessions had a ceiling. You wrote a careful problem statement, got an analysis, extracted what was useful, and closed the loop. You couldn't refine it because the next message didn't carry any context from the previous one. Any follow-up required restating the whole problem.

ChatGPT holds the conversation. When it gives me an analysis I disagree with, I can say why and it updates its reasoning. When its suggestion has an edge case I know about from the data, I can describe it and ask how the suggestion holds up. When it makes an assumption I want to examine, I can push on it directly without rebuilding the context.

A real session from this week:

Me: I have a Spark job that joins a 500GB event table to a 2GB user dimension. The user dimension updates daily with about 50K new or changed rows. I'm thinking about broadcasting the dimension. What should I consider?

ChatGPT: [Analysis of broadcast considerations: executor memory, driver broadcast threshold, shuffle vs. broadcast tradeoffs]

Me: The cluster has 16GB executor memory. Is 2GB safe to broadcast?

ChatGPT: [More specific analysis with the 16GB constraint factored in]

Me: The dimension gets a full reload daily, not just deltas. Does that change the cache invalidation story?

ChatGPT: [Updated analysis accounting for full reload]

Me: What if the dimension grows to 8GB over the next year?

ChatGPT: [Recommendation to keep broadcast for now, set a monitoring threshold, switch to sort-merge join at 4GB]

That's a four-turn conversation that arrived at a specific, actionable recommendation with a monitoring trigger. The equivalent GPT-3 Playground session would have required me to write all four questions into a single prompt and hope the model addressed them all coherently.

The Context Document Is Still Necessary

ChatGPT remembers within a session, not across sessions. Every new conversation starts cold. The context document pattern I described last month still applies — paste it at the start of each session. What changes is that within a session, you don't have to keep restating context. The conversation builds on itself.

Where It Still Gets Things Wrong

ChatGPT is more capable than GPT-3 but has the same fundamental limitation: it doesn't signal uncertainty. It states incorrect things with the same confident tone it uses for correct things. The business logic failure modes from the GPT-3 field notes still apply — the model generates plausible-looking code for novel logic and gets it subtly wrong. The conversational format makes this more dangerous because the iterative refinement can feel like validation when it isn't.

The rule I've carried over: any code the model generates for business logic gets tested before it goes anywhere near production. The conversation doesn't substitute for tests. It supplements the design phase before the tests run.

The Shift This Creates

The rubber duck is better now. Meaningfully better. I'm arriving at design decisions faster, catching more edge cases before implementation, and spending less time in the "stare at the ceiling" phase of pipeline design. That's the honest accounting of what ChatGPT adds to this workflow — not magic, but a faster path through the thinking.

If you've been running design sessions in ChatGPT and have patterns worth sharing, I'd like to hear them. As always, I'm here to help.

Read more