Why I'm Watching Snowflake: Separating Compute from Storage
I've been spending time with Snowflake this quarter, and I want to give it a fair technical read before it becomes the unexamined industry standard. The architecture is genuinely clever. The execution against that architecture is strong. And there are specific questions you should be asking before you commit to it that the sales conversation won't raise.
Let me start with what's actually good.
The Architecture Is Real
Snowflake's core design (storage in cloud object storage, which is S3 on AWS, and compute in isolated virtual warehouses) delivers something Redshift can't do cleanly: run multiple large queries simultaneously without resource contention. On Redshift, if your BI team is running a heavy dashboard refresh at the same time your data engineering team is loading 200GB of new data, they're competing for the same cluster. WLM queues can divide that cluster's memory and concurrency slots between them, but they can't add capacity. On Snowflake, you give each team a separate virtual warehouse with its own compute. They don't touch each other.
That's a real operational win. For organizations where the shared Redshift cluster has become a political problem — whose query gets priority, who's allowed to run what during business hours — Snowflake removes the argument by removing the shared resource.
The auto-suspend and auto-resume feature is also legitimately useful. A virtual warehouse that's not running queries shuts down after a configurable idle period (default 10 minutes) and resumes in seconds when a query arrives. For workloads that have variable demand throughout the day, this means you pay for actual compute usage rather than continuous cluster uptime.
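To make the pay-for-usage point concrete, here is a rough sketch of how billed time could be estimated for a warehouse with auto-suspend. It is a simplified model, not Snowflake's actual metering: the function names are hypothetical, and it assumes per-second billing with a 60-second minimum each time the warehouse resumes, which you should check against your own contract and edition.

```python
from typing import Iterable, Tuple


def billed_seconds(
    queries: Iterable[Tuple[int, int]],
    auto_suspend_s: int = 600,   # idle period before suspend (10 minutes)
    min_billing_s: int = 60,     # ASSUMPTION: minimum charge per resume
) -> int:
    """Estimate billed seconds for (start, end) query intervals, in seconds.

    Simplified model: the warehouse resumes when a query arrives while it is
    suspended, and suspends once it has been idle for auto_suspend_s.
    """
    total = 0
    session_start = None
    busy_until = None
    for start, end in sorted(queries):
        if session_start is None:
            # First query of the day: the warehouse resumes.
            session_start, busy_until = start, end
        elif start <= busy_until + auto_suspend_s:
            # Warehouse is still running; the query extends the session.
            busy_until = max(busy_until, end)
        else:
            # Warehouse suspended in the gap: close the old session, open a new one.
            total += max(busy_until + auto_suspend_s - session_start, min_billing_s)
            session_start, busy_until = start, end
    if session_start is not None:
        total += max(busy_until + auto_suspend_s - session_start, min_billing_s)
    return total


def credits_used(seconds: float, credits_per_hour: float) -> float:
    """Convert billed seconds to credits at a given hourly credit rate."""
    return seconds / 3600 * credits_per_hour


if __name__ == "__main__":
    # Hypothetical day: two back-to-back queries, then one short query much later.
    day = [(0, 300), (400, 700), (10_000, 10_060)]
    on_demand = billed_seconds(day)
    always_on = 24 * 3600
    print(f"auto-suspend: {on_demand}s billed, always-on: {always_on}s billed")
```

Note that the idle tail before each suspend is still billed in this model, so a long auto-suspend period on a bursty workload eats into the savings.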
The Query Performance Story
Snowflake's micro-partitioning (automatically storing data in small, columnar chunks with metadata about the min/max values of each column) enables aggressive pruning on filtered queries without requiring you to define partition keys in advance. You don't have to predict which columns your analysts will filter on; Snowflake's metadata layer handles range-based skipping automatically. The pruning is most effective when the filtered column correlates with the order in which the data was loaded, as date and timestamp columns usually do.
In practice this means queries that would require explicit partitioning strategies on Hive or Redshift run well on Snowflake without schema design work. For analytics teams that want to query freely without thinking about physical data organization, that's compelling.
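The min/max pruning idea can be sketched in a few lines. This is a toy illustration of the general technique (often called zone maps), not Snowflake's internal format: the `MicroPartition` class, the helper names, and the tiny partition size are all made up for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class MicroPartition:
    rows: List[Dict[str, Any]]
    stats: Dict[str, Tuple[Any, Any]]  # column name -> (min, max)


def build_partitions(rows: List[Dict[str, Any]], rows_per_partition: int) -> List[MicroPartition]:
    """Split rows into fixed-size chunks in load order and record min/max per column."""
    parts = []
    for i in range(0, len(rows), rows_per_partition):
        chunk = rows[i:i + rows_per_partition]
        stats = {
            col: (min(r[col] for r in chunk), max(r[col] for r in chunk))
            for col in chunk[0]
        }
        parts.append(MicroPartition(chunk, stats))
    return parts


def scan_range(parts: List[MicroPartition], column: str, lo: Any, hi: Any):
    """Return (matching rows, partitions scanned) for lo <= column <= hi."""
    scanned = 0
    matches = []
    for p in parts:
        pmin, pmax = p.stats[column]
        if pmax < lo or pmin > hi:
            continue  # metadata proves no row can match: skip without reading
        scanned += 1
        matches.extend(r for r in p.rows if lo <= r[column] <= hi)
    return matches, scanned


if __name__ == "__main__":
    # 100 rows loaded in day order; "user" values are scattered across partitions.
    rows = [{"day": d, "user": (d * 7) % 10} for d in range(100)]
    parts = build_partitions(rows, rows_per_partition=10)
    print(scan_range(parts, "day", 25, 34)[1], "of", len(parts), "partitions scanned")
    print(scan_range(parts, "user", 3, 3)[1], "of", len(parts), "partitions scanned")
```

In this toy data the filter on `day` skips most partitions because the rows arrived in day order, while the filter on `user` skips none because every partition spans the full range of values. Both queries return the same number of rows; only the amount of data read differs.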
What I'm Watching Carefully
The credit billing model is opaque under variable workloads. I've covered this before but it bears repeating: optimizing Snowflake costs requires understanding how queries map to credit consumption, which requires running queries, watching the bill, and tuning. That's not free engineering time.
The bigger concern is the data format. Snowflake's micro-partition format is proprietary. Your data inside Snowflake is not directly readable by Spark, by Presto, by anything except Snowflake; other engines only reach it by querying through Snowflake. Leaving requires a full export. At scale (multi-terabyte warehouses and up) that export is a non-trivial project, and the lock-in calculation changes once the export would take days and cost meaningful money in credits to run.
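A back-of-envelope estimate helps size that export before you need it. The sketch below is plain arithmetic under stated assumptions: the function name is hypothetical, and the throughput, credit rate, and price per credit in the example are placeholders, not measured or quoted figures. Substitute numbers from your own unload tests and contract.

```python
from typing import Tuple


def export_estimate(
    data_tb: float,
    throughput_mb_per_s: float,   # ASSUMPTION: sustained unload rate you measured
    credits_per_hour: float,      # credit rate of the warehouse doing the export
    price_per_credit: float,      # ASSUMPTION: your contracted price per credit
) -> Tuple[float, float, float]:
    """Return (hours, credits, cost) for a full export of data_tb terabytes."""
    seconds = data_tb * 1_000_000 / throughput_mb_per_s  # decimal TB -> MB
    hours = seconds / 3600
    credits = hours * credits_per_hour
    return hours, credits, credits * price_per_credit


if __name__ == "__main__":
    # Placeholder inputs: 50 TB, 500 MB/s, 8 credits/hour, 3.00 per credit.
    hours, credits, cost = export_estimate(50, 500, 8, 3.0)
    print(f"{hours:.1f} hours, {credits:.0f} credits, cost {cost:.2f}")
```

This ignores storage and egress charges on the receiving side and any re-validation work after the export, so treat the result as a floor rather than a budget.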
For teams building on open formats (Parquet on S3, covered last month), Snowflake sits on top of a portable foundation. For teams that move raw data directly into Snowflake and process it there, the lock-in surface is larger than it appears at the initial purchase decision.
The SQL-Only Constraint
Snowflake is a SQL data warehouse. Everything in Snowflake happens in SQL. If your analytics engineering team lives in SQL, this is fine. If your data scientists need Python, if your feature engineering pipelines need Spark, if your ML training jobs need GPU instances — all of that happens outside Snowflake, with data extracted from Snowflake. That extraction is a recurring cost and a seam in your architecture.
Databricks, as a comparison point, runs SQL, Python, Scala, and R on the same cluster against the same data. No extraction step. The workloads coexist. That's a meaningful architectural advantage for organizations where the analytics team and the ML team are both growing.
I'm not writing Snowflake off — the architecture is strong and it will be the right choice for specific organizations. But the decision deserves precision about what you're buying and what you're giving up. I'll have more on the direct Snowflake vs. Databricks comparison soon.
If you're evaluating Snowflake for a specific use case, I'm happy to work through it. As always, I'm here to help.