The Snowflake data platform was a top data IPO, doubling in value on its first trading day in 2020. It is currently valued at a whopping $49 billion and went from 5 customers in 2014 to 6,497 as of August 2022. What is it about this data warehouse, named for its three founders' love of skiing, that has made it so widely appealing?
Google Cloud Platform (GCP), Amazon Web Services (AWS) and Microsoft Azure, the behemoths of the business, each have their own cloud data warehouse, but Snowflake is a vendor-neutral platform that runs on all three clouds. Snowflake is a relational database management system (RDBMS) built on a SQL database engine designed for the cloud rather than layered on top of an existing database. This more flexible and efficient architecture is what differentiates it from these other platforms.
The three layers of the Snowflake data platform’s architecture:
- Storage – All data is stored centrally and engineered to scale completely independently of compute resources. Snowflake manages every aspect of how the data is stored: organization, file size, structure, compression, metadata and statistics. The data is neither directly visible nor accessible except through SQL queries run in Snowflake. Because storage is separate from compute, data can be loaded and processed without impacting queries and other workloads, which allows far greater flexibility.
- Compute – Each Snowflake virtual warehouse is an independent cluster that does not compete with the others for computing resources, so concurrent workloads do not slow one another down. Each warehouse is designed to process enormous quantities of data with maximum speed and efficiency. Queries are cached locally, so multiple virtual warehouses can operate on the same data simultaneously while fully enforcing global, system-wide transactional integrity.
- Services – Towards Data Science says, “If the compute layer is the brawn of Snowflake then the services layer is the brain that controls the compute layer.” The services layer will authenticate user sessions, provide management, enforce security functions, perform query compilation and optimization and coordinate all transactions.
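To make the compute layer's "one independent cluster per workload" idea concrete, here is a minimal Python sketch that builds the SQL for two separate virtual warehouses. The warehouse names, sizes and suspend times are hypothetical, and the statement options shown (`WAREHOUSE_SIZE`, `AUTO_SUSPEND`, `AUTO_RESUME`) should be checked against Snowflake's current documentation before use. The sketch only generates text; it does not connect to Snowflake.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Warehouse:
    name: str            # hypothetical warehouse name
    size: str            # e.g. "XSMALL" or "MEDIUM"
    auto_suspend_s: int  # idle seconds before the warehouse suspends


def create_statement(wh: Warehouse) -> str:
    """Build a CREATE WAREHOUSE statement for one independent compute cluster."""
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {wh.name} "
        f"WITH WAREHOUSE_SIZE = '{wh.size}' "
        f"AUTO_SUSPEND = {wh.auto_suspend_s} "
        f"AUTO_RESUME = TRUE"
    )


# One warehouse per workload, so loading and reporting never share compute.
# Both names and sizes are illustrative assumptions.
WAREHOUSES = [
    Warehouse("LOAD_WH", "XSMALL", 60),
    Warehouse("REPORTING_WH", "MEDIUM", 300),
]

if __name__ == "__main__":
    for wh in WAREHOUSES:
        print(create_statement(wh) + ";")
```

Both warehouses would read the same centrally stored data, but a heavy load job on one would not take resources away from dashboards running on the other.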
Snowflake has a flexible pricing model: you pay only for the compute and cloud storage you actually use, with zero long-term commitments. You can choose on-demand, per-second pricing or pre-purchased Snowflake capacity. This flexibility is a big plus for small but growing companies.
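As a rough illustration of how per-second compute pricing adds up, the Python sketch below estimates the cost of a single warehouse run. It rests on assumptions not stated in this article: that warehouses consume credits per hour according to size (1 for X-Small, doubling with each step up), that each run is billed for at least 60 seconds, and a price per credit of $3.00 that is purely hypothetical. Actual rates depend on edition, region and contract, so treat the numbers as placeholders.

```python
# Assumed credits consumed per hour for each warehouse size.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8}

# Assumed minimum number of seconds billed each time a warehouse runs.
MIN_BILLED_SECONDS = 60


def compute_cost(size: str, run_seconds: float, price_per_credit: float) -> float:
    """Estimate the compute cost in dollars of one warehouse run."""
    if run_seconds <= 0:
        return 0.0  # a warehouse that never ran costs nothing
    billed_seconds = max(run_seconds, MIN_BILLED_SECONDS)
    credits = CREDITS_PER_HOUR[size] * billed_seconds / 3600
    return credits * price_per_credit


if __name__ == "__main__":
    # A 15-minute job on a Medium warehouse at a hypothetical $3.00 per credit.
    print(f"${compute_cost('MEDIUM', 15 * 60, 3.00):.2f}")
```

Under these assumptions, a suspended warehouse costs nothing for compute, which is why a small team can keep several warehouses defined and pay only while they run.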
The negatives of the Snowflake data platform? If your workflow is already heavily integrated with Amazon, Google or Microsoft, it probably makes sense to stick with their cloud offering. Otherwise, Stitch Data says that “Snowflake is a better platform to start and grow with.” Remember, though, that Snowflake is not an analytics, dashboarding or AI tool.
Interested in discussing your career in data science? Contact Smith Hanley Associates’ Data Science and Analytics Executive Recruiter, Paul Chatlos at pchatlos@smithhanley.com.