Snowflake interview questions and answers ๐Ÿ‘‡

  1. Snowflake Interview Questions

Snowflake Interview Questions


Which cloud platforms are supported by Snowflake?


Snowflake is available on AWS, Azure, and GCP in countries across North America, Europe, Asia Pacific, and Japan.


How is data stored in Snowflake?


Snowflake optimizes and stores data in a columnar format within the storage layer, organized into databases as specified by the user dynamically as resource needs change. When virtual warehouses execute queries, they transparently and automatically cache data from the database storage layer.


What is Virtual warehouse?


In Snowflake, a virtual warehouse is a cluster of database servers deployed on-demand to execute user queries. On a traditional on-premise database, this would be an MPP server (Massively Parallel Processing), which is a fixed hardware deployment. However, on Snowflake, a Virtual Warehouse is a dynamic cluster of virtual database servers that consist of CPU cores, memory, and SSD, which is maintained in a hardware pool and deployed within milliseconds. To the end-user, this process is entirely transparent.


What is Columnar database?


Columnar databases organize data at Column level instead of the conventional row level. All Column level operations are much faster and consume less resources when compared to a row level relational database.


What is the use of the Compute layer in Snowflake?


What are Micro Partitions?


In Snowflake, all data in tables is automatically divided into micro-partitions, which are contiguous units of storage. Snowflake is columnar-based and horizontally partitioned, meaning a row of data is stored in the same micro-partition.

To allow more control over clustering, Snowflake supports explicitly choosing the columns on which a table is clustered. These columns are called clustering keys and they enable Snowflake to maintain clustering according to the selected columns, as well as enable you to recluster on command. Reclustering a table impacts its physical organization by rearranging the data in a subset of micro-partitions for the table.


What is Snowpipe?


Snowpipe is Snowflake's continuous data ingestion service. Snowpipe loads data within minutes after files are added to a stage and submitted for ingestion. With Snowpipe's serverless compute model, Snowflake manages load capacity, ensuring optimal compute resources to meet demand.


Explain the Snowflake Schema.


The Snowflake Schema describes how data is organized in Snowflake. Schemas are basically a logical grouping of database objects (such as tables, views, etc.). Snowflake schemas consist of one fact table linked to many dimension tables, which link to other dimension tables via many-to-one relationships. A fact table (stores quantitative data for analysis) is surrounded by its associated dimensions, which are related to other dimensions, forming a snowflake pattern. Measurements and facts of a business process are contained in a Fact Table, which is a key to a Dimension Table, while attributes of measurements are stored in a Dimension Table. Snowflake offers a complete set of DDL (Data Definition Language) commands for creating and maintaining databases and schemas.


What is Snowflake Time Travel?


Snowflake Time Travel enables accessing historical data (i.e. data that has been changed or deleted) at any point within a defined period.


What is zero-copy Cloning?


Zero copy cloning is a snowflake implementation where a simple keyword CLONE lets us create a copy of your tables, schemas, databases without copying the actual data. So, you can have almost real time data from production cloned into your dev and stage environments to be able to perform various actions.


Explain Data Retention Period.


The standard retention period is 1 day (24 hours) and is automatically enabled for all Snowflake accounts:

For Snowflake Standard Edition, the retention period can be set to 0 (or unset back to the default: 1 day) at the account and object level (i.e. databases, schemas, and tables).

For Snowflake Enterprise Edition (and higher):

  • For transient databases, schemas, and tables, the retention period can be set to 0 (or unset back to the default: 1 day). This is also true for temporary tables.
  • For permanent databases, schemas, and tables, the retention period can be set to any value from 0 to 90 days.

What are Snowflake Connectors?


Snowflake connectors support connecting different data sources with Snowflake. So basically Snowflake connectors support connecting platforms such as Python, Kafka, and Apache Spark with Snowflake. You can load data into snowflake tables using the snowflake connectors seamlessly. For e.g. the Spark connector achieves bi-directional data transfer between snowflake tables and Spark data frames.


What are Snowflake views?


Tables and views are the primary objects created and maintained in database schemas: All data in Snowflake is stored in tables. Views can be used to display selected rows and columns in one or more tables.


Describe Snowflake Clustering.


Clustering is a type of data partitioning, where unique cluster keys are specified for each table. Cluster keys are subsets of a table's columns that are used to co-locate data within the table. These keys are appropriate for comprehensive tables. The process of managing clustered data in a table is known as re-clustering.


Describe Snowflake Compression.


Snowflake is significantly less expensive because Snowflake storage prices apply to compressed data. Snowflake enables at least a 3:1 compression ratio, reducing Snowflake's effective storage cost to $10/TB/month or less.


What is the difference between Snowflake and Redshift?


Snowflake separates compute from storage, allowing for flexible pricing and configuration. Redshift allows for cost optimization through Reserved/Spot instance pricing. Snowflake implements instantaneous auto-scaling while Redshift requires addition/removal of nodes for scaling.


Is Snowflake an ETL (Extract, Transform, and Load) tool?


Yes, Snowflake supports both transformation during (ETL) or after loading (ELT). Snowflake works with a wide range of data integration tools, including Informatica, Talend, Fivetran, Matillion and others.


Explain fail-safe in the context of Snowflake.


Fail-safe is a data recovery service that is provided on a best effort basis and is intended only for use when all other recovery options have been attempted. Fail-safe is not provided as a means for accessing historical data after the Time Travel retention period has ended.


Explain Snowflake caching.


Snowflake caches the results of each query you run, so whenever a new query is submitted, it checks if a matching query already exists, and if it does, it uses the cached results instead of running the new query again. Due to Snowflake's ability to fetch the results directly from the cache, query times are significantly optimized.

Types of Caching in Snowflake

  • Query Results Caching: It stores the results of all queries executed in the past 24 hours.
  • Local Disk Caching: It stores data used or required for performing SQL queries.
  • Remote Disk Cache: It holds results for long-term use.

Can AWS glue connect to Snowflake?


Yes, AWS Glue Customer Connectors help users to search and select connectors from the AWS Marketplace or bring their own connectors. Using this new feature, users can easily connect to Snowflake with few clicks using their own Snowflake connector and start orchestrating the data pipeline in minutes.


Explain the data security features in Snowflake.


Snowflake encrypts all customer data by default using end-to-end encryption (E2EE) at no additional cost. Snowflake provides best-in-class key management and is transparent to customers:

  • All data is automatically encrypted by Snowflake using Snowflake-managed keys
  • All communication and data transfer between clients and the server is protected through TLS
  • Customers can choose the geographical location where their data is stored, based on their cloud region