SNOWFLAKE
By Ishan Bhawantha
20th May 2021
Introduction
• Snowflake is a true SaaS offering:
• No hardware to select.
• No software to install, configure, or manage.
• Ongoing maintenance, management, upgrades, and tuning are handled by Snowflake.
• No private cloud deployment; it runs only on the public clouds (AWS, Azure, GCP).
• Not a traditional relational database: PK/FK constraints are not enforced.
• Supports Insert, Update, Delete, Views, Materialized Views, and ACID transactions.
• Analytical aggregations, windowing, and hierarchical queries.
• Query Language -> SnowSQL
• DDL/DML
• SQL functions
• UDFs / Stored Procedures (JavaScript)
Integration Support
• Data integration (Informatica, Talend)
• Self-service BI tools (Tableau, QlikView)
• Big data tools (Kafka, Spark, Databricks, etc.)
• JDBC/ODBC drivers
• Native language connectors (Python, Go, Node.js)
• SQL interface & clients
• Snowflake Web Interface
• Snowflake CLI + DBeaver
Snowflake Architecture
• Separation of storage and compute.
• Three layers: Service Layer | Compute Layer | Storage Layer.
• Storage layer: data is stored in cloud object storage (e.g., S3); you pay only for the storage used.
• Compute layer: configurable VMs execute DDL/DML queries; you pay for the compute used.
• Pricing is only for what you use:
• Storage billed separately (per GB/TB).
• Query processing billed separately (compute minutes).
• The Service Layer comes at a fixed price and handles:
• Metadata
• Security
• Query optimization
What Makes Snowflake Unique?
• Scalability (storage and compute)
• Few knobs to tune the database:
• No indexing needed
• No performance tuning
• No manual partitioning
• No physical storage design
• Security, data governance, and protection
• Simplification and automation
• Balance and scale
Virtual Warehouses
• Compute instances (on AWS, EC2 instances under the hood).
• Commonly just called Snowflake warehouses.
• Nothing interacts with them directly.
• Sizes
• X-Small: single node -> light analytical tasks
• Small: two nodes
• Medium: four nodes
• Large: eight nodes -> data loading
• X-Large: sixteen nodes -> high-performance query execution
• Concurrent queries can execute on a warehouse.
• Additional queries are queued and wait until resources free up.
• Multi-cluster warehouses can avoid this queuing.
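As a rough mental model of the sizes listed above, the node count doubles at each warehouse size step. A minimal sketch (the size names and node counts come from the list above; the function itself is hypothetical):

```python
# Hypothetical sketch: node count doubles with each warehouse size step,
# matching the sizes listed above (X-Small = 1 node ... X-Large = 16 nodes).
SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large"]

def nodes_for(size: str) -> int:
    """Return the node count for a given warehouse size."""
    return 2 ** SIZES.index(size)

print(nodes_for("X-Small"))  # 1
print(nodes_for("X-Large"))  # 16
```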
Micro-Partitions
• Tables are automatically divided into micro-partitions.
• Contiguous units of storage (50 MB to 500 MB of uncompressed data).
• The actual stored size is much smaller due to compression.
• Snowflake determines the most efficient compression algorithm for each column.
• Columnar scanning gives quick query responses.
How Does Micro-Partitioning Work?
• Data is read from S3; the high availability and durability of S3 are leveraged here.
• An API reads parts of the incoming files.
• The data is broken into small partitions (micro-partitions).
• Each partition is reorganized into columnar form (values of the same column are stored together).
• The column values are compressed individually, column by column.
• A header with metadata (column offsets) is added to each micro-partition.
• Micro-partitions are stored in S3 as files.
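The reorganize-and-compress steps above can be sketched in a few lines. This is purely illustrative (not Snowflake's actual implementation); the sample rows and the use of zlib for compression are assumptions:

```python
# Illustrative sketch (not Snowflake's internals): reorganize row data
# into columnar form and compress each column's values individually.
import json
import zlib

rows = [
    {"id": 1, "region": "EU", "amount": 10},
    {"id": 2, "region": "EU", "amount": 25},
    {"id": 3, "region": "US", "amount": 7},
]

# Re-organize: pivot rows into columns (the "make it columnar" step).
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Compress the column values individually, column by column.
compressed = {name: zlib.compress(json.dumps(values).encode())
              for name, values in columns.items()}

# Header metadata: byte offset of each column inside the partition file.
offsets, position = {}, 0
for name, blob in compressed.items():
    offsets[name] = position
    position += len(blob)

print(offsets["id"])  # 0 (the first column starts at byte 0)
```

Storing each column's values together is what makes a query that touches only one or two columns cheap: the scan can seek straight to those columns' offsets and skip the rest of the file.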
Micro-Partitioning and Search Optimization
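The search-optimization benefit of micro-partitioning comes from the per-partition metadata: min/max values are kept for each column, so a query can skip any partition whose value range cannot match its predicate. A hedged sketch of that pruning idea (all file names and dates below are made up):

```python
# Hedged sketch of metadata-based pruning: per-column min/max values are
# kept for each micro-partition, so a query can skip partitions whose
# value range cannot match its predicate. All values here are made up.
partitions = [
    {"file": "part_1", "min_date": "2021-01-01", "max_date": "2021-01-31"},
    {"file": "part_2", "min_date": "2021-02-01", "max_date": "2021-02-28"},
    {"file": "part_3", "min_date": "2021-03-01", "max_date": "2021-03-31"},
]

def prune(parts, lo, hi):
    """Keep only partitions whose [min, max] range overlaps [lo, hi]."""
    return [p["file"] for p in parts
            if p["max_date"] >= lo and p["min_date"] <= hi]

# Only the February partition needs to be scanned for this date range.
print(prune(partitions, "2021-02-10", "2021-02-20"))  # ['part_2']
```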
Data Loading
Based on the volume and frequency of data, there are two main options:
1. Bulk Loading
2. Continuous Loading
Bulk Loading
- Uses the COPY INTO command.
- Loads batches of data from files in cloud storage, or copies data files from a local machine first.
- Relies on a user-provided virtual warehouse.
- Supports transforming data during a load:
• Column ordering
• Column omission, etc.
Continuous Loading
- Uses Snowpipe.
- Designed to load small volumes of data incrementally.
- Loads data within minutes after files are added.
- Uses compute resources provided by Snowflake.
Beyond these, Snowflake provides several connectors to load data,
e.g., the Snowflake Connector for Kafka.
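To make "transforming during a load" concrete: column reordering and column omission mean the target table can receive a different column layout than the staged file. A small sketch of the effect (in Snowflake this happens inside COPY INTO; the sample file and column names here are hypothetical):

```python
# Illustrative sketch of the "transforming during a load" idea: column
# reordering and column omission applied while reading staged rows.
import csv
import io

staged_file = "id,region,amount,internal_flag\n1,EU,10,x\n2,US,25,y\n"
reader = csv.DictReader(io.StringIO(staged_file))

# Target table wants (region, id, amount): the columns are reordered,
# and internal_flag is omitted entirely.
loaded = [(row["region"], row["id"], row["amount"]) for row in reader]
print(loaded)  # [('EU', '1', '10'), ('US', '2', '25')]
```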
Preparing to Load Data from S3
• Check the file type for the data being loaded: JSON, Avro, ORC, etc.
STEP 1: Create a stage.
STEP 2: Execute a COPY command over the stage.
• Instead of embedding credentials, you can use AWS ARN objects (e.g., an IAM role) for authentication.
• LOAD_HISTORY gives you the history of data loading operations.
create or replace stage my_s3_stage
  url='s3://mybucket/encrypted_files/'
  credentials=(aws_key_id='1a2b3c' aws_secret_key='4x5y6z')
  encryption=(master_key='eSxX0jzYfIamtnBKOEOwq80Au6NbSgPH5r4BDDwOaO8=')
  file_format = my_csv_format;

copy into mytable
  from @my_s3_stage
  pattern='.*sales.*.csv';
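The pattern option in the COPY command above is a regular expression matched against staged file names. A quick check of the same regex in Python (the file names below are hypothetical):

```python
# The COPY pattern '.*sales.*.csv' is a regular expression applied to
# staged file names; these example file names are made up.
import re

pattern = re.compile(r".*sales.*.csv")
staged = ["daily_sales_2021.csv", "inventory_2021.csv", "sales_eu.csv"]

matched = [name for name in staged if pattern.fullmatch(name)]
print(matched)  # ['daily_sales_2021.csv', 'sales_eu.csv']
```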
USE CASE: Data Warehouse
USE CASE: Data Monitoring
• A separate data monitoring tool was requested.
• The database was decoupled from Snowflake; a MySQL DB was used instead.
• A UI tool visualizes the metadata per day/month.
USE CASE: Data Warehouse
COMPARE
ELASTICSEARCH
- Cost is high.
- Management is not easy.
- Development is not easy.
AWS NEPTUNE
- Cost is high.
- Not an all-in-one package.
- No graph requirements were seen afterwards.
CASSANDRA
- Key constraints.
- Can't execute DML.