Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Training Day)
Data sources · Non-relational data
DESIGNED FOR THE QUESTIONS YOU KNOW!
The Data Lake Approach
• Ingest all data regardless of requirements
• Store all data in native format without schema definition
• Do analysis using Hadoop, Spark, R, Azure Data Lake Analytics (ADLA)
Interactive queries · Batch queries · Machine Learning · Data warehouse · Real-time analytics · Devices
Microsoft’s Big Data Journey
We needed to better leverage data and analytics to do more experimentation. So, we built a Data Lake for Microsoft:
• A data lake for everyone to put their data
• Tools approachable by any developer
• Batch, Interactive, Streaming, ML
By the numbers:
• Exabytes of data under management
• 100Ks of physical servers
• 100Ks of batch jobs, millions of interactive queries
• Huge streaming pipelines
• 10K+ developers running diverse workloads and scenarios
(Chart: data stored grows steeply from 2010 to 2017 as internal teams onboard: Windows, SMSG, Live, Bing, CRM/Dynamics, Xbox Live, Office365, Malware Protection, Microsoft Stores, Commerce Risk, Skype, LCA, Exchange, Yammer.)
Culture Changes
Engineering: How is the system performing? What is the experience my customers are having? How does that correlate to other actions? Is my feature successful?
Marketing: What can we observe from our customers to increase revenues?
Management: How do I drive my business based on the data?
Field: Where are there new opportunities? How can I connect with my customers more deeply?
Support: How does this customer’s experience compare with others?
ADL Store: HDFS Compatible REST API
ADL Analytics (.NET, SQL, Python, R scaled out by U-SQL) · Open Source Apache Hadoop ADL Client · Azure Databricks · HDInsight (Hive)
• Performance at scale
• Optimized for analytics
• Multiple analytics engines
• Single repository sharing
ADL Store (Storage): HDFS Compatible REST API
• Architected and built for very high throughput at scale for Big Data workloads
• No limits on file size, account size or number of files
• Single repository for sharing
• Cloud-scale distributed filesystem with file/folder ACLs and RBAC
• Encryption-at-rest by default with Azure Key Vault
• Authenticated access with Azure Active Directory integration
• Formal certifications incl. ISO, SOC, PCI, HIPAA
Hadoop on ADL Store (Analytics + Storage): HDFS Compatible REST API
Cloudera CDH · Hortonworks HDP · Qubole QDS
• Open Source Apache® ADL client for commercial and custom Hadoop
• Cloud IaaS and Hybrid
AZURE DATABRICKS: a fast, easy, and collaborative Apache Spark-based analytics platform
Best of Databricks, best of Microsoft:
• Designed in collaboration with the founders of Apache Spark
• One-click setup; streamlined workflows
• Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts
• Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
• Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
HDInsight (Analytics) on ADL Store (Storage): HDFS Compatible REST API, Hive
• 63% lower TCO than on-premises*
• SLA-managed, monitored and supported by Microsoft
• Fully managed Hadoop, Spark and R
• Clusters deployed in minutes
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
ADL Analytics on ADL Store (HDFS Compatible REST API): .NET, SQL, Python, R scaled out by U-SQL
• Serverless. Pay per job. Starts in seconds. Scales instantly.
• Develop massively parallel programs with simplicity
• Federated query from multiple data sources
Ingress
• Event Hubs
• IoT Hub
• Kafka
Analytics
• Stream Analytics
• Spark Streaming
• Storm
Sinks
• Data Lake Store
• Blob Store
• SQL Database
• SQL Data Warehouse
• Event Hub
• Power BI
• Table Storage
• Service Bus Queues
• Service Bus Topics
• Cosmos DB
• Azure Functions
• …
Azure Data Lake Store
1. Create small files
2. Copy small files
3. Concat + copy file
4. ASA (Azure Stream Analytics)
5. Event Hub Capture
• Copy
• SDK
• Tools (Storage Explorer, Visual Studio, 3rd Party)
• Data Factory
• SQL Server Integration Services (SSIS)
• Streaming from external sources
• Generated by cloud analytics
U-SQL: A framework for Big Data
Scales out your custom code in .NET, Python, R over your Data Lake
Familiar syntax to millions of SQL & .NET developers
Unifies
• Declarative nature of SQL with the imperative power of your language of choice (e.g., C#, Python)
• Processing of structured, semi-structured and unstructured data
• Querying multiple Azure Data Sources (Federated Query)
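As a small illustration of that unification, a hedged sketch (not from the deck; the rowset and the C# expressions are hypothetical, reusing the SearchLog sample paths from the next slide):

@searchlog =
    EXTRACT UserId int, Region string, Query string
    FROM @"/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// C# expressions (string methods, conditionals) mix directly into the SQL-style SELECT.
@cleaned =
    SELECT UserId,
           Region.ToUpperInvariant() AS NormalizedRegion,
           (Query == null ? "" : Query.Trim()) AS CleanQuery
    FROM @searchlog;

OUTPUT @cleaned
TO @"/Samples/Output/CleanedLog.tsv"
USING Outputters.Tsv();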
Develop massively parallel programs with simplicity
A simple U-SQL script can scale from gigabytes to petabytes without learning complex big data programming techniques. U-SQL automatically generates a scaled-out and optimized execution plan to handle any amount of data. Execution nodes are rapidly allocated to run the program. Error handling, network issues, and runtime optimization are handled automatically.
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string,
            Duration int,
            Urls string,
            ClickedUrls string
    FROM @"/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

OUTPUT @searchlog
TO @"/Samples/Output/SearchLog_output.tsv"
USING Outputters.Tsv();
 Automatic "in-lining" optimized out-of-the-box
 Per-job parallelization visibility into execution
 Heatmap to identify bottlenecks
“Unstructured” Files
• Schema on Read
• Write to File
• Built-in and custom Extractors and Outputters
• ADL Storage and Azure Blob Storage
EXTRACT Expression
@s = EXTRACT a string, b int
FROM "filepath/file.csv"
USING Extractors.Csv(encoding: Encoding.Unicode);
• Built-in Extractors: Csv, Tsv, Text with lots of options, Parquet
• Custom Extractors: e.g., JSON, XML, etc. (see http://usql.io)
OUTPUT Expression
OUTPUT @s
TO "filepath/file.csv"
USING Outputters.Csv();
• Built-in Outputters: Csv, Tsv, Text, Parquet
• Custom Outputters: e.g., JSON, XML, etc. (see http://usql.io)
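For instance, once the sample format assemblies from http://usql.io are registered in your database, a custom JSON extractor is invoked like a built-in one. A hedged sketch (assembly and class names as published in the U-SQL samples repository; the input path and columns are hypothetical):

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@json =
    EXTRACT id string, name string
    FROM "/input/data.json"
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();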
Filepath URIs
• Relative URI to default ADL Storage account: "filepath/file.csv"
• Absolute URIs:
• ADLS: "adl://account.azuredatalakestore.net/filepath/file.csv"
• WASB: "wasb://container@account/filepath/file.csv"
File Sets
• Simple Patterns
• Virtual Columns
• GA only on EXTRACT for now; OUTPUT in Private Preview
Simple pattern language on filename and path
DECLARE @pattern string =
    "/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}";
• Binds two columns date and suffix
• Wildcards the filename
• Limits on the number of files and file sizes can be improved with
SET @@FeaturePreviews = "FileSetV2Dot5:on,InputFileGrouping:on,AsyncCompilerStoreAccess:on";
(Will become the default between now and the middle of the year.)
Virtual columns
EXTRACT name string
, suffix string // virtual column
, date DateTime // virtual column
FROM @pattern
USING Extractors.Csv();
• Refer to virtual columns in predicates to get partition elimination
• A warning is raised if no partition elimination was found
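Putting file sets and virtual columns together, a hedged sketch (the paths and the January date range are hypothetical):

DECLARE @pattern string =
    "/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}";
DECLARE @start DateTime = new DateTime(2018, 1, 1);
DECLARE @end DateTime = new DateTime(2018, 2, 1);

@data =
    EXTRACT name string
          , suffix string // virtual column bound from the file name
          , date DateTime // virtual column bound from the path
    FROM @pattern
    USING Extractors.Csv();

// The constant-foldable predicate on the virtual column gives partition
// elimination: only files under /input/2018/01/ are read.
@january =
    SELECT name, suffix
    FROM @data
    WHERE date >= @start AND date < @end;

OUTPUT @january TO "/output/january.csv" USING Outputters.Csv();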
@rows = SELECT
Domain,
SUM(Clicks) AS TotalClicks
FROM @ClickData
GROUP BY Domain;
(Diagram: two execution plans for the aggregation above. Reading raw extents, where each extent holds a mix of domains (CNN, FB, WH), every extent is read and partially aggregated, then the rows are repartitioned on Domain before the full aggregation and write; the repartitioning step is expensive! Reading a U-SQL table distributed by Domain, each extent holds a single domain (FB, WH, CNN), so each vertex can read, fully aggregate and write with no repartitioning.)
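The cheaper plan presupposes a managed table distributed on the grouping key. A minimal sketch of creating and loading such a table (the table name and the @clicks input rowset are hypothetical):

CREATE TABLE ClickData
(
    Domain string,
    Clicks int,
    INDEX idx CLUSTERED (Domain ASC)
    DISTRIBUTED BY HASH (Domain)
);

// Rows with the same Domain land in the same distribution bucket,
// so a GROUP BY Domain needs no repartitioning step.
INSERT INTO ClickData
SELECT Domain, Clicks FROM @clicks;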
(Diagram: the U-SQL metadata catalog. An ADLA account/catalog contains [1,n] databases, each database contains [1,n] schemas, and each schema contains [0,n] user objects: tables (with clustered indexes, partitions and statistics), views, TVFs, procedures, table types, external tables, data sources, credentials and packages. C# assemblies implement and name the C# functions, UDAggs, UDTs, extractors, outputters, processors, reducers, combiners and appliers that the metadata objects refer to.)
U-SQL Catalog
• Naming
• Discovery
• Sharing
• Securing
Naming
• Default Database and Schema context: master.dbo
• Quote identifiers with []: [my table]
• Stores data in ADL Storage /catalog folder
Discovery
• Visual Studio Server Explorer
• Azure Data Lake Analytics Portal
• SDKs and Azure PowerShell commands
• Catalog Views: usql.databases, usql.tables etc.
Sharing
• Within an Azure Data Lake Analytics account
• Across ADLA accounts that share the same Azure Active Directory:
• Referencing Assemblies
• Calling TVFs, Procedures and referencing tables and views
• Inserting into tables
Securing
• Secured with AAD principals at catalog and Database level
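A minimal naming sketch (the database name is hypothetical):

CREATE DATABASE IF NOT EXISTS TrafficAnalytics;
USE DATABASE TrafficAnalytics;

// Unqualified object names now resolve against TrafficAnalytics.dbo
// instead of the default master.dbo context.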
CREATE TABLE T (col1 int
              , col2 string
              , col3 SQL.MAP<string,string>
              , INDEX idx CLUSTERED (col2 ASC)
                PARTITIONED BY (col1)
                DISTRIBUTED BY HASH (col2)
               );
• Structured data, built-in data types only (no UDTs)
• Clustered Index (needs to be specified): row-oriented
• Fine-grained distribution (needs to be specified): HASH, DIRECT HASH, RANGE, ROUND ROBIN
• Addressable partitions (optional)
CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT);
• Infers the schema from the query
• Still requires index and distribution (does not support partitioning)
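For example, materializing the earlier click aggregation with CREATE TABLE … AS SELECT, a sketch reusing the @ClickData rowset from above:

CREATE TABLE DomainClicks
(
    INDEX idx CLUSTERED (Domain ASC)
    DISTRIBUTED BY HASH (Domain)
) AS
SELECT Domain, SUM(Clicks) AS TotalClicks
FROM @ClickData
GROUP BY Domain;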
Data Partitioning in Tables
Distribution scheme: when to use?
• HASH(keys): automatic hash for fast item lookup
• DIRECT HASH(id): exact control of the hash bucket value
• RANGE(keys): keeps ranges together
• ROUND ROBIN: to get equal distribution (if the others give skew)
Partitions, Distributions and Clusters
TABLE T ( id …
        , C …
        , date DateTime, …
        , INDEX i CLUSTERED (id, C)
          PARTITIONED BY (date)
          DISTRIBUTED BY HASH(id) INTO 4
        )
(Diagram: logically, each date value gets its own partition, e.g. PARTITION (@date1), PARTITION (@date2), PARTITION (@date3); within each partition, rows are spread over up to 4 hash distributions, and each distribution holds clusters of rows with the same index key values (C1, C2, …). Physically, each partition is stored as its own file under /catalog/…/tables/Guid(T)/, e.g. Guid(T.p1).ss, Guid(T.p2).ss, Guid(T.p3).ss.)
Benefits of table clustering and distribution
• Faster lookup of data when the right distribution/cluster key is chosen
• Data distribution provides better localized scale-out
• Used for filters, joins and grouping
Benefits of table partitioning
• Provides data life cycle management (“expire” old partitions): partition on a date/time dimension
• Partial re-computation of data at partition level
• Query predicates can provide partition elimination
Do not use when…
• No filters, joins or grouping
• No reuse of the data for future queries
If in doubt: use sampling (e.g., SAMPLE ANY(x)) and test; see the sketch below.
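A hedged sampling sketch (reusing the hypothetical @ClickData rowset from earlier):

@subset =
    SELECT * FROM @ClickData SAMPLE ANY (1000);

// Try the candidate aggregation cheaply on the subset before
// committing to a table design for the full data.
@test =
    SELECT Domain, SUM(Clicks) AS TotalClicks
    FROM @subset
    GROUP BY Domain;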
Benefits of Distribution in Tables
Benefits
• Design for most frequent/costly queries
• Manage data skew in a partition/table
• Manage parallelism in querying (by number of distributions)
• Minimize data movement in joins
• Provide distribution seeks and range scans for query predicates (distribution bucket elimination)
Distribution in tables is mandatory; choose according to the desired benefits.
Benefits of Clustered Index in Distribution
Benefits
• Design for most frequent/costly queries
• Manage data skew in a distribution bucket
• Provide locality of same data values
• Provide seeks and range scans for query predicates (index lookup)
A clustered index in tables is mandatory; choose according to the desired benefits.
Pro Tip:
Distribution keys should be a prefix of the clustered index keys, especially for RANGE distribution. The optimizer will then make use of the global ordering: if you make the RANGE distribution key a prefix of the index key, U-SQL will repartition on demand to align any UNION ALLed or JOINed tables or partitions! Split points of table distribution partitions are chosen independently, so any partitioned table can do UNION ALL in this manner if the data is to be processed subsequently on the distribution key.
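A sketch of that alignment (table and column names hypothetical):

CREATE TABLE DailyEvents
(
    event_date DateTime,
    id int,
    payload string,
    // The RANGE distribution key (event_date) is a prefix of the clustered
    // index key (event_date, id), so the optimizer can exploit global ordering.
    INDEX idx CLUSTERED (event_date ASC, id ASC)
    DISTRIBUTED BY RANGE (event_date)
);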
Benefits of Partitioned Tables
Benefits
• Partitions are addressable
• Enables finer-grained data lifecycle management at partition level
• Manage parallelism in querying by number of partitions
• Query predicates provide partition elimination (the predicate has to be constant-foldable)
Use partitioned tables for
• Managing large amounts of incrementally growing structured data
• Queries with strong locality predicates (point in time, for a specific market, etc.)
• Managing windows of data (e.g., provide data for the last x months for processing)
Partitioned tables
 Use partitioned tables for querying parts of large amounts of incrementally growing structured data
 Get partition elimination optimizations with the right query predicates
Creating a partitioned table
CREATE TABLE vehiclesP(vehicle_id int, event_date DateTime, lat float, long float
    , INDEX idx CLUSTERED (vehicle_id ASC)
    PARTITIONED BY(event_date) DISTRIBUTED BY HASH (vehicle_id) INTO 4);
Creating partitions
DECLARE @pdate1 DateTime = new DateTime(2014, 9, 14, 00,00,00,00,DateTimeKind.Utc);
DECLARE @pdate2 DateTime = new DateTime(2014, 9, 15, 00,00,00,00,DateTimeKind.Utc);
ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@pdate2);
Loading data into partitions dynamically
DECLARE @date1 DateTime = DateTime.Parse("2014-09-14");
DECLARE @date2 DateTime = DateTime.Parse("2014-09-16");
INSERT INTO vehiclesP ON INTEGRITY VIOLATION IGNORE
SELECT vehicle_id, event_date, lat, long FROM @data
WHERE event_date >= @date1 AND event_date <= @date2;
• Filters and inserts clean data only, ignoring “dirty” data
Loading data into partitions statically
ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@baddate);
INSERT INTO vehiclesP ON INTEGRITY VIOLATION MOVE TO @baddate
SELECT vehicle_id, event_date, lat, long FROM @data
WHERE event_date >= @date1 AND event_date <= @date2;
• Filters and inserts clean data only, putting “dirty” data into a special partition
What is Table Fragmentation?
• ADLS is an append-only store!
• Every INSERT statement creates a new file (an INSERT fragment)
Why is it bad?
• Every INSERT fragment contains data in its own distribution buckets, so query processing loses the ability to get “localized” fast access
• Query compilation has to read from many files -> a slow preparation phase that may time out
• Reading from too many files is disallowed: the current limit is 3000 table partitions and INSERT fragments per job!
What if I have to add data incrementally?
• Batch inserts into the table
• Use ALTER TABLE REBUILD / ALTER TABLE REBUILD PARTITION regularly to reduce fragmentation and keep performance (see the sketch below)
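For example, after a batch of incremental inserts (a sketch against the vehiclesP table from the previous slide):

// Compact all INSERT fragments in the table…
ALTER TABLE vehiclesP REBUILD;

// …or only in a partition that received new data.
ALTER TABLE vehiclesP REBUILD PARTITION (@pdate1);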
(Vertex execution chart: the job dips down to 1 active vertex at these times.)
High-level Roadmap
• Worldwide region availability (currently US and EU)
• Interactive access with T-SQL query
• Scale out your custom code in the language of your choice (.NET, Java, Python, etc.)
• Process the data formats of your choice (incl. Parquet, ORC; larger string values)
• Continued ADF, AAS, ADC, SQL DW, Event Hub, SSIS integration
• Administrative policies to control usage/cost for storage & compute
• Secure data sharing across accounts in a common AAD, public read-only sharing, fine-grained ACLing
• Intense focus on developer productivity for authoring, debugging, and optimization
• General customer feedback: http://aka.ms/adlfeedback
Resources
http://usql.io
http://blogs.msdn.microsoft.com/azuredatalake/
http://blogs.msdn.microsoft.com/mrys/
https://channel9.msdn.com/Search?term=U-SQL#ch9Search
http://aka.ms/usql_reference
https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-programmability-guide
https://docs.microsoft.com/en-us/azure/data-lake-analytics/
https://msdn.microsoft.com/en-us/magazine/mt614251
https://msdn.microsoft.com/magazine/mt790200
http://www.slideshare.net/MichaelRys
Getting Started with R in U-SQL
https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-python-extensions
https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake
http://stackoverflow.com/questions/tagged/u-sql
http://aka.ms/adlfeedback
Continue your education at Microsoft Virtual Academy online.
Editor's Notes
• #30: Shows simple Extract, OUTPUT on preview large file and many small files to introduce fast file set. Then simple extensibility with string functions. Dynamic output.
• #36: Show Views, TVFs and Tables.
• #50: https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/5-Ambulance-StreamSets-PartitionedTables