Michael Rys
Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com
U-SQL Does SQL
“Top 5” Surprises for SQL Users
AS is not as
• C# keywords and SQL keywords overlap
• Costly to make case-insensitive -> better to build capabilities than tinker with syntax
= != ==
• Remember: C# expression language
null IS NOT NULL
• C# null comparisons are two-valued (true/false), unlike SQL's three-valued logic
PROCEDURES but no WHILE
No UPDATE nor MERGE
• Transform/Recook instead
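A minimal sketch of how these surprises show up in a script. It assumes a hypothetical rowset @orders with columns Customer (string) and Amount (int?); the schema and the literal values are illustrative assumptions, not taken from the deck.

```usql
// Assumed schema (hypothetical): @orders(Customer string, Amount int?)
@contoso =
    SELECT Customer.ToUpper() AS Customer,   // keywords are upper case: AS, not as
           Amount
    FROM @orders
    WHERE Customer == "Contoso"              // C# equality (==), not the SQL =
          AND Amount != null;                // C# null comparison yields true or false

// "Recook" instead of UPDATE: derive a new rowset that carries the changed values.
@recooked =
    SELECT (Customer == "Contoso" ? "Contoso Ltd." : Customer) AS Customer,
           Amount
    FROM @orders;
```

The second statement leaves @orders untouched and produces a transformed copy, which is the usual replacement for an in-place UPDATE.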
U-SQL Does SQL (SQLBits 2016)
@customers =
SELECT Customer.ToUpper() AS Customer
FROM @orders
WHERE Customer.Contains("Contoso");
C# Expression
Transforming Rowsets
C# Expression
Use WHERE for filtering
@rows =
SELECT
Customer,
SUM(Amount) AS TotalAmount
FROM @orders
GROUP BY Customer;
Many other aggregations are possible. You can define your own aggregator with C#!
Grouping & Aggregation
@rows =
SELECT
Customer,
SUM(Amount) AS TotalAmount
FROM @orders
GROUP BY Customer
HAVING TotalAmount > 1000000;
HAVING filters the output of a GROUP BY
Grouping & Aggregation (2)
Sorting a rowset
@customers =
SELECT *
FROM @customers
ORDER BY Amount ASC
FETCH FIRST 3 ROWS;
SELECT with ORDER BY requires a FETCH FIRST!
Sorting on OUTPUT
OUTPUT @customers
TO @"/output.tsv"
ORDER BY Amount ASC
USING Outputters.Tsv();
Creating Constant Rowsets in Script
@departments =
SELECT * FROM
(VALUES
(31, "Sales"),
(33, "Engineering"),
(34, "Clerical"),
(35, "Marketing")
) AS
D( DepID, DepName );
@m = SELECT new SQL.ARRAY<string>(
                tweet.Split(' ').Where(x => x.StartsWith("@"))) AS refs
     FROM @t;
@t = SELECT author, "authored" AS category
     FROM @t
     UNION ALL
     SELECT r.Substring(1) AS r, "referenced" AS category
     FROM @m CROSS APPLY EXPLODE(refs) AS Refs(r);
[Diagram: @m(refs) holds one array per row, e.g. (@me, @you) and (@him, @her); CROSS APPLY EXPLODE(refs) AS Refs(r) yields one row per element: @me, @you, @him, @her.]
U-SQL Joins
Join operators
• INNER JOIN
• LEFT or RIGHT or FULL OUTER JOIN
• CROSS JOIN
• SEMIJOIN
• Equivalent to IN subquery
• ANTISEMIJOIN
• Equivalent to NOT IN subquery
Notes
• ON clause comparisons need to be of the simple form rowset.column == rowset.column, or AND conjunctions of such simple equality comparisons
• If a comparand is not a column, wrap it into a column in a previous SELECT
• If the comparison operation is not ==, put it into the WHERE clause
• Turn the join into a CROSS JOIN if there is no equality comparison
Reason: Syntax calls out which joins are efficient
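The notes above can be sketched as follows. The rowsets @orders(CustomerId int, Amount int) and @customers(Id int, MinAmount int) are hypothetical and exist only for illustration.

```usql
// Assumed schemas (hypothetical):
//   @orders(CustomerId int, Amount int), @customers(Id int, MinAmount int)

// The equality comparison stays in ON; the non-equality comparison moves to WHERE.
@qualifying =
    SELECT o.CustomerId, o.Amount
    FROM @orders AS o
         INNER JOIN @customers AS c
         ON o.CustomerId == c.Id
    WHERE o.Amount > c.MinAmount;

// Semijoin: keep only the orders that have a matching customer (the IN-subquery pattern).
@known =
    SELECT o.CustomerId, o.Amount
    FROM @orders AS o
         LEFT SEMIJOIN @customers AS c
         ON o.CustomerId == c.Id;

// Antisemijoin: keep only the orders without a matching customer (the NOT IN pattern).
@orphans =
    SELECT o.CustomerId, o.Amount
    FROM @orders AS o
         LEFT ANTISEMIJOIN @customers AS c
         ON o.CustomerId == c.Id;
```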
U-SQL Analytics
Windowing Expression
Window_Function_Call 'OVER' '('
    [ Over_Partition_By_Clause ]
    [ Order_By_Clause ]
    [ Row_Clause ]
')'.
Window_Function_Call :=
    Aggregate_Function_Call
  | Analytic_Function_Call
  | Ranking_Function_Call.
Windowing Aggregate Functions
ANY_VALUE, AVG, COUNT, MAX, MIN, SUM, STDEV, STDEVP, VAR, VARP
Analytic Functions
CUME_DIST, FIRST_VALUE, LAST_VALUE, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK; soon: LEAD/LAG
Ranking Functions
DENSE_RANK, NTILE, RANK, ROW_NUMBER
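A short sketch of a windowing aggregate and a ranking function used side by side, assuming a hypothetical rowset @orders(Customer string, OrderId int, Amount int) that is not part of the deck.

```usql
// Assumed schema (hypothetical): @orders(Customer string, OrderId int, Amount int)
@ranked =
    SELECT Customer,
           OrderId,
           Amount,
           // Windowing aggregate: per-customer total repeated on every row of that customer
           SUM(Amount) OVER(PARTITION BY Customer) AS CustomerTotal,
           // Ranking function: position of each order within its customer, largest first
           ROW_NUMBER() OVER(PARTITION BY Customer ORDER BY Amount DESC) AS RowNumber
    FROM @orders;
```

Unlike GROUP BY, the OVER clause keeps every input row and adds the computed value as an extra column.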
INSERT
• INSERT constant values
• INSERT from queries
• Multiple INSERTs
INSERT constant values
INSERT INTO T VALUES (1, "text",
new SQL.MAP<string,string>("key","value"));
INSERT from queries
INSERT INTO T SELECT col1, col2, col3 FROM @rowset;
Multiple INSERTs into same table
• Is supported
• Generates separate file per insert in physical storage:
• Can lead to performance degradation
• Recommendations:
• Try to avoid small inserts
• Rebuild table after frequent insertions with:
ALTER TABLE T REBUILD;
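One way to follow the "avoid small inserts" recommendation is to combine several small rowsets into one before inserting. This is only a sketch: @batch1 and @batch2 are hypothetical rowsets assumed to have the same columns as T.

```usql
// Assumed (hypothetical): @batch1 and @batch2 both expose col1, col2, col3 matching table T
@all =
    SELECT col1, col2, col3 FROM @batch1
    UNION ALL
    SELECT col1, col2, col3 FROM @batch2;

// A single INSERT then produces one file in physical storage instead of one per batch.
INSERT INTO T
SELECT col1, col2, col3
FROM @all;
```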
Additional Resources
Documentation
U-SQL Reference Doc: http://aka.ms/usql_reference
Sample Projects
https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/2-Ambulance-Structured%20Data
https://github.com/Azure/usql/tree/master/Examples/TweetAnalysis
http://aka.ms/AzureDataLake
