Michael Rys
Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com
Using C# with U-SQL
2016/04/04
Extensible from the Ground Up
• Type system is based on C#
• Expression language IS C#
• User-defined functions (U-SQL and C#)
• User-defined aggregators (C#)
• User-defined operators (UDOs) (C#)
U-SQL provides the parallelization and scale-out framework for user code
• EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER, APPLIER
// Make the C# code in the registered assembly available to this script.
REFERENCE ASSEMBLY MyDB.MyAssembly;

CREATE TABLE T( cid int, first_order DateTime
              , last_order DateTime, order_count int
              , order_amount float );

@o = EXTRACT oid int, cid int, odate DateTime, amount float
     FROM "/input/orders.txt"
     USING Extractors.Csv();

@c = EXTRACT cid int, name string, city string
     FROM "/input/customers.txt"
     USING Extractors.Csv();

// C# expressions (StartsWith, &&, ==), a C# function, and a C# user-defined aggregator.
@j = SELECT c.cid, MIN(o.odate) AS firstorder
          , MAX(o.odate) AS lastorder, COUNT(o.oid) AS ordercnt
          , AGG<MyAgg.MySum>(o.amount) AS totalamount
     FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
     WHERE c.city.StartsWith("New")
           && MyNamespace.MyClass.MyFunction(o.odate) > 10
     GROUP BY c.cid;

// Write the result with a user-defined outputter.
OUTPUT @j TO "/output/result.txt"
USING new MyData.Write();
Using C# with U-SQL (SQLBits 2016)
Managing Assemblies
Create assemblies
• CREATE ASSEMBLY db.assembly FROM @path;
• CREATE ASSEMBLY db.assembly FROM byte[];
• Can also include additional resource files
Reference assemblies
• REFERENCE ASSEMBLY db.assembly;
• Referencing .NET Framework assemblies
  • Always accessible system namespaces:
    • U-SQL specific (e.g., for SQL.MAP)
    • All provided by System.dll, System.Core.dll, System.Data.dll, System.Runtime.Serialization.dll, mscorlib.dll (e.g., System.Text, System.Text.RegularExpressions, System.Linq)
  • Add all other .NET Framework assemblies with:
    REFERENCE SYSTEM ASSEMBLY [System.XML];
Enumerate assemblies
• PowerShell command
• U-SQL Studio Server Explorer
Drop assemblies
• DROP ASSEMBLY db.assembly;
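The statements above can be combined into a registration, usage, and cleanup sequence. The following is a minimal sketch, not the presenter's demo: the database name MyDB, the assembly name MyAssembly, and the DLL path are hypothetical placeholders, and the three parts are assumed to be submitted as separate scripts.

```usql
// --- Script 1: register the compiled C# assembly (all names are placeholders) ---
CREATE DATABASE IF NOT EXISTS MyDB;
USE DATABASE MyDB;

DECLARE @path string = "/assemblies/MyAssembly.dll";  // hypothetical location of the DLL
CREATE ASSEMBLY IF NOT EXISTS MyAssembly FROM @path;

// --- Script 2: use the registered code ---
REFERENCE ASSEMBLY MyDB.MyAssembly;          // user assembly registered in Script 1
REFERENCE SYSTEM ASSEMBLY [System.XML];      // .NET Framework assembly that is not available by default

// ... EXTRACT / SELECT / OUTPUT statements that call into MyAssembly go here ...

// --- Script 3: remove the registration when it is no longer needed ---
DROP ASSEMBLY IF EXISTS MyDB.MyAssembly;
```

Keeping registration in its own script means the assembly is registered once and then only referenced by the scripts that use it.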
Assembly Dependencies
• An assembly must be registered to be referenced
• All assemblies needed for compilation must be referenced in the script
• All assemblies needed at runtime must either
  • be referenced in the script, or
  • be registered with the assembly as additional files
• The metadata service does NOT enforce dependencies
• The Visual Studio extension provides support for dependency management
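For the "registered with the assembly as additional files" option, a registration might look like the sketch below. It assumes the WITH ADDITIONAL_FILES clause of CREATE ASSEMBLY as described in the U-SQL language reference. The assembly name, the dependent Helper.dll, and the lookup file are hypothetical.

```usql
// Hypothetical scenario: MyAssembly.dll needs Helper.dll and lookup.txt at run time,
// but scripts only reference MyAssembly directly.
CREATE ASSEMBLY IF NOT EXISTS MyDB.MyAssembly
FROM "/assemblies/MyAssembly.dll"
WITH ADDITIONAL_FILES =
    ( "/assemblies/Helper.dll",    // runtime-only dependency, placeholder path
      "/assemblies/lookup.txt" );  // resource file, placeholder path
```

Because the metadata service does not enforce dependencies, a file missing from this list would typically show up as a failure when the job runs, not when the assembly is registered.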
Additional Resources
MSDN Article
https://msdn.microsoft.com/en-us/magazine/mt614251
Sample Data
https://github.com/Azure/usql/tree/master/Examples/Samples/Data/Tweets
Sample Project
https://github.com/Azure/usql/tree/master/Examples/TweetAnalysis
http://aka.ms/AzureDataLake

Editor's Notes

  • #4, #5, #6: Shows a simple EXTRACT and OUTPUT, then simple extensibility with string functions.