Analyzing StackExchange
data with Azure Data Lake
Tom Kerkhove
Azure Consultant
Tom Kerkhove
Azure Consultant @ Codit
Microsoft Azure MVP & Advisor
“Integration of Things” whitepaper (https://bit.ly/azure-iot)
Nice to meet you
blog.tomkerkhove.be
@TomKerkhove
tomkerkhove
Agenda
• Introduction to Azure Data Lake
• What is Azure Data Lake Store?
• What is Azure Data Lake Analytics?
NDC Minnesota - Analyzing StackExchange data with Azure Data Lake
Let’s go open-source, right?!
➔ Comes with a few challenges for C#/SQL professionals
  ➔ New languages to learn & maintain
  ➔ Rapidly evolving ecosystem
  ➔ Cluster management
  ➔ Typically Linux machines
Analyzing Big Data in Azure
Azure Data Lake Store
➔ WebHDFS compatible
➔ Any size
➔ Any format as-is
➔ Write-once-read-many
➔ Enterprise-grade security
➔ The big data store in Azure
Data Warehousing vs Data Lakes
Characteristics
➔ Data Warehousing
  ➔ Structured data
  ➔ Defined set of schemas
  ➔ Requires Extract-Transform-Load (ETL) before storing
  ➔ Familiar to some of us
  ➔ Exploratory analysis is hard because the data has to be transformed first
➔ Data Lakes
  ➔ Raw data (unstructured/semi-structured/structured)
  ➔ “Dump” all your data in the lake
  ➔ Data scientists will interpret data from the lake
  ➔ Without metadata, turns into a data swamp pretty fast
Martin Fowler on Data Lake & Data Warehouses: https://bit.ly/martin-fowler-data-lake
Security
➔ Role-based Access Control (RBAC)
➔ Grant users/groups access to folders/files (https://bit.ly/data-lake-store-acls)
➔ Firewall (off by default)
➔ Encryption at rest
  ➔ Keys managed by Microsoft
  ➔ Bring-your-own-key with Azure Key Vault
Pricing
➔ ~$0.032/GB stored per month
➔ Transaction costs
  ➔ ~$0.043 per 1M write transactions
  ➔ ~$0.0034 per 1M read transactions
  ➔ 1 transaction is a block of up to 128 kB
➔ Regular egress fees
➔ Monthly commitment packages
  ➔ Save up to 33%
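As a rough sketch of how the storage and transaction rates combine, the Python below estimates a monthly Data Lake Store bill. The rates are the approximate ones quoted on the slide. The function names and the workload numbers are illustrative assumptions, as is the reading that every started 128 kB block of a request counts as one transaction. Egress fees and commitment discounts are left out.

```python
import math

# Approximate rates quoted on the slide (USD); actual prices vary by region and over time.
STORAGE_PER_GB_MONTH = 0.032
WRITE_PER_MILLION = 0.043
READ_PER_MILLION = 0.0034
BLOCK_KB = 128  # one transaction covers a block of up to 128 kB


def transactions_for(request_kb: float) -> int:
    """Billable transactions for one request (assumption: one per started block)."""
    return max(1, math.ceil(request_kb / BLOCK_KB))


def monthly_store_cost(stored_gb: float, writes: int, reads: int) -> float:
    """Estimate the monthly cost from stored volume and transaction counts."""
    storage = stored_gb * STORAGE_PER_GB_MONTH
    write_cost = writes / 1_000_000 * WRITE_PER_MILLION
    read_cost = reads / 1_000_000 * READ_PER_MILLION
    return storage + write_cost + read_cost


if __name__ == "__main__":
    # Hypothetical workload: 150 GB stored, 2M write and 10M read transactions.
    print(round(monthly_store_cost(150, 2_000_000, 10_000_000), 2))
```

At these rates storage dominates the bill for this hypothetical workload; transactions only start to matter at very high request volumes.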
Azure Data Lake Store vs Blob Storage
➔ No Limitations: store whatever you want in any format
➔ Security: built-in Azure Active Directory support
➔ Pricing: more expensive than Storage GRS
➔ Redundancy: it’s there but no control over it
➔ Built for Scale: optimized for high-scale reads
➔ Integration: with Data Factory, Data Catalog & HDInsight
Full comparison on https://bit.ly/adls-vs-storage
Demo – Data Lake Store
Meet StackExchange
➔ Over 280 websites
➔ 150+ GB of open-source data
➔ Different kinds of data
➔ Posts
➔ Users
➔ Votes
➔ ...
➔ A big data sample data set
What Are We Going To Do?
Acquiring The Data
• Download the original data set
Moving The Data
• Upload data set to Azure
• Determine what service to use
Aggregating The Data
• Merging data from each site into one file
• Conversion from XML to CSV
Analyzing The Data
• Run business logic on it
• Attempt to gain knowledge from it
Visualizing The Data
• Visualize what we’ve learned
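The aggregation step (merging each site's data into one file and converting XML to CSV) can be sketched in plain Python. This is not the approach used in the talk, which runs on Azure Data Lake; it only illustrates the transformation. It assumes each dump file stores one record per `<row>` element with the fields as attributes. The function names, the `Site` column, and the sample data are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_csv(xml_text: str, site: str, columns: list[str]) -> str:
    """Convert one site's XML dump to CSV lines, prefixed with the site name."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in root.iter("row"):
        # Missing attributes become empty fields instead of raising.
        writer.writerow([site] + [row.attrib.get(c, "") for c in columns])
    return out.getvalue()


def merge_sites(dumps: dict[str, str], columns: list[str]) -> str:
    """Merge the dumps of several sites into a single CSV document."""
    header = io.StringIO()
    csv.writer(header, lineterminator="\n").writerow(["Site"] + columns)
    body = "".join(rows_to_csv(xml, site, columns) for site, xml in dumps.items())
    return header.getvalue() + body


if __name__ == "__main__":
    sample = {
        "cooking": '<users><row Id="1" DisplayName="Ann" /></users>',
        "diy": '<users><row Id="7" DisplayName="Bob, Jr." /></users>',
    }
    print(merge_sites(sample, ["Id", "DisplayName"]))
```

Adding the site name as a column is what makes a single merged file usable: without it, rows from different sites could no longer be told apart.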
How is it set up?
Azure Data Lake Analytics
➔ Run analytics jobs on managed clusters
➔ No maintenance ~ Serverless
➔ Written in U-SQL
➔ SQL Syntax
➔ Extensibility in C#
➔ Easily scaled with Analytics Units
➔ Pay for processing time only
Data Sources
➔ Built-in partitioned tables
➔ Query data where it lives
➔ No need to prepare data
➔ One query that runs on multiple data stores
➔ Use the correct data store for the job
Writing U-SQL scripts
Extract from a data source by using built-in or custom extractors.
Transform / analyse the data using SQL syntax, in-line C# or C# method calls.
Output the result to a data source by using built-in or custom outputters.
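The three stages of a script (extract, transform, output) can be illustrated with a small Python analogy. This is not U-SQL and does not use any Azure API; it only mirrors the shape of a script under the assumption of a CSV input with `Site` and `PostId` columns, which are made up for the example.

```python
import csv
import io
from collections import Counter


def extract(csv_text: str) -> list[dict[str, str]]:
    """Extract: parse raw text into rows (the role of an extractor)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, int]]:
    """Transform: count posts per site, comparable to GROUP BY with COUNT."""
    counts = Counter(row["Site"] for row in rows)
    return sorted(counts.items())


def output(result: list[tuple[str, int]]) -> str:
    """Output: serialize the result rowset (the role of an outputter)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Site", "Posts"])
    writer.writerows(result)
    return buffer.getvalue()


if __name__ == "__main__":
    raw = "Site,PostId\ncooking,1\ndiy,2\ncooking,3\n"
    print(output(transform(extract(raw))))
```

Keeping the stages separate mirrors why extractors and outputters are pluggable: the transformation in the middle does not need to know the file format on either side.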
Extensibility
➔ C# Expressions
➔ User-Defined Functions (UDF)
➔ User-Defined Operators (UDO)
➔ User-Defined Aggregators (UDAGG)
User-Defined Operators (UDO)
➔ User-Defined Extractors
➔ User-Defined Processors
  ➔ Take one row and produce one row
  ➔ Pass-through versus transforming
➔ User-Defined Reducers
  ➔ Take n rows and produce 1 row
➔ User-Defined Outputters
➔ User-Defined Appliers
  ➔ Take one row and produce 0 to n rows
  ➔ Used with OUTER/CROSS APPLY
➔ User-Defined Combiners
  ➔ Combines rowsets (like a user-defined join)
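The row-count contracts of processors, appliers and reducers can be sketched as plain Python generators. This is only a model of the semantics listed above, not the C# interfaces that U-SQL actually uses; all function names and the sample rows are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Row = dict


def process_rows(rows: Iterable[Row], fn: Callable[[Row], Row]) -> Iterator[Row]:
    """Processor: one row in, one row out."""
    for row in rows:
        yield fn(row)


def apply_rows(rows: Iterable[Row], fn: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """Applier: one row in, zero to n rows out (CROSS APPLY style)."""
    for row in rows:
        for produced in fn(row):
            # Each produced row is joined back onto the row it came from.
            yield {**row, **produced}


def reduce_rows(rows: Iterable[Row], key: str, fn: Callable[[list], Row]) -> Iterator[Row]:
    """Reducer: n rows sharing a key in, one row out."""
    groups: dict = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    for members in groups.values():
        yield fn(members)


if __name__ == "__main__":
    posts = [{"Id": 1, "Tags": "azure;sql"}, {"Id": 2, "Tags": ""}]
    split_tags = lambda r: [{"Tag": t} for t in r["Tags"].split(";") if t]
    tagged = list(apply_rows(posts, split_tags))
    count = lambda group: {"Tag": group[0]["Tag"], "Count": len(group)}
    print(list(reduce_rows(tagged, "Tag", count)))
```

In this sketch the post without tags produces no output rows, which matches CROSS APPLY; an OUTER APPLY would keep that row with empty tag values.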
Metadata Model
U-SQL Batch Job Execution Lifetime
Michael Rys on “Tuning & Optimizing U-SQL”: https://bit.ly/tuning-optimizing-u-sql
Job States
Security
➔ Role-based Access Control (RBAC)
➔ Firewall (off by default)
➔ Access control on service catalog
➔ Access control on a per-database level
Resource Management
➔ Account-level limitations
  ➔ Maximum number of AUs
  ➔ Maximum number of concurrent jobs
  ➔ Days to retain queries
➔ Job-level limitations
  ➔ Maximum number of AUs
  ➔ Maximum priority
  ➔ Granted per user and/or group
Demo – Data Lake Analytics
Azure Data Lake tools for Visual Studio
➔ Store Explorer
➔ Browse store
➔ Download complete / subset of file
➔ Preview
➔ Only in Visual Studio
➔ Job Visualizer
➔ Determine bottlenecks by using heatmaps
➔ Playback jobs based on telemetry
➔ Query optimization
➔ Job Profiler
Azure Data Lake tools for Visual Studio (Code)
➔ Integration with source control
➔ Unit testing extensibility
➔ Local execution
➔ Simulate Data Lake Store
➔ Run & debug jobs
Pricing
➔ Billed for processing time, not per job
  ➔ Billed per second
  ➔ $1.687 per hour per Analytics Unit
  ➔ ~$0.028 per minute
➔ Monthly commitment packages
  ➔ Save up to 74%
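The per-minute figure follows directly from the hourly rate: $1.687 / 60 ≈ $0.028. A minimal sketch of the resulting job cost, assuming cost scales linearly with the number of Analytics Units and the processing time in seconds (the function name and the sample job are hypothetical, and commitment discounts are ignored):

```python
AU_RATE_PER_HOUR = 1.687  # USD per Analytics Unit per hour, as quoted on the slide


def job_cost(analytics_units: int, duration_seconds: float) -> float:
    """Cost of one job: AUs x processing time, billed per second."""
    return analytics_units * duration_seconds * AU_RATE_PER_HOUR / 3600


if __name__ == "__main__":
    # One AU for one minute reproduces the per-minute figure on the slide.
    print(round(job_cost(1, 60), 3))
    # Hypothetical job: 10 AUs for 5 minutes.
    print(round(job_cost(10, 300), 2))
```

Because billing is for processing time only, doubling the AUs costs the same in total whenever it halves the run time; it only costs more when the job cannot use the extra parallelism.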
Operations
Available Graphs
  Data Lake Store: Storage Utilization; Read & Write; Ingress & Egress
  Data Lake Analytics: Job status; Used # of AU time
Available Metrics
  Data Lake Store: Data Read & Write; Read & Write Requests; Total Storage
  Data Lake Analytics: Job status; Used # of AU time
Support for alerts
  Data Lake Store: Yes
  Data Lake Analytics: Built-in & custom Log Analytics queries (requires Audit logs)
Support for Audit Logs
  Data Lake Store: Yes
  Data Lake Analytics: Yes
Support for Request Logs
  Data Lake Store: Yes
  Data Lake Analytics: Yes
Integration with Azure Services
➔ Integrate with your data pipelines in Azure Data Factory
  ➔ Move data from Azure Data Lake Store to other stores
  ➔ Move data to Azure Data Lake Store
  ➔ Run U-SQL jobs within a pipeline
➔ Integration with Azure Data Catalog
  ➔ Register your Azure Data Lake Store assets
Learn more!
➔ Azure Data Architecture Guide (https://docs.microsoft.com/en-us/azure/architecture/data-guide/)
➔ “Mastering Azure Analytics” by Zoiner Tejada (https://bit.ly/mastering-azure-analytics)
➔ MVA “Introducing Azure Data Lake” (https://bit.ly/intro-to-azure-data-lake)
➔ Azure Data Lake GitHub Repo (https://azure.github.io/AzureDataLake/)
➔ U-SQL Documentation (https://usql.io)
Summary
➔ Big Data is not just hype, so get ready
➔ Azure Data Lake Store
  ➔ Analyse today & explore tomorrow
  ➔ Beware of the data swamps
➔ Data Lake Analytics
  ➔ Serverless
  ➔ Re-use existing skills
  ➔ Pay for what we use
➔ Big Data in Azure? Use Azure Data Lake!


Editor's Notes

  • #8: HDI: managed cluster service, open-source technology, runs on Windows or Linux. Store: unlimited storage, WebHDFS. Analytics: managed job service, U-SQL batch processing. Based on MSFT Cosmos, used by Cortana, Bing, Xbox Live, etc.
  • #12: Analogy with fishing: go fishing in the lake, put it in your warehouse. Lake becomes swamp, fish die.
  • #15: No Limitations: Store is unlimited; Storage is limited to 100 accounts in a subscription, 500 TB each. Security: AAD vs SAS or name/key auth. Pricing: ADLS is more expensive. Redundancy: no control over redundancy. Built for Scale: optimized for high reads and analytics, scales with the reads, high volume of small writes → real-time analytics.