An Introduction To SQL Server 2019 Big Data Clusters
#DataRelay @DataRelay_UK DataRelay.co.uk
Thank you to our sponsors. We couldn’t do it without you!
PLATINUM
BRONZE
GOLD
About Me
• EMEA SQL Server Solutions Architect for Pure Storage
• A user of SQL Server since 2000
• 14+ years of SQL Server experience
• Speaker on the SQL Server community circuit
Does your organisation . . .
. . . aggregate data from various data sources and find itself drowning in SSIS packages ?
. . . use data science tools and T-SQL, and need to keep sensitive data on the same platform for users of both technologies ?
. . . need to store large amounts of unstructured data and query it using traditional SQL Server tools ?
. . . need a true scale-out data platform ?
Disclaimer
• The big data cluster content in this deck is correct as of release candidate 1 (RC1)
• Things may (and will) change between RC1 and the RTM version of SQL Server 2019
What Are Big Data Clusters ?
• Hybrid SQL Server / Spark scale-out data platform
• Features the next generation of PolyBase
• Runs on Kubernetes
The Story So Far
• Available in public preview form
• Kerberos integration only available in private preview form
• GA sometime in the second half of 2019
Microsoft Ignite Time ?
The Different Layers Of The ‘Cake’
• Containers at the bottom
• Kubernetes in the middle
• SQL Server 2019 big data clusters at the top
Containers – A Good Analogy
But All Is Not Well – What About ? . . .
• Scale-out
• Container scheduling
• Application resilience
• Service discovery
• Storage orchestration
• Etc . . . Etc . . .
Kubernetes To The Rescue
YouTube link: A Great Introduction to Kubernetes 101
Kubernetes TL;DR
Connecting To Applications
[Diagram: node ports and services. Each node runs pods, and the same node port (31387) is open on every node.]
The Kubernetes Cluster ‘Brain’
• ‘Master’ node(s)
• {API} server
• Control
• Scheduling
• Cluster state (etcd / Cosmos DB)
The Kubernetes Cluster ‘Body’
• ‘Worker’ node(s)
• Kubelets
• Pods
• Volumes
• Container runtime
Tools
• Python
• azdata (formerly mssqlctl)
• kubectl
• VS Code with the Kubernetes extension
• Azure Data Studio
• SQL Server 2019 extension
• SQL Server Management Studio version 18+
Big Data Cluster Architecture
[Diagram: SQL Server reaches ODBC, NoSQL, relational database and big data sources through PolyBase external tables; analytics tools and apps query it with T-SQL.]
Big Data Cluster Architecture
HDFS Tiering
azdata bdc hdfs mount create --remote-uri=s3a:<remote-uri> --mount-path=<mount-path>
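As a hedged sketch, the mount command on this slide can be assembled in a script, for example when several remote stores need mounting. The helper name `build_mount_command` and the bucket and mount path values are hypothetical; only the `azdata bdc hdfs mount create` subcommand and its two options come from the slide. The sketch builds the command string and does not run it.

```python
import shlex


def build_mount_command(remote_uri: str, mount_path: str) -> str:
    """Return the azdata command line that mounts remote storage into HDFS.

    Both arguments are supplied by the caller; nothing is executed here,
    the command is only assembled and shell-quoted.
    """
    # Assumption: mount points are given as absolute HDFS paths.
    if not mount_path.startswith("/"):
        raise ValueError("mount_path should be an absolute HDFS path")
    argv = [
        "azdata", "bdc", "hdfs", "mount", "create",
        f"--remote-uri={remote_uri}",
        f"--mount-path={mount_path}",
    ]
    return shlex.join(argv)  # shlex.join needs Python 3.8+


if __name__ == "__main__":
    # Hypothetical bucket and mount point, for illustration only.
    print(build_mount_command("s3a://example-bucket/taxi", "/mounts/taxi"))
```

Keeping the command as an argument list until the last moment avoids quoting mistakes if a path ever contains spaces.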
Connecting To The Cluster
NAME                     TYPE          CLUSTER-IP     EXTERNAL-IP     PORT(S)         AGE
appproxy-svc-external    LoadBalancer  10.233.19.57   192.168.101.50  8080:30778/TCP  37h
controller-svc-external  LoadBalancer  10.233.63.91   192.168.101.51  80:30080/TCP    32h
gateway-svc-external     LoadBalancer  10.233.38.193  192.168.101.52  8443:30443/TCP  32h
master-svc-external      LoadBalancer  10.233.19.51   192.168.101.53  1433:31433/TCP  32h
mgmtproxy-svc-external   LoadBalancer  10.233.12.208  192.168.101.54  8080:30777/TCP  32h
azdata
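The service listing on this slide can be turned into connection details with a few lines of Python. This is a minimal sketch, assuming the text has the same six columns as the slide; the names `Endpoint` and `parse_services` are made up for illustration. Service names are written in lowercase here, since Kubernetes resource names are lowercase.

```python
from typing import Dict, NamedTuple


class Endpoint(NamedTuple):
    external_ip: str
    port: int       # port exposed on the load balancer
    node_port: int  # port opened on every Kubernetes node


def parse_services(listing: str) -> Dict[str, Endpoint]:
    """Parse `kubectl get svc`-style text into {service name: Endpoint}."""
    endpoints = {}
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        name, _type, _cluster_ip, external_ip, ports, _age = line.split()
        # PORT(S) looks like "1433:31433/TCP": <port>:<node port>/<protocol>
        port, node_port = ports.split("/")[0].split(":")
        endpoints[name] = Endpoint(external_ip, int(port), int(node_port))
    return endpoints


# The listing shown on the slide, with lowercase service names.
LISTING = """
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
appproxy-svc-external LoadBalancer 10.233.19.57 192.168.101.50 8080:30778/TCP 37h
controller-svc-external LoadBalancer 10.233.63.91 192.168.101.51 80:30080/TCP 32h
gateway-svc-external LoadBalancer 10.233.38.193 192.168.101.52 8443:30443/TCP 32h
master-svc-external LoadBalancer 10.233.19.51 192.168.101.53 1433:31433/TCP 32h
mgmtproxy-svc-external LoadBalancer 10.233.12.208 192.168.101.54 8080:30777/TCP 32h
"""

if __name__ == "__main__":
    master = parse_services(LISTING)["master-svc-external"]
    # SQL Server clients conventionally take "host,port".
    print(f"{master.external_ip},{master.port}")  # prints 192.168.101.53,1433
```

The printed value is the form that SQL Server Management Studio or Azure Data Studio expect in the server name box.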
Data Ingestion
• Storage pool: curl, Azure Data Factory, kubectl cp or HDFS tiering
• Data pool: any TDS tool / client or external tables
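For the curl route into the storage pool, the upload goes through the cluster's HDFS gateway over WebHDFS. The sketch below only builds the target URL. It assumes the gateway publishes WebHDFS under `/gateway/default/webhdfs/v1`, which should be checked against the documentation for the release in use; the function name and the file path are hypothetical, while the host and port are taken from the gateway service in the listing shown earlier in the deck.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(gateway_host: str, gateway_port: int, hdfs_path: str) -> str:
    """Build a WebHDFS 'create file' URL behind the cluster's HDFS gateway."""
    # Assumption: WebHDFS is published under /gateway/default/webhdfs/v1.
    base = f"https://{gateway_host}:{gateway_port}/gateway/default/webhdfs/v1"
    query = urlencode({"op": "create", "overwrite": "true"})
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"{base}/{quote(hdfs_path.lstrip('/'))}?{query}"


if __name__ == "__main__":
    # Hypothetical target file in HDFS.
    print(webhdfs_create_url("192.168.101.52", 8443, "/taxi/trips.csv"))
```

The resulting URL is what a `curl -X PUT -T <local file>` upload would target, with credentials supplied separately.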
Creating A Big Data Cluster: One Way
• Create a Kubernetes cluster
• Install Python
• Install azdata
• Set up environment variables or JSON
• Create the cluster with azdata
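Since the create step depends on environment variables being in place, a small pre-flight check can save a failed deployment. This is a sketch only: the variable names in `EXAMPLE_REQUIRED_VARS` are an assumption and have differed between azdata releases, so substitute the names your release documents. The function name is hypothetical.

```python
import os
from typing import Iterable, List, Mapping

# Assumption: illustrative names only; check them against the azdata
# release in use before relying on this list.
EXAMPLE_REQUIRED_VARS = ("CONTROLLER_USERNAME", "CONTROLLER_PASSWORD",
                         "MSSQL_SA_PASSWORD", "KNOX_PASSWORD")


def missing_variables(required: Iterable[str],
                      environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables(EXAMPLE_REQUIRED_VARS)
    if missing:
        print("Set these before creating the cluster:", ", ".join(missing))
    else:
        print("Environment looks complete; azdata can be run next.")
```

Passing the environment in as a mapping keeps the check easy to exercise without touching the real shell environment.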
. . . The Fastest Way
• Install Python
• Install azdata
• az login (get account id)
• Download and run deploy-sql-big-data-aks.py
Main Gotchas I Ran Into
• Worker node hosts running out of space for docker images
• docker rmi is your friend !!!
• Using old versions of mssqlctl
• mssqlctl --version to the rescue
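The first gotcha can be caught early with a disk space check on each worker node. The following is a minimal sketch using only the Python standard library; the function names and any threshold passed in are hypothetical, and the right path to check depends on where the container runtime keeps its images (Docker's default data directory is `/var/lib/docker`).

```python
import shutil


def free_space_gib(path: str) -> float:
    """Return free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def needs_cleanup(path: str, minimum_gib: float) -> bool:
    """True when free space is below the caller-chosen threshold."""
    return free_space_gib(path) < minimum_gib


if __name__ == "__main__":
    # "." is used so the sketch runs anywhere; on a worker node the image
    # directory of the container runtime would be checked instead.
    print(f"{free_space_gib('.'):.1f} GiB free")
```

When `needs_cleanup` returns True, removing unused images with `docker rmi` is the remedy the slide suggests.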
A Word On Storage
• Kubernetes + storage = source of great confusion
• Ephemeral storage = great for kicking the tyres
• What about production grade installations ?
IBM RAMAC: world's first commercial hard drive
Missing Pieces Of The Puzzle
• Data protection
• Security
• Elasticity
• Licensing
From upgrade to a new release
Data Protection
Security
Other Cool Stuff
• Deploy and consume apps: R, Python, MLeap and SSIS
• Data wrangling with the PROSE Code Accelerator
• Spark machine learning with MLeap
• sparklyr
• Bulk processing in Spark
Recorded Demo
• New York Taxi dataset (S3) imported via HDFS tiering
• Data analysed with a Python notebook in Azure Data Studio
• External table created for analysis of the data using Transact-SQL
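The last step of the demo, an external table over the mounted files, can be sketched as a statement generated from Python, which suits a notebook workflow. Everything specific here is an assumption for illustration: the table name, column names and types, the mount path, and the data source and file format names, both of which are presumed to have been created beforehand. The demo's actual definitions are not shown in the deck.

```python
from typing import Sequence, Tuple


def external_table_sql(table: str,
                       columns: Sequence[Tuple[str, str]],
                       hdfs_location: str,
                       data_source: str,
                       file_format: str) -> str:
    """Render a CREATE EXTERNAL TABLE statement over files in HDFS.

    Illustration only: inputs are interpolated as-is, so they must come
    from a trusted source.
    """
    column_list = ",\n".join(f"    [{name}] {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE [{table}] (\n{column_list}\n)\n"
        f"WITH (\n"
        f"    DATA_SOURCE = {data_source},\n"
        f"    LOCATION = '{hdfs_location}',\n"
        f"    FILE_FORMAT = {file_format}\n"
        f");"
    )


if __name__ == "__main__":
    # Hypothetical schema and object names.
    print(external_table_sql(
        "taxi_trips",
        [("pickup_datetime", "DATETIME2"), ("fare_amount", "DECIMAL(10, 2)")],
        "/mounts/taxi",
        "SqlStoragePool",
        "csv_file",
    ))
```

Once such a table exists, the files can be queried with ordinary Transact-SQL, which is the point the demo makes.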
Take A Picture Of Me
https://www.youtube.com/watch?v=JXKvCLhKSw8
https://github.com/microsoft/sqlworkshops/tree/master/sqlserver2019bigdataclusters
Where To Next ?
• Official documentation
• Microsoft workshops
• Anything from Kelsey Hightower on youtube.com
Questions ?
Contact Details
• ChrisAdkin8
• cadkin@purestorage.com
• http://uk.linkedin.com/in/wollatondba
Feedback
https://datarelay.co.uk/Feedback