RUNNING INTELLIGENT APPLICATIONS INSIDE A DATABASE: DEEP LEARNING WITH PYTHON STORED PROCEDURES IN SQL
@ODSC
Dr. Miguel Fierro
@miguelgfierro
https://miguelgfierro.com
AI WHERE THE DATA IS
FORECASTING IN SQLSERVER
CANCER DETECTION IN SQLSERVER
source: http://www.pwc.com/gx/en/issues/analytics/assets/pwc-ai-analysis-sizing-the-prize-report.pdf
$15.7 trillion by 2030 (~14% of GDP)
Productivity gains ($6.6T)
Automation
Increased demand ($9.1T)
Augmentation Higher quality products
AI is the Biggest Business Opportunity
More and more data
source: https://xkcd.com/1838/
90% of the data was created in the last 2 years
Estimated to grow 40x by 2020
+info: https://miguelgfierro.com/blog/2017/deep-learning-for-entrepreneurs/
Traditional Python vs SQL Python
Don’t move huge amounts of data
Don’t move critical data
Traditional Python vs SQL Python
Azure Relational Database Platform
Azure Cloud in 38 regions
Azure Analytics, ML, Cognitive Services, Bots, Power BI
Azure Compute & Storage
Database Service Platform
Secure: High Availability, Audit, Backup/Restore
Flexible: On-demand scaling, Resource governance
Intelligence: Advisor, Tuning, Monitoring
SQL Server, MySQL & PostgreSQL
SQL Server 2017 Features
+info: https://www.microsoft.com/en-us/sql-server/sql-server-2017-editions
Management
Platforms: Windows, Linux & Docker
Max size: 524 PB
Stretch database: Manage hybrid scenarios with on-premises and cloud data
Programmability: JSON & Graph support
Security
Dynamic Data Masking: Protects sensitive data
Row-level security: Access control of rows based on user privileges
Performance
In-memory performance: Memory-optimized tables
Adaptive query processing: Performance improvement of batch queries
Analytics
Advanced Analytics: Python & R integration
Parallel Advanced Analytics: Python & R integration with GPU processes
SQL Server 2017 Platforms: Linux
+info: https://blogs.technet.microsoft.com/dataplatforminsider/2016/12/16/sql-server-on-linux-how-introduction/
SQLPAL (SQL Platform Abstraction Layer) allows some Windows libraries to run on Linux
SQLPAL interacts with the Linux host through Application Binary Interface (ABI) calls
Performance on Windows and Linux is roughly the same
SQL Server 2017 Programmability
Temporal tables
JSON support
Graph data support
PolyBase to interact with Hadoop
Python SQL for Model Development
Python SQL for Model Operationalization
Database Stored Procedures
Functions stored inside the database
Have input and output parameters
Are stored in the database data dictionary
Example:
CREATE PROCEDURE <procedure name>
AS
BEGIN
    <SQL statement>
END;
GO
System Stored Procedures
+info: https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/system-stored-procedures-transact-sql
Geo-replication SP
Maintenance Plan SP
Policy Management SP
Replication SP
Distributed Query Management SP
Database Engine SP
Execute External Script Stored Procedure
EXECUTE sp_execute_external_script
    @language = N'language'
  , @script = N' <code here> '
  , @input_data_1 = N' SELECT *'
WITH RESULT SETS ((<var_name> char(20) NOT NULL));

EXECUTE sp_execute_external_script
    @language = N'R'
  , @script = N'
      mytextvariable <- c("hello", " ", "world");
      OutputDataSet <- as.data.frame(mytextvariable);'
  , @input_data_1 = N'SELECT 1 as Temp1'
WITH RESULT SETS (([Col1] char(20) NOT NULL));
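Because the whole script travels inside a T-SQL string literal, every single quote in the embedded code has to be doubled. The following is a minimal sketch of a client-side helper that assembles such a call as text. The names `tsql_literal` and `build_external_script_call` are hypothetical, only the three parameters shown on the slide are used, and nothing is sent to a server.

```python
def tsql_literal(text: str) -> str:
    """Return text as a T-SQL Unicode string literal (single quotes doubled)."""
    return "N'" + text.replace("'", "''") + "'"


def build_external_script_call(language: str, script: str, input_query: str) -> str:
    """Assemble an EXECUTE sp_execute_external_script statement as text.

    Hypothetical helper for illustration only; it covers the parameters
    shown on the slide: @language, @script and @input_data_1.
    """
    return (
        "EXECUTE sp_execute_external_script\n"
        f"    @language = {tsql_literal(language)}\n"
        f"  , @script = {tsql_literal(script)}\n"
        f"  , @input_data_1 = {tsql_literal(input_query)};"
    )


# A Python script body that contains single quotes, which must be escaped.
py_script = "OutputDataSet = InputDataSet\nprint('rows:', len(InputDataSet))"
statement = build_external_script_call("Python", py_script, "SELECT 1 AS Temp1")
print(statement)
```

Generating the statement this way also avoids the typographic quotes that presentation software tends to substitute, which T-SQL does not accept as string delimiters.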
revoscalepy and RevoScaleR
+info revoscalepy: https://docs.microsoft.com/en-us/machine-learning-server/python-reference/revoscalepy/revoscalepy-package
+info RevoScaleR: https://docs.microsoft.com/en-us/machine-learning-server/r-reference/revoscaler/revoscaler
RxLocalSeq RxInSqlServer RxSpark
3 compute contexts for Python and R
revoscalepy functions
Category: Description
Compute context: Getters and setters of the compute context
Data source: Data source object for ODBC, XDF, SQL table, SQL query
ETL: Data input/output and transformation
Analytics: Linear regression, logistic regression, random forest, boosted decision trees
Jobs: Manage and schedule jobs, monitoring
Serialization: Serialization of models and data objects
Utility: Manage utilities and status functions
AI WHERE THE DATA IS
FORECASTING IN SQLSERVER
CANCER DETECTION IN SQLSERVER
Ski rental prediction with revoscalepy
source: https://microsoft.github.io/sql-ml-tutorials/python/rentalprediction/
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE
SQL
USE master;
GO
RESTORE DATABASE TutorialDB
FROM DISK = 'C:\MSSQL\Backup\TutorialDB.bak'
WITH
MOVE 'TutorialDB' TO 'C:\MSSQL\DATA\TutorialDB.mdf'
,MOVE 'TutorialDB_log' TO 'C:\MSSQL\DATA\TutorialDB.ldf';
GO
SQL
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from revoscalepy import RxComputeContext, RxInSqlServer, RxSqlServerData
from revoscalepy import rx_import
# Connection string to connect to the SQL Server named instance
conn_str = ('Driver=SQL Server;Server=MYSQLSERVER;'
            'Database=TutorialDB;Trusted_Connection=True;')
# column_info (per-column type information) is not shown on the slide; see the source tutorial
data_source = RxSqlServerData(table="dbo.rental_data",
                              connection_string=conn_str,
                              column_info=column_info)
# Compute context that runs revoscalepy computations inside SQL Server
computeContext = RxInSqlServer(connection_string=conn_str,
                               num_tasks=1,
                               auto_cleanup=False)
Python
Ski rental prediction with revoscalepy
# import data source and convert to pandas dataframe
df = pd.DataFrame(rx_import(input_data = data_source))
print("Data frame:", df)
Python
Rows Processed: 453
Data frame:
   Day  Holiday  Month  RentalCount  Snow  WeekDay  Year
0   20        1      1          445     2        2  2014
1   13        2      2           40     2        5  2014
2   10        2      3          456     2        1  2013
3   31        2      3           38     2        2  2014
4   24        2      4           23     2        5  2014
5   11        2      2           42     2        4  2015
6   28        2      4          310     2        1  2013
...
[453 rows x 7 columns]
Results
Ski rental prediction with revoscalepy
# Store the variable we'll be predicting on.
target = "RentalCount"
# Generate the training set. Set random_state to be able to replicate results.
train = df.sample(frac=0.8, random_state=1)
# Select anything not in the training set and put it in the testing set.
test = df.loc[~df.index.isin(train.index)]
# Initialize the model class.
lin_model = LinearRegression()
# Fit the model to the training data.
# columns (the list of input column names) is not shown on the slide; see the source tutorial
lin_model.fit(train[columns], train[target])
Python
Ski rental prediction with revoscalepy
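The 80/20 split above can be sketched without pandas to show the row counts involved. This is an illustration using the standard library, not the tutorial's code: the helper name `split_indices` is hypothetical, and the rows selected will differ from what `df.sample` picks for the same seed.

```python
import random


def split_indices(n_rows: int, frac: float = 0.8, seed: int = 1):
    """Return (train, test) row indices for a reproducible random split.

    Hypothetical helper: a seeded sample of round(n_rows * frac) rows goes
    to training, and every remaining row goes to testing.
    """
    rng = random.Random(seed)
    train = set(rng.sample(range(n_rows), round(n_rows * frac)))
    test = [i for i in range(n_rows) if i not in train]
    return sorted(train), test


train_idx, test_idx = split_indices(453)
print(len(train_idx), len(test_idx))
```

With the 453 rows reported earlier, this yields 362 training rows and 91 test rows, which agrees with the 91 predictions printed on the results slide.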
# Generate our predictions for the test set.
lin_predictions = lin_model.predict(test[columns])
print("Predictions:", lin_predictions)
# Compute error between our test predictions and the actual values.
lin_mse = mean_squared_error(lin_predictions, test[target])
print("Computed error:", lin_mse)
Python
Predictions: [ 40. 38. 240. 39. 514. 48. 297. 25. 507. 24.
30. 54. 40. 26. 30. 34. 42. 390. 336. 37. 22. 35.
55. 350. 252. 370. 499. 48. 37. 494. 46. 25. 312. 390.
35. 35. 421. 39. 176. 21. 33. 452. 34. 28. 37. 260.
49. 577. 312. 24. 24. 390. 34. 64. 26. 32. 33. 358.
348. 25. 35. 48. 39. 44. 58. 24. 350. 651. 38. 468.
26. 42. 310. 709. 155. 26. 648. 617. 26. 846. 729. 44.
432. 25. 39. 28. 325. 46. 36. 50. 63.]
Computed error: 3.59831533436e-26
Results
Ski rental prediction with revoscalepy
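For reference, the computed error is the mean of the squared differences between predictions and actual values. A minimal sketch with made-up numbers (not the tutorial data), using a hypothetical helper named `mse` rather than scikit-learn:

```python
def mse(actual, predicted):
    """Mean of squared differences between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [40.0, 38.0, 240.0]       # illustrative values only
predicted = [42.0, 38.0, 237.0]
error = mse(actual, predicted)
print(error)
```

An error on the order of 1e-26, as in the results above, means the predictions match the targets to floating-point precision. That usually happens when the inputs determine the target exactly, for example if the target column is among the input columns, so the column list is worth checking.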
Ski rental prediction with SQL stored procedures
USE TutorialDB;
DROP TABLE IF EXISTS rental_py_models;
GO
CREATE TABLE rental_py_models (
    model_name VARCHAR(30) NOT NULL DEFAULT('default model') PRIMARY KEY,
    model VARBINARY(MAX) NOT NULL);
GO
SQL
DROP TABLE IF EXISTS py_rental_predictions;
GO
CREATE TABLE py_rental_predictions(
[RentalCount_Predicted] [int] NULL,
[RentalCount_Actual] [int] NULL,
[Month] [int] NULL,
[Day] [int] NULL,
[WeekDay] [int] NULL,
[Snow] [int] NULL,
[Holiday] [int] NULL,
[Year] [int] NULL);
GO
SQL
-- Train model
CREATE PROCEDURE generate_rental_py_model (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
    EXECUTE sp_execute_external_script
      @language = N'Python'
    , @script = N'
from sklearn.linear_model import LinearRegression
import pickle
df = rental_train_data
# Assumption (not on the slide): predict RentalCount from all other columns
target = "RentalCount"
columns = [c for c in df.columns if c != target]
lin_model = LinearRegression()
lin_model.fit(df[columns], df[target])
# Serialize the fitted model so it can be returned as varbinary(max)
trained_model = pickle.dumps(lin_model)'
    , @input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from dbo.rental_data where Year < 2015'
    , @input_data_1_name = N'rental_train_data'
    , @params = N'@trained_model varbinary(max) OUTPUT'
    , @trained_model = @trained_model OUTPUT;
END;
GO
SQL
Ski rental prediction with SQL stored procedures
--Execute model training
DECLARE @model VARBINARY(MAX);
EXEC generate_rental_py_model @model OUTPUT;
INSERT INTO rental_py_models (model_name, model) VALUES('linear_model',
@model);
SQL
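The model travels between Python and the table as bytes: `pickle.dumps` produces the value stored in the VARBINARY(MAX) column, and `pickle.loads` restores it at scoring time. A minimal round-trip sketch follows. The dictionary of coefficients is a hypothetical stand-in for a fitted scikit-learn model, so the example runs with the standard library only.

```python
import pickle

# Hypothetical stand-in for a fitted model: just its coefficients.
model = {"slope": 2.0, "intercept": 1.0}


def predict(m, xs):
    """Apply the stand-in linear model to a list of inputs."""
    return [m["slope"] * x + m["intercept"] for x in xs]


# Serialize: this bytes value is what would be inserted as VARBINARY(MAX).
blob = pickle.dumps(model)

# Deserialize: what the scoring procedure does with the stored bytes.
restored = pickle.loads(blob)

print(type(blob).__name__, len(blob) > 0)
print(predict(restored, [0, 1, 2]))
```

`pickle.loads` should only be applied to bytes from a trusted source, and the library versions used when loading should match those used when the model was trained.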
Ski rental prediction with SQL stored procedures
DROP PROCEDURE IF EXISTS py_predict_rentalcount;
GO
CREATE PROCEDURE py_predict_rentalcount (@model varchar(100))
AS
BEGIN
    DECLARE @py_model varbinary(max) = (select model from rental_py_models where model_name = @model);
    EXEC sp_execute_external_script
      @language = N'Python'
    , @script = N'
import pickle
import pandas as pd
# Restore the model from the varbinary(max) parameter
rental_model = pickle.loads(py_model)
df = rental_score_data
# [… python code here …]
lin_predictions = rental_model.predict(df[columns])
predictions_df = pd.DataFrame(lin_predictions)
OutputDataSet = pd.concat([predictions_df, df["RentalCount"], df["Month"], df["Day"], df["WeekDay"], df["Snow"], df["Holiday"], df["Year"]], axis=1)
'
    , @input_data_1 = N'Select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from rental_data where Year = 2015'
    , @input_data_1_name = N'rental_score_data'
    , @params = N'@py_model varbinary(max)'
    , @py_model = @py_model
    WITH RESULT SETS (("RentalCount_Predicted" float, "RentalCount" float, "Month" float, "Day" float, "WeekDay" float, "Snow" float, "Holiday" float, "Year" float));
END;
GO
SQL
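-- Execute the prediction and store the results in the predictions table
INSERT INTO py_rental_predictions
EXEC py_predict_rentalcount 'linear_model';
-- Select the contents of the table
SELECT * FROM py_rental_predictions;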
SQL
Ski rental prediction with SQL stored procedures
AI WHERE THE DATA IS
FORECASTING IN SQLSERVER
CANCER DETECTION IN SQLSERVER
Convolutional Neural Networks (CNN)
Recurrent Neural Networks (RNN)
Two General Kinds of Neural Networks
low level features, medium level features, high level features
Interesting paper about representations: https://arxiv.org/abs/1411.1792
Multiple Levels of Representation
$1 million in prizes!
Determine whether a patient has cancer or not
Lung Cancer Competition
Data: CT scans of the lung
1595 patients with a diagnosis
200-500 scans per patient
Images of 512x512px
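These figures give a rough sense of why moving the data out of the database is expensive. A back-of-the-envelope sketch: the patient count, scan range and image size come from the slide, while the 2 bytes per pixel is an assumption (16-bit grayscale) that the slides do not state.

```python
PATIENTS = 1595
SCANS_PER_PATIENT = (200, 500)     # range given on the slide
PIXELS_PER_IMAGE = 512 * 512
BYTES_PER_PIXEL = 2                # assumption: 16-bit grayscale, not stated in the slides


def dataset_size_bytes(scans_per_patient: int) -> int:
    """Uncompressed size of the whole dataset for a given scan count per patient."""
    images = PATIENTS * scans_per_patient
    return images * PIXELS_PER_IMAGE * BYTES_PER_PIXEL


for scans in SCANS_PER_PATIENT:
    images = PATIENTS * scans
    size_gb = dataset_size_bytes(scans) / 1e9
    print(f"{scans} scans/patient: {images:,} images, about {size_gb:.0f} GB uncompressed")
```

Under that assumption the dataset is somewhere between roughly 167 GB and 418 GB before compression.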
ImageNet dataset → weight transference → Lung cancer dataset
Transfer Learning
Forward and backward propagation
input hidden hidden hidden hidden hidden output
Standard Training
Transference option 1: freeze n-1 layers
Frozen layers
input hidden hidden hidden hidden hidden output
Transference option 2: freeze initial layers
Frozen layers Forward and backward propagation
input hidden hidden hidden hidden hidden output
Transference option 3: fine tuning
Forward and backward propagation
input hidden hidden hidden hidden hidden output
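The three transfer options differ only in which layers have their weights updated during training. A framework-free sketch follows. The layer names mirror the diagram, while the function name `trainable_flags` and the default of three frozen initial layers are assumptions for illustration.

```python
LAYERS = ["hidden1", "hidden2", "hidden3", "hidden4", "hidden5", "output"]


def trainable_flags(option: str, n_frozen_initial: int = 3) -> dict:
    """Map each layer name to True if its weights are updated during training.

    option is one of:
      "standard"         - train every layer from scratch
      "freeze_n_minus_1" - option 1: freeze all layers except the last
      "freeze_initial"   - option 2: freeze only the first n_frozen_initial layers
      "fine_tuning"      - option 3: start from pretrained weights, update every layer
    """
    n = len(LAYERS)
    if option in ("standard", "fine_tuning"):
        frozen = 0
    elif option == "freeze_n_minus_1":
        frozen = n - 1
    elif option == "freeze_initial":
        frozen = n_frozen_initial
    else:
        raise ValueError(f"unknown option: {option}")
    # Layers before the cut-off index are frozen; the rest are trainable.
    return {name: index >= frozen for index, name in enumerate(LAYERS)}


for option in ("freeze_n_minus_1", "freeze_initial", "fine_tuning"):
    print(option, trainable_flags(option))
```

Standard training and fine tuning produce the same flags. They differ in the starting point: random weights in the first case, pretrained weights in the second.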
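[Figure: a 3×224×224 input image passes through the N layers of an ImageNet ResNet; the penultimate layer feeds the last layer, which outputs a class label (e.g. cat)]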
Pretrained ResNet 152
source: https://github.com/Azure/sql_python_deep_learning
Solution: CNN Featurizer
source: https://github.com/Azure/sql_python_deep_learning
[Figure: a batch of k images (= 1 patient), each 3×224×224, passes through the first N-1 layers of ResNet in CNTK (53 min); the penultimate layer output is taken as features]
[Figure: a batch of k images (= 1 patient), each 3×224×224 → ResNet N-1 layers in CNTK (53 min) → penultimate layer features → LightGBM boosted tree (2 min) → prediction (e.g. no cancer)]
Solution: Boosted Tree Classifier
source: https://github.com/Azure/sql_python_deep_learning
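The two-stage design (a fixed featurizer followed by a small classifier) can be sketched with stand-ins so that it runs without CNTK or LightGBM. Everything below is hypothetical: the per-image "features" are simple statistics instead of ResNet activations, the element-wise mean used to combine a patient's images is an assumption (the slides do not say how per-image features are combined), and the classifier is a fixed threshold.

```python
from statistics import fmean


def featurize_image(image):
    """Stand-in for the ResNet penultimate layer output (hypothetical)."""
    return [fmean(image), max(image)]


def featurize_patient(images):
    """Combine per-image features into one vector per patient.

    Assumption: element-wise mean over all of the patient's images.
    """
    per_image = [featurize_image(img) for img in images]
    return [fmean(column) for column in zip(*per_image)]


def classify(features, threshold=0.5):
    """Stand-in for the boosted tree classifier (hypothetical rule)."""
    return "cancer" if features[0] > threshold else "no cancer"


# One patient = one batch of images; here each image is a short list of numbers.
patient_scans = [[0.1, 0.2, 0.3], [0.2, 0.2, 0.2], [0.0, 0.4, 0.2]]
features = featurize_patient(patient_scans)
print(features, classify(features))
```

Splitting the work this way means the expensive stage (53 min in the diagram) runs once per dataset, while the cheap classifier (2 min) can be retrained repeatedly on the stored features.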
(Extra slide): 2nd place in the competition
source: https://github.com/juliandewit/kaggle_ndsb2017
Deep Learning in SQL Server: Training
sp.dbo.GenerateFeatures
CNTK with GPUs
sp.dbo.TrainLungCancerModel
LightGBM
Populate tables
Deep Learning in SQL Server: Operationalization
sp.dbo.PredictLungCancer Web App
Demo
Solution in SQL Server 2017
Web App
THANK YOU
@ODSC
Dr. Miguel Fierro
@miguelgfierro
https://miguelgfierro.com
Running Intelligent Applications inside a Database: Deep Learning with Python Stored Procedures in SQL

  • 1. RUNNING INTELLIGENT APPLICATIONS INSIDE A DATABASE: DEEP LEARNING WITH PYTHON STORED PROCEDURES IN SQL @ODSC Dr. Miguel Fierro @miguelgfierro https://miguelgfierro.com
  • 2. AI WHERE THE DATA IS FORECASTING IN SQLSERVER CANCER DETECTION IN SQLSERVER
  • 3. source: http://www.pwc.com/gx/en/issues/analytics/assets/pwc-ai-analysis-sizing-the-prize-report.pdf $15.7Trillion by 2030 ~ 14% GPD Productivity gains ($6.6T) Automation Increased demand ($9.1T) Augmentation Higher quality products AI is the Biggest Business Opportunity
  • 4. More and more data source: https://xkcd.com/1838/ 90% of the data created in the last 2 years Estimations are 40x by 2020 +info: https://miguelgfierro.com/blog/2017/deep-learning-for- entrepreneurs/
  • 6. Don’t move huge amounts of data Don’t move critical data Traditional Python vs SQL Python
  • 7. Azure Relational Database Platform Azure Cloud in 38 regions AzureAnalytics,ML,CognitiveServices, Bots,PowerBI Azure Compute & Storage Database Service Platform Secure: High Availability, Audit, Backup/Restore Flexible: On-demand scaling, Resource governance Intelligence: Advisor, Tuning, Monitoring SQL Server, MySQL & PostgreSQL
  • 8. SQL Server 2017 Features +info: https://www.microsoft.com/en-us/sql-server/sql-server-2017-editions Management Platforms Windows, Linux & Docker Max size 534Pb Stretch database Manage hybrid scenarios with on-premise and cloud data Programmability JSON & Graph support Security Dynamic Data Masking Protects sensitive data Row-level security Access control of rows based on user priviledges Performance In-memory performance Memory optimized tables Adaptive query processing Performance improvement of batch queries Analytics Advance Analytics Python & R integration Parallel Advanced Analytics Python & R integration with GPU processes
  • 9. SQL Server 2017 Platforms: Linux +info: https://blogs.technet.microsoft.com/dataplatforminsider/2016/12/16/sql-server-on-linux-how-introduction/ SQLPAL (SQL Platform Abstraction Layer) allows some Windows libraries to run on Linux SQLPAL interacts with the Linux host through Application Binary Interface calles (ABI) The performance in Windows and Linux is basically the same
  • 10. SQL Server 2017 Programmability Temporal tables JSON support Graph data support Polybase to interact with Hadoop
  • 11. Python SQL for Model Development
  • 12. Python SQL for Model Operationalization
  • 13. Database Stored Procedures Functions stored inside the database Have input and output parameters Are stored in the database data dictionary Example: CREATE PROCEDURE <procedure name> AS BEGIN <SQL statement> END GO
  • 14. System Stored Procedures +info: https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/system-stored- procedures-transact-sql Geo-replication SP Maintenance Plan SP Policy Management SP Replication SP Distributed Query Management SP Database Engine SP
  • 15. Execute External Script Stored Procedure EXECUTE sp_execute_external_script @language = N’language’ , @script = N‘ <code here> ’ , @input_data_1 = N' SELECT *’ WITH RESULT SETS ((<var_name> char(20) NOT NULL)); EXECUTE sp_execute_external_script @language = N’R’ , @script = N‘ mytextvariable <- c("hello", " ", "world"); OutputDataSet <- as.data.frame(mytextvariable);’ , @input_data_1 = N‘SELECT 1 as Temp1’ WITH RESULT SETS (([Col1] char(20) NOT NULL));
  • 16. revoscalepy and RevoScaleR +info revoscalepy: https://docs.microsoft.com/en-us/machine-learning-server/python-reference/revoscalepy/revoscalepy-package +info RevoScaleR: https://docs.microsoft.com/en-us/machine-learning-server/r-reference/revoscaler/revoscaler RxLocalSeq RxInSqlServer RxSpark 3 compute contexts for Python and R
  • 17. revoscalepy functions Category Description Compute context Getters and Setters of compute context Data source Data source object for ODBC, XDF, SQL table, SQL query ETL Data input/output and transformation Analytics Linear regression, logistic regression, random forest, boosted decision trees Jobs Manage and schedule jobs, monitoring Serialization Serialization of models and data objects Utility Manage utilities and status functions
  • 18. AI WHERE THE DATA IS FORECASTING IN SQLSERVER CANCER DETECTION IN SQLSERVER
  • 19. Ski rental prediction with revoscalepy source: https://microsoft.github.io/sql-ml-tutorials/python/rentalprediction/ EXEC sp_configure 'external scripts enabled', 1; RECONFIGURE WITH OVERRIDE SQL USE master; GO RESTORE DATABASE TutorialDB FROM DISK = 'C:MSSQLBackupTutorialDB.bak' WITH MOVE 'TutorialDB' TO 'C:MSSQLDATATutorialDB.mdf' ,MOVE 'TutorialDB_log' TO 'C:MSSQLDATATutorialDB.ldf'; GO SQL
  • 20. import pandas as pd from sklearn.linear_model import LinearRegression from sklearn.metrics import mean_squared_error from revoscalepy import RxComputeContext, RxInSqlServer, RxSqlServerData from revoscalepy import rx_import #Connection string to connect to SQL Server named instance conn_str = 'Driver=SQL Server;Server=MYSQLSERVER; Database=TutorialDB; Trusted_Connection=True;’ data_source = RxSqlServerData(table="dbo.rental_data", connection_string=conn_str, column_info=column_info) computeContext = RxInSqlServer( connection_string = conn_str, num_tasks = 1, auto_cleanup = False ) RxInSqlServer(connection_string=conn_str, num_tasks=1, auto_cleanup=False) Python Ski rental prediction with revoscalepy
  • 21. # import data source and convert to pandas dataframe df = pd.DataFrame(rx_import(input_data = data_source)) print("Data frame:", df) Python Rows Processed: 453 Data frame: Day Holiday Month RentalCount Snow WeekDay Year 0 20 1 1 445 2 2 2014 1 13 2 2 40 2 5 2014 2 10 2 3 456 2 1 2013 3 31 2 3 38 2 2 2014 4 24 2 4 23 2 5 2014 5 11 2 2 42 2 4 2015 6 28 2 4 310 2 1 2013 ... [453 rows x 7 columns] Results Ski rental prediction with revoscalepy
  • 22. # Store the variable we'll be predicting on. target = "RentalCount“ # Generate the training set. Set random_state to be able to replicate results. train = df.sample(frac=0.8, random_state=1) # Select anything not in the training set and put it in the testing set. test = df.loc[~df.index.isin(train.index)] # Initialize the model class. lin_model = LinearRegression() # Fit the model to the training data. lin_model.fit(train[columns], train[target]) Python Ski rental prediction with revoscalepy
  • 23. # Generate our predictions for the test set. lin_predictions = lin_model.predict(test[columns]) print("Predictions:", lin_predictions) # Compute error between our test predictions and the actual values. lin_mse = mean_squared_error(lin_predictions, test[target]) print("Computed error:", lin_mse) Python Predictions: [ 40. 38. 240. 39. 514. 48. 297. 25. 507. 24. 30. 54. 40. 26. 30. 34. 42. 390. 336. 37. 22. 35. 55. 350. 252. 370. 499. 48. 37. 494. 46. 25. 312. 390. 35. 35. 421. 39. 176. 21. 33. 452. 34. 28. 37. 260. 49. 577. 312. 24. 24. 390. 34. 64. 26. 32. 33. 358. 348. 25. 35. 48. 39. 44. 58. 24. 350. 651. 38. 468. 26. 42. 310. 709. 155. 26. 648. 617. 26. 846. 729. 44. 432. 25. 39. 28. 325. 46. 36. 50. 63.] Computed error: 3.59831533436e-26 Results Ski rental prediction with revoscalepy
  • 24. Ski rental prediction with SQL stored procedures (SQL)

        USE TutorialDB;
        DROP TABLE IF EXISTS rental_py_models;
        GO
        CREATE TABLE rental_py_models (
            model_name VARCHAR(30) NOT NULL DEFAULT('default model') PRIMARY KEY,
            model VARBINARY(MAX) NOT NULL
        );
        GO

        DROP TABLE IF EXISTS py_rental_predictions;
        GO
        CREATE TABLE py_rental_predictions (
            [RentalCount_Predicted] [int] NULL,
            [RentalCount_Actual] [int] NULL,
            [Month] [int] NULL,
            [Day] [int] NULL,
            [WeekDay] [int] NULL,
            [Snow] [int] NULL,
            [Holiday] [int] NULL,
            [Year] [int] NULL
        );
        GO
  • 25. Ski rental prediction with SQL stored procedures (SQL)

        -- Train model
        CREATE PROCEDURE generate_rental_py_model (@trained_model varbinary(max) OUTPUT)
        AS
        BEGIN
            EXECUTE sp_execute_external_script
              @language = N'Python'
            , @script = N'
        from sklearn.linear_model import LinearRegression
        import pickle

        df = rental_train_data
        # columns and target are not defined on the slide
        lin_model = LinearRegression()
        lin_model.fit(df[columns], df[target])
        # Serialize the fitted model so it can be stored as varbinary
        trained_model = pickle.dumps(lin_model)'
            , @input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from dbo.rental_data where Year < 2015'
            , @input_data_1_name = N'rental_train_data'
            , @params = N'@trained_model varbinary(max) OUTPUT'
            , @trained_model = @trained_model OUTPUT;
        END;
        GO
  • 26. Ski rental prediction with SQL stored procedures (SQL)

        -- Execute model training
        DECLARE @model VARBINARY(MAX);
        EXEC generate_rental_py_model @model OUTPUT;
        INSERT INTO rental_py_models (model_name, model) VALUES('linear_model', @model);
  • 27. Ski rental prediction with SQL stored procedures (SQL)

        DROP PROCEDURE IF EXISTS py_predict_rentalcount;
        GO
        CREATE PROCEDURE py_predict_rentalcount (@model varchar(100))
        AS
        BEGIN
            DECLARE @py_model varbinary(max) = (select model from rental_py_models where model_name = @model);
            EXEC sp_execute_external_script
              @language = N'Python',
              @script = N'
        import pickle
        import pandas as pd

        rental_model = pickle.loads(py_model)
        df = rental_score_data
        # [… python code here …]
        lin_predictions = rental_model.predict(df[columns])
        predictions_df = pd.DataFrame(lin_predictions)
        OutputDataSet = pd.concat([predictions_df, df["RentalCount"], df["Month"], df["Day"],
                                   df["WeekDay"], df["Snow"], df["Holiday"], df["Year"]], axis=1)
        '
            -- [… continues in next slide…]
  • 28. Ski rental prediction with SQL stored procedures (SQL)

            -- [… from previous slide…]
            , @input_data_1 = N'Select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from rental_data where Year = 2015'
            , @input_data_1_name = N'rental_score_data'
            , @params = N'@py_model varbinary(max)'
            , @py_model = @py_model
            WITH RESULT SETS (("RentalCount_Predicted" float, "RentalCount" float, "Month" float, "Day" float,
                               "WeekDay" float, "Snow" float, "Holiday" float, "Year" float));
        END;
        GO

        -- Execute the prediction and store the results, so the SELECT below has rows to return
        INSERT INTO py_rental_predictions
        EXEC py_predict_rentalcount 'linear_model';
        SELECT * FROM py_rental_predictions;
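Slides 25 to 28 rely on one pattern: the trained model is serialized with pickle, stored as bytes in a table, and later loaded by name for prediction. The sketch below shows that round trip using only the standard library. It makes several assumptions: an in-memory SQLite database stands in for SQL Server, a plain dict of made-up coefficients stands in for the fitted `LinearRegression`, and the helper `predict` is hypothetical.

```python
import pickle
import sqlite3

# Hypothetical linear-model parameters; a plain dict stands in for the fitted estimator.
model = {"intercept": 10.0, "coef": {"Snow": 300.0, "Holiday": 50.0}}


def predict(params, row):
    """Linear prediction: intercept plus the weighted sum of the row values."""
    return params["intercept"] + sum(w * row[name] for name, w in params["coef"].items())


conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
conn.execute(
    "CREATE TABLE rental_py_models (model_name TEXT PRIMARY KEY, model BLOB NOT NULL)"
)

# Training side: serialize the model and store the bytes under a name.
conn.execute(
    "INSERT INTO rental_py_models (model_name, model) VALUES (?, ?)",
    ("linear_model", pickle.dumps(model)),
)

# Prediction side: fetch the bytes by name and rebuild the model.
(blob,) = conn.execute(
    "SELECT model FROM rental_py_models WHERE model_name = ?", ("linear_model",)
).fetchone()
restored = pickle.loads(blob)
print(predict(restored, {"Snow": 1, "Holiday": 0}))  # 310.0
```

Keeping the model in a table means the prediction procedure only needs a model name, and several models can coexist under different names.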
  • 29. AI WHERE THE DATA IS; FORECASTING IN SQLSERVER; CANCER DETECTION IN SQLSERVER
  • 31. Multiple Levels of Representation: low-level features, medium-level features, high-level features. Interesting paper about representations: https://arxiv.org/abs/1411.1792
  • 32. Lung Cancer Competition: $1 million in prizes! Goal: determine whether a patient has cancer or not.
  • 33. Data: CT scans of the lung. 1595 patients with a diagnosis; 200-500 scans per patient; images of 512x512 px.
  • 34. Transfer Learning: weight transference from the ImageNet dataset to the lung cancer dataset.
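The figures on slide 33 give a feel for why moving this data out of the database is costly. The sketch below turns them into a rough size estimate. The 2 bytes per pixel is an assumption (16-bit pixels), since the talk does not state the pixel depth, and the helper name `dataset_size` is hypothetical.

```python
def dataset_size(patients, scans_per_patient, side=512, bytes_per_pixel=2):
    """Return (number of images, raw size in GiB) for uncompressed square images."""
    images = patients * scans_per_patient
    gib = images * side * side * bytes_per_pixel / 2**30
    return images, gib


# ASSUMPTION: 16-bit pixels (2 bytes each); the slide gives only counts and resolution.
for scans in (200, 500):  # range given on the slide
    images, gib = dataset_size(1595, scans)
    print(f"{scans} scans/patient: {images:,} images, about {gib:.0f} GiB raw")
```

Under that assumption the dataset holds between 319,000 and 797,500 images, on the order of a few hundred GiB uncompressed.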
  • 35. Standard Training: forward and backward propagation. (Diagram: input, five hidden layers, output.)
  • 36. Transference option 1: freeze n-1 layers. (Diagram: input, five hidden layers, output, with the frozen layers marked.)
  • 37. Transference option 2: freeze initial layers. (Diagram: input, five hidden layers, output, with labels for the frozen layers and for forward and backward propagation.)
  • 38. Transference option 3: fine tuning. Forward and backward propagation. (Diagram: input, five hidden layers, output.)
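The three transference options differ only in which layers receive weight updates. The toy sketch below makes that concrete with one scalar weight per layer. All names are hypothetical, and freezing exactly the first half of the layers in option 2 is an assumption, since the slide does not say how many initial layers are frozen.

```python
def trainable_mask(n_layers, option):
    """Return one boolean per layer: True means the layer is updated."""
    if option == "freeze_n_minus_1":   # option 1: train only the last layer
        return [i == n_layers - 1 for i in range(n_layers)]
    if option == "freeze_initial":     # option 2: train only the later layers
        n_frozen = n_layers // 2       # ASSUMPTION: freeze the first half
        return [i >= n_frozen for i in range(n_layers)]
    if option == "fine_tuning":        # option 3: train every layer
        return [True] * n_layers
    raise ValueError(f"unknown option: {option}")


def sgd_step(weights, grads, mask, lr=0.1):
    """Apply one gradient step, leaving frozen layers untouched."""
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]


weights = [1.0] * 6  # one scalar per layer, purely illustrative
grads = [0.5] * 6
for option in ("freeze_n_minus_1", "freeze_initial", "fine_tuning"):
    print(option, sgd_step(weights, grads, trainable_mask(6, option)))
```

In all three cases the starting weights would come from the pretrained network; the options trade training cost against how far the model can adapt to the new dataset.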
  • 39. Pretrained ResNet 152: ImageNet image (3x224x224) -> ResNet N layers (penultimate layer, then last layer) -> "cat". source: https://github.com/Azure/sql_python_deep_learning
  • 40. Solution: CNN Featurizer. Batch of k images (3x224x224) = 1 patient -> ResNet N-1 layers (up to the penultimate layer), CNTK (53min). source: https://github.com/Azure/sql_python_deep_learning
  • 41. Solution: Boosted Tree Classifier. Batch of k images (3x224x224) = 1 patient -> ResNet N-1 layers (up to the penultimate layer), CNTK (53min) -> features -> boosted tree, LightGBM (2min) -> "no cancer". source: https://github.com/Azure/sql_python_deep_learning
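The solution on slides 40 and 41 has two stages: a truncated CNN turns each image into a feature vector, and a boosted tree classifies the patient from those features. The sketch below shows only the data flow. Every function in it is a hypothetical stand-in (no ResNet, CNTK, or LightGBM is involved), and averaging the per-image features into one vector per patient is an assumption about how the k images are combined.

```python
def featurize(image):
    """Stand-in for the pretrained CNN truncated at the penultimate layer."""
    flat = [pixel for row in image for pixel in row]
    return [sum(flat) / len(flat), max(flat)]  # toy features: mean and max pixel


def patient_features(images):
    """Featurize each image of one patient, then average feature by feature."""
    feats = [featurize(img) for img in images]
    # ASSUMPTION: per-image features are averaged into one vector per patient.
    return [sum(col) / len(col) for col in zip(*feats)]


def classify(features, threshold=0.5):
    """Stand-in for the boosted-tree classifier: a single threshold rule."""
    return "cancer" if features[0] > threshold else "no cancer"


# One patient = a batch of k images; here k = 2 tiny 2x2 "images".
patient = [[[0.1, 0.2], [0.3, 0.4]], [[0.0, 0.2], [0.2, 0.0]]]
feats = patient_features(patient)
print(feats, classify(feats))
```

Splitting the work this way keeps the expensive step (the CNN forward pass) separate from the cheap one (training the tree on fixed features), which is consistent with the 53 min and 2 min timings on the slides.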
  • 42. (Extra slide): 2nd place in the competition source: https://github.com/juliandewit/kaggle_ndsb2017
  • 43. Deep Learning in SQL Server: Training. sp.dbo.GenerateFeatures (CNTK with GPUs); sp.dbo.TrainLungCancerModel (LightGBM); populate tables.
  • 44. Deep Learning in SQL Server: Operationalization. sp.dbo.PredictLungCancer; Web App.
  • 45. Demo
  • 46. Solution in SQL Server 2017
  • 57. THANK YOU @ODSC Dr. Miguel Fierro @miguelgfierro https://miguelgfierro.com