BigQuery ML
Machine Learning at Scale using SQL
Márton Kodok / @martonkodok
Google Developer Expert on Cloud at REEA.net - Targu Mures
May 2019 - Cluj Napoca, Romania
● Geek. Hiker. Do-er.
● Among the top 3 Romanians on Stack Overflow (133k reputation)
● Google Developer Expert on Cloud technologies
● Crafting Web/Mobile backends at REEA.net
● BigQuery/Redis and database engine expert
● Active in mentoring and IT community
StackOverflow: pentium10
GitHub: pentium10
Slideshare: martonkodok
Twitter: @martonkodok
BigQuery ML - Machine Learning at Scale using SQL @martonkodok
About me
1. Application development in the Cloud using Serverless services
2. What is BigQuery? - Data warehouse in the Cloud
3. Introduction to BigQuery ML - execute ML models using SQL
4. Practical use cases
5. Segment and recommend with BigQuery ML
6. Conclusions
Agenda
REEA.net uses GCP
Build on the same infrastructure
that powers Google
Google sees serverless as
Programming model (Dev)
● Focus on code
● Event-driven
● Stateless
Operational model (Ops)
● Zero ops
● Automatic scaling
● Managed security
Billing model ($)
● Pay for usage
Serverless is more than a set of functions
Cloud Dataflow
Cloud Tasks
Cloud Storage
Cloud Pub/Sub
Cloud Functions
App Engine
BigQuery
Stackdriver
Serverless is about maximizing elasticity, cost
savings, and agility of cloud computing.
Serverless types
● Triggered Code
● Platforms
[Diagram: Triggered Code with Cloud Functions. Labels: Application, Event Sourcing, Frontend, Platform Services, Metrics / Logs / Streaming, Event, Triggered Cloud Functions, Result]
Analytics-as-a-Service - Data Warehouse in the Cloud
Scales into petabytes on managed infrastructure - load files of up to 5 TB
Familiar DB structure (tables, columns, views, structs, nested fields, JSON)
SQL:2011 + JavaScript UDFs (User-Defined Functions)
Integrates with Google Sheets + Cloud Storage + Pub/Sub connectors
BigQuery ML enables users to create machine learning models using SQL queries
Decent pricing (storage: $20/TB, cold storage: $10/TB, queries: $5/TB) *May 2019
What is BigQuery?
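As a rough illustration of the "SQL + JavaScript UDF" point, a temporary JavaScript function can be declared and called within the same query. This is only a sketch: the function name `multiply_inputs` and the input values are made up for the example.

```sql
-- Hypothetical helper: a temporary JavaScript UDF that exists only for this query.
CREATE TEMP FUNCTION multiply_inputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  return x * y;
""";

-- The UDF is called like any built-in SQL function.
SELECT multiply_inputs(2.0, 3.5) AS product;
```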
BigQuery: federated data access warehouse
[Diagram: BigQuery as a federated data access warehouse. Labels: Application & Presentation, Audit logs, Billing entries, Stackdriver, Firebase, Google Marketing Platform, Cloud Dataflow, Cloud Storage, Report & Share, Business Analysis, BI Interface, Data Studio 360, Analysis, Processing, ML, Frontend, Platform Services, Real-Time Events, Multiple Platforms, Database (SQL)]
“Data needs to be processed in multiple services.
How can we pipe it to multiple places?”
Architecting for The Cloud
[Diagram. Labels: BigQuery, On-Premises Servers, Pipelines, ETL Engine, Event Sourcing, Frontend, Platform Services, Metrics / Logs / Streaming]
Data Pipeline Integration at REEA.net
[Diagram. Labels: Analytics Backend, BigQuery, On-Premises Servers, Pipelines, FluentD, Event Sourcing, Frontend, Platform Services, Metrics / Logs / Streaming, Development Team, Report & Share, Business Analysis, Tools (Tableau, QlikView, Data Studio, Internal Dashboard), Database (SQL), Application Servers, Cloud Storage archive, Load / Export, Replay, Standard Devices, HTTPS, Cloud Functions]
● SQL:2011 standard
● big cost savings with partitioning/clustering
● ability to throw in / join all kinds of data
● run raw ad-hoc queries (by analysts, sales, or devs)
● inspiring ML functions - devs no longer leave the IDE
● pricing model: 1 TB free every month
● no more throwing away, expiring, or aggregating old data
● no running out of resources
Our benefits using BigQuery
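The cost saving from partitioning and clustering can be sketched as follows. The dataset, table, and column names (`mydataset.events`, `event_time`, `user_id`, `event_name`) are hypothetical and only serve the illustration.

```sql
-- Hypothetical events table, partitioned by day and clustered by user.
CREATE TABLE `mydataset.events`
(
  event_time TIMESTAMP,
  user_id    STRING,
  event_name STRING
)
PARTITION BY DATE(event_time)
CLUSTER BY user_id;

-- Filtering on the partitioning column lets BigQuery skip the other daily
-- partitions, so fewer bytes are scanned and billed. The filter on the
-- clustering column can reduce the scanned data further.
SELECT event_name, COUNT(*) AS events
FROM `mydataset.events`
WHERE event_time >= TIMESTAMP('2019-05-01')
  AND event_time <  TIMESTAMP('2019-05-02')
  AND user_id = 'user-123'
GROUP BY event_name;
```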
What is BigQuery ML?
BigQuery ML
1. Execute ML initiatives without moving
data from BigQuery
2. Iterate on models in SQL in BigQuery
to increase development speed
3. Automate common ML tasks and
hyperparameter tuning
● Leverage BigQuery’s processing power to build a model with SQL syntax
● Create model from tabular data
● Auto-split of data into training and test
● Auto-tuned learning rate
● Model evaluation charts on BigQuery UI
● Ability to join the recommendation output with your own tables
Behind the scenes - through two lines of SQL
Use cases and skills (audiences: Developer, SQL Analyst, Data Scientist)
TensorFlow and Cloud ML Engine
● Build and deploy state-of-the-art custom models
● Requires deep understanding of ML and programming
BigQuery ML
● Build and deploy custom models using SQL
● Requires only basic understanding of ML
AutoML and Cloud ML APIs
● Build and deploy Google-provided models for standard use cases
● Requires almost no ML knowledge
Making ML accessible for all audiences
● Linear regression for forecasting
● Binary or multiclass logistic regression for classification (labels can have up to 50 unique values)
● K-means clustering for data segmentation (unsupervised learning - does not require labeled training data)
● Matrix factorization (Alpha)
● Deep Neural Networks using TensorFlow (Alpha)
● Import TensorFlow models for prediction in BigQuery (Alpha)
● Feature pre-processing functions (Alpha)
Alphas are whitelist only. Please contact your Google CE/Sales/TAM.
Supported models in BigQuery ML
Objectives:
● Create a binary logistic regression model using the CREATE MODEL statement
● Use the ML.EVALUATE function to evaluate the ML model
● Use the ML.PREDICT function to make predictions using the ML model
In this tutorial, you use the sample Google Analytics dataset for BigQuery
to create a model that predicts whether a website visitor will make a transaction.
https://cloud.google.com/bigquery-ml/docs/bigqueryml-web-ui-start
Getting started with BigQuery ML
Create a binary logistic regression model
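The SQL shown on this slide did not survive in the text, so the following is a sketch along the lines of the public getting-started tutorial rather than the speaker's exact query. The dataset and model names (`bqml_tutorial.sample_model`) and the chosen features are assumptions.

```sql
-- Train a binary logistic regression model on the public Google Analytics sample.
-- The column aliased as `label` is picked up as the label by default.
CREATE OR REPLACE MODEL `bqml_tutorial.sample_model`
OPTIONS (model_type = 'logistic_reg') AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,   -- 1 = the visit had a purchase
  IFNULL(device.operatingSystem, '')    AS os,
  device.isMobile                       AS is_mobile,
  IFNULL(geoNetwork.country, '')        AS country,
  IFNULL(totals.pageviews, 0)           AS pageviews
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170630';
```

The training window ends before the period used for evaluation and prediction, so later sessions can serve as unseen data.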
Evaluate your model
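A possible evaluation query, assuming a logistic regression model named `bqml_tutorial.sample_model` (a hypothetical name) that was trained on the Google Analytics sample with the features `os`, `is_mobile`, `country`, and `pageviews`. The evaluation data must include the same label and feature columns as the training query.

```sql
-- Evaluate the model on sessions from a later period than the training window.
-- For a binary classifier the result includes precision, recall, accuracy,
-- f1_score, log_loss, and roc_auc.
SELECT *
FROM ML.EVALUATE(
  MODEL `bqml_tutorial.sample_model`,
  (
    SELECT
      IF(totals.transactions IS NULL, 0, 1) AS label,
      IFNULL(device.operatingSystem, '')    AS os,
      device.isMobile                       AS is_mobile,
      IFNULL(geoNetwork.country, '')        AS country,
      IFNULL(totals.pageviews, 0)           AS pageviews
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'
  )
);
```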
Predict
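A sketch of a prediction query, again assuming the hypothetical model `bqml_tutorial.sample_model` with a label column named `label`. ML.PREDICT then returns a `predicted_label` column next to the input columns, which can be aggregated like any other column.

```sql
-- Predicted number of purchasing sessions per country.
SELECT
  country,
  SUM(predicted_label) AS total_predicted_purchases
FROM ML.PREDICT(
  MODEL `bqml_tutorial.sample_model`,
  (
    SELECT
      IFNULL(device.operatingSystem, '') AS os,
      device.isMobile                    AS is_mobile,
      IFNULL(totals.pageviews, 0)        AS pageviews,
      IFNULL(geoNetwork.country, '')     AS country
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'
  )
)
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10;
```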
Predict purchases per user
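The per-user variant can be sketched by passing `fullVisitorId` through ML.PREDICT and grouping on it. Columns that are not model features are returned unchanged. The model name `bqml_tutorial.sample_model` is an assumption.

```sql
-- Predicted number of purchasing sessions per visitor.
SELECT
  fullVisitorId,
  SUM(predicted_label) AS total_predicted_purchases
FROM ML.PREDICT(
  MODEL `bqml_tutorial.sample_model`,
  (
    SELECT
      IFNULL(device.operatingSystem, '') AS os,
      device.isMobile                    AS is_mobile,
      IFNULL(totals.pageviews, 0)        AS pageviews,
      IFNULL(geoNetwork.country, '')     AS country,
      fullVisitorId                      -- passed through, not a model feature
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'
  )
)
GROUP BY fullVisitorId
ORDER BY total_predicted_purchases DESC
LIMIT 10;
```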
Use cases:
● Customer segmentation
● Data quality
Options and defaults
● Number of clusters: defaults to log10(num_rows)
● Distance type: Euclidean (default) or Cosine
● Supports all major SQL data types including GIS
K-means clustering
CREATE MODEL yourmodel
OPTIONS (model_type = 'kmeans')
AS SELECT ..
ML.PREDICT maps rows to the closest clusters
ML.CENTROIDS for cluster centroids
ML.EVALUATE
ML.TRAINING_INFO
ML.FEATURE_INFO
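For the customer segmentation use case, the statements above might be combined as in the sketch below. The table `mydataset.customer_stats` and its columns are hypothetical, and the option values are only examples.

```sql
-- Train a k-means model on a few numeric customer features (hypothetical table).
CREATE OR REPLACE MODEL `mydataset.customer_segments`
OPTIONS (
  model_type    = 'kmeans',
  num_clusters  = 4,            -- omit to use the default number of clusters
  distance_type = 'EUCLIDEAN'   -- or 'COSINE'
) AS
SELECT orders_count, total_spend, days_since_last_order
FROM `mydataset.customer_stats`;

-- Assign every customer to the closest cluster and count the segment sizes.
SELECT CENTROID_ID, COUNT(*) AS customers
FROM ML.PREDICT(
  MODEL `mydataset.customer_segments`,
  (SELECT customer_id, orders_count, total_spend, days_since_last_order
   FROM `mydataset.customer_stats`)
)
GROUP BY CENTROID_ID
ORDER BY customers DESC;

-- Inspect the feature values of each cluster centroid.
SELECT *
FROM ML.CENTROIDS(MODEL `mydataset.customer_segments`);
```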
Available data:
● Encode yes/no features
(e.g. has a microwave, has a kitchen, has a TV, has a bathroom)
● Can apply clustering on the encoded data
K-means clustering: Problem definition
Premise
We can identify oddities
(potential data quality issues)
by grouping things together
and separating outliers.
K-means clustering: Problem definition
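One way to act on this premise is to encode the yes/no features as 0/1, cluster the rows, and then list the rows that sit farthest from their assigned centroid. This is a sketch under assumptions: the table `mydataset.listings`, its 'yes'/'no' string columns, and the cut-off of 20 rows are all invented for the example.

```sql
-- Encode yes/no amenities as 0/1 and cluster the listings (hypothetical table).
CREATE OR REPLACE MODEL `mydataset.listing_clusters`
OPTIONS (model_type = 'kmeans', num_clusters = 5) AS
SELECT
  IF(has_microwave = 'yes', 1, 0) AS has_microwave,
  IF(has_kitchen   = 'yes', 1, 0) AS has_kitchen,
  IF(has_tv        = 'yes', 1, 0) AS has_tv,
  IF(has_bathroom  = 'yes', 1, 0) AS has_bathroom
FROM `mydataset.listings`;

-- Rows far away from their closest centroid are candidate data quality issues.
SELECT
  listing_id,
  CENTROID_ID,
  (SELECT MIN(d.DISTANCE)
   FROM UNNEST(NEAREST_CENTROIDS_DISTANCE) AS d) AS distance_to_centroid
FROM ML.PREDICT(
  MODEL `mydataset.listing_clusters`,
  (
    SELECT
      listing_id,                                   -- passed through, not a feature
      IF(has_microwave = 'yes', 1, 0) AS has_microwave,
      IF(has_kitchen   = 'yes', 1, 0) AS has_kitchen,
      IF(has_tv        = 'yes', 1, 0) AS has_tv,
      IF(has_bathroom  = 'yes', 1, 0) AS has_bathroom
    FROM `mydataset.listings`
  )
)
ORDER BY distance_to_centroid DESC
LIMIT 20;
```

Ranking by distance is only one possible outlier criterion. Very small clusters are another signal worth inspecting.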
Use cases:
● Product recommendation
● Marketing campaign target optimization tool
Options and defaults
● Input: User, Item, Rating
● Can use L2 regularization
● Specify training-test split (default random 80-20)
Matrix Factorization (Alpha)
CREATE MODEL yourmodel
OPTIONS (model_type = 'matrix_factorization')
AS SELECT ..
ML.PREDICT for user-item ratings
ML.RECOMMEND for full user-item matrix
ML.EVALUATE
ML.WEIGHTS
ML.TRAINING_INFO
ML.FEATURE_INFO
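A sketch of how the user, item, rating input could be wired up. The feature was in alpha at the time of the talk, so option names and availability may differ from what is shown. The table `mydataset.ratings` and its columns are hypothetical.

```sql
-- Train a matrix factorization model from (user, item, rating) triples.
CREATE OR REPLACE MODEL `mydataset.recommender`
OPTIONS (
  model_type = 'matrix_factorization',
  user_col   = 'user_id',
  item_col   = 'item_id',
  rating_col = 'rating',
  l2_reg     = 0.1            -- L2 regularization, example value
) AS
SELECT user_id, item_id, rating
FROM `mydataset.ratings`;

-- ML.RECOMMEND scores the full user-item matrix; keep the top 5 items per user.
SELECT
  user_id,
  ARRAY_AGG(STRUCT(item_id, predicted_rating)
            ORDER BY predicted_rating DESC
            LIMIT 5) AS top_items
FROM ML.RECOMMEND(MODEL `mydataset.recommender`)
GROUP BY user_id;
```

The prediction column is named after the rating column, so a rating column called `rating` yields `predicted_rating`.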
Available data:
● User
● Item
● Rating
Problem
● assigning values for previously unknown values
(zeros in our case)
Matrix Factorization: Problem definition
Segmentation
● Rating can be any metric of views, visits, purchases, edits, saves etc… or combined.
● Try and play with different models based on different rating values.
Recommendation
● assigning values for previously unknown values (zeros in our case)
● based on the recommendation results you can order by / display your results
Marketing campaign
● whom to target with an ad campaign when the budget covers only 1,000 people?
● use as an optimization tool - which customers are likely to buy?
Summary: Segment and recommend with BigQuery ML
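The campaign question (a budget for only 1,000 people) can be sketched as a ranking by predicted purchase probability. The sketch assumes a binary logistic regression model `mydataset.purchase_model` whose label column is named `label` with values 0 and 1, and a feature table `mydataset.customer_features`. All of these names are hypothetical.

```sql
-- Pick the 1,000 customers with the highest predicted probability of buying.
SELECT
  customer_id,
  (SELECT p.prob
   FROM UNNEST(predicted_label_probs) AS p
   WHERE p.label = 1) AS buy_probability
FROM ML.PREDICT(
  MODEL `mydataset.purchase_model`,
  (SELECT * FROM `mydataset.customer_features`)
)
ORDER BY buy_probability DESC
LIMIT 1000;
```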
Automation
● Run the process daily
● Determine hyperparameters
● Surface the results and route them somewhere for inspection and improvement
Testing
● A/B test around the impact of data quality on conversion and customer NPS (Net Promoter Score)
Improvements
● Determine and explore outliers
● Repeat, automate
Considerations
What is on the roadmap of BigQuery ML?
Cloud Next 19 announcements
New on BigQuery UI - Training tab charts
New on BigQuery UI - Evaluation charts
New on BigQuery UI - Confusion Matrix
Percentage of actual labels that were classified:
- Correctly (Blue)
- Incorrectly (Grey)
Use cases:
● Capture non-linear relationship between features and
label for classification and regression
Options and defaults
● Hidden units (optional)
● Hidden layers (optional)
● Drop_out (optional)
● Batch_size (optional)
Deep Neural Networks using TensorFlow (Alpha)
CREATE MODEL yourmodel
OPTIONS (model_type = 'dnn_classifier')
AS SELECT ..
CREATE MODEL yourmodel
OPTIONS (model_type = 'dnn_regressor')
AS SELECT ..
NCAA Basketball 3 point attempt prediction
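A sketch of a DNN classifier that sets the optional parameters listed above. The feature was in alpha at the time of the talk. The table `mydataset.shot_features`, its columns, and all option values are assumptions chosen for illustration, not the query from the demo.

```sql
-- Hypothetical table: one row per shot, with a 0/1 label for three-point attempts.
CREATE OR REPLACE MODEL `mydataset.three_point_model`
OPTIONS (
  model_type       = 'dnn_classifier',
  hidden_units     = [64, 32],     -- two hidden layers with 64 and 32 units
  dropout          = 0.2,
  batch_size       = 64,
  input_label_cols = ['is_three_point_attempt']
) AS
SELECT
  is_three_point_attempt,
  seconds_remaining,
  score_difference,
  player_position
FROM `mydataset.shot_features`;

-- Evaluation works the same way as for the other model types.
SELECT *
FROM ML.EVALUATE(MODEL `mydataset.three_point_model`);
```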
Use cases:
● Easily add TensorFlow predictions to BigQuery
(Airflow or Composer) pipelines
● Build unstructured data models in TensorFlow,
predict in BigQuery
Key alpha restrictions
● Model size limit of 250 MB
Import TensorFlow models for prediction (Alpha)
CREATE MODEL yourmodel
OPTIONS (model_type = 'tensorflow',
model_path = 'gs://')
ML.PREDICT()
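Filled in with placeholder values, the import could look like the sketch below. The bucket path, model name, source table, and the input column name `input` are assumptions. The input column names have to match the input names of the exported TensorFlow SavedModel.

```sql
-- Import a TensorFlow SavedModel from Cloud Storage (placeholder path).
CREATE OR REPLACE MODEL `mydataset.imported_tf_model`
OPTIONS (
  model_type = 'tensorflow',
  model_path = 'gs://your-bucket/path/to/saved_model/*'
);

-- Run predictions in BigQuery with the imported model.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.imported_tf_model`,
  (SELECT review_text AS input FROM `mydataset.reviews`)
);
```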
DEMO
Search 'QueryIt Smart' on GitHub to learn more.
Conclusion
● 10 GB of data processed per month by queries that contain CREATE MODEL statements is free.
● Model creation: $250 per TB
● Evaluation, inspection, and prediction: $5 per TB
● Limited to 50 iterations
● You are limited to 1,000 CREATE MODEL queries per day per project
● BigQuery ML supports the same regions as BigQuery (US, EU, ASIA)
Pricing/quotas/limits of BigQuery ML
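A back-of-the-envelope estimate based on the figures above (as of May 2019) can be written directly in SQL. The usage numbers are invented, and the sketch assumes 1 TB = 1024 GB and that the 10 GB free allowance is subtracted once per month.

```sql
-- Hypothetical monthly usage, in TB processed.
WITH monthly_usage AS (
  SELECT
    0.5 AS create_model_tb,   -- processed by CREATE MODEL statements
    2.0 AS predict_tb         -- processed by evaluation, inspection, and prediction
)
SELECT
  -- $250 per TB for model creation, after the 10 GB free allowance.
  ROUND(GREATEST(create_model_tb - 10 / 1024, 0) * 250, 2) AS est_create_model_usd,
  -- $5 per TB for evaluation, inspection, and prediction.
  ROUND(predict_tb * 5, 2) AS est_predict_usd
FROM monthly_usage;
```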
● ML is hard, and we don’t have a dedicated team.
With BigQuery ML you need only devs who have good SQL skills.
● Extending your current stack with ML no longer involves a steep learning curve with BigQuery ML
● Understand how to connect pieces of tabular data to fulfil a business requirement
● Start using the Cloud benefits and BigQuery ML as a complementary system
● Understand BigQuery ML to see that you don’t need a large budget to add ML product improvements
#increase #innovation #work on #fun #stuff
Common mindset blockers
● Democratizes the use of ML by empowering data analysts to build and run models using existing
business intelligence tools and spreadsheets
● Generalist team. Models are trained using SQL. There is no need to program an ML solution using
Python or Java.
● Increases the innovation and speed of model development by removing the need to export data from
the data warehouse.
● A Model serves a purpose. Easy to change/recycle.
Benefits of BigQuery ML
The possibilities are endless
Marketing
● Predict customer value
● Predict funnel conversion
● Personalize ads, email, webpage content
Retail
● Optimize inventory
● Forecast revenue
● Enable product recommendations
● Optimize staff promotions
Industrial and IoT
● Forecast demand for parking, traffic utilities, personnel
● Prevent equipment downtime
● Predict maintenance needs
Media/gaming
● Personalize content
● Predict game difficulty
● Predict player lifetime value
Thank you.
Slides available on: slideshare.net/martonkodok
Reea.net - Integrated web solutions driven by creativity to deliver
projects.
BigQuery ML - Machine learning at scale using SQL

  • 1. BigQueryML MachineLearningatScaleusingSQL Márton Kodok / @martonkodok Google Developer Expert on Cloud at REEA.net - Targu Mures May 2019 - Cluj Napoca, Romania
  • 2. ● Geek. Hiker. Do-er. ● Among the Top3 romanians on Stackoverflow 133k reputation ● Google Developer Expert on Cloud technologies ● Crafting Web/Mobile backends at REEA.net ● BigQuery/Redis and database engine expert ● Active in mentoring and IT community StackOverflow: pentium10 GitHub: pentium10 Slideshare: martonkodok Twitter: @martonkodok BigQuery ML - Machine Learning at Scale using SQL @martonkodok About me
  • 3. 1. Application development in the Cloud using Serverless services 2. What is BigQuery? - Data warehouse in the Cloud 3. Introduction to BigQuery ML - execute ML models using SQL 4. Practical use cases 5. Segment and recommend with BigQuery ML 6. Conclusions Agenda BigQuery ML - Machine Learning at Scale using SQL @martonkodok
  • 4. REEA.net uses GCP Build on the same infrastructure that powers Google
  • 5. Google sees serverless as BigQuery ML - Machine Learning at Scale using SQL @martonkodok Programming model Focus on code Event-driven Stateless Operational model Billing model Pay for usageZero ops Automatic scaling Managed security Dev Ops $
  • 6. Serverless is more than a set of functions BigQuery ML - Machine Learning at Scale using SQL @martonkodok Cloud Dataflow Cloud Tasks Cloud Storage Cloud PubSub Cloud Functions App Engine BigQuery Stackdriver
  • 7. Serverless is about maximizing elasticity, cost savings, and agility of cloud computing. BigQuery ML - Machine Learning at Scale using SQL @martonkodok
  • 8. Serverless types BigQuery ML - Machine Learning at Scale using SQL @martonkodok Triggered Code Platforms
  • 9. Cloud Functions ApplicationEvent Sourcing Frontend Platform Services Metrics / Logs/ Streaming Event Triggered Cloud Functions Triggered Code BigQuery ML - Machine Learning at Scale using SQL @martonkodok Result
  • 10. BigQuery ML - Machine Learning at Scale using SQL @martonkodok
  • 11. Analytics-as-a-Service - Data Warehouse in the Cloud Scales into Petabytes on Managed Infrastructure - load up to 5TB large files Familiar DB Structure (table, columns, views, struct, nested, JSON) SQL 2011 + Javascript UDF (User Defined Functions) Integrates with Google Sheets + Cloud Storage + Pub/Sub connectors BigQuery ML enables users to create machine learning models by SQL queries Decent pricing (storage: $20/TB cold: $10/TB,queries $5/TB) *May 2019 What is BigQuery? BigQuery ML - Machine Learning at Scale using SQL @martonkodok
  • 12. BigQuery: federated data access warehouse BigQuery ML - Machine Learning at Scale using SQL @martonkodok Application & Presentation Audit logs Billing entries Stackdriver Firebase Google Marketing Platform Cloud Dataflow Cloud Storage Report & Share Business Analysis BI Interface Data Studio 360 Analysis Processing ML Frontend Platform Services Real-Time Events Multiple Platforms Database SQL
  • 13. “ Data needs to be processed in multiple services. How can we pipe to multiple places? BigQuery ML - Machine Learning at Scale using SQL @martonkodok
  • 14. Architecting for The Cloud BigQuery On-Premises Servers Pipelines ETL Engine Event Sourcing Frontend Platform Services Metrics / Logs/ Streaming BigQuery ML - Machine Learning at Scale using SQL @martonkodok
  • 15. Data Pipeline Integration at REEA.net Analytics Backend BigQuery On-Premises Servers Pipelines FluentD Event Sourcing Frontend Platform Services Metrics / Logs/ Streaming Development Team Report & Share Business Analysis Tools Tableau QlikView Data Studio Internal Dashboard Database SQL Application ServersServers Cloud Storage archive Load / Export Replay Standard Devices HTTPS BigQuery ML - Machine Learning at Scale using SQL @martonkodok Cloud Functions
  • 16. ● SQL 2011 standard ● big costs saving with partitioning/clustering ● ability to throw in / join all kind of data ● run raw ad-hoc queries (either by analysts/sales or Devs) ● inspiring ML functions - devs no longer leave the IDE ● pricing model 1TB free every month ● no more throwing away-, expiring-, aggregating old data ● no running out of resources Our benefits using BigQuery BigQuery ML - Machine Learning at Scale using SQL @martonkodok
  • 17. What is BigQueryML? BigQuery ML - Machine Learning at Scale using SQL @martonkodok
  • 18. BigQuery ML - Machine Learning at Scale using SQL @martonkodok BigQuery ML 1. Execute ML initiatives without moving data from BigQuery 2. Integrate on models in SQL in BigQuery to increase development speed 3. Automate common ML tasks and hyperparameter tuning
  • 19. ● Leverage BigQuery’s processing power to build a model with SQL syntax ● Create model from tabular data ● Auto-split of data into training and test ● Auto-tuned learning rate ● Model evaluation charts on BigQuery UI ● Ability to join the recommendation output with your own tables Behind the scenes - through two lines of SQL BigQuery ML - Machine Learning at Scale using SQL @martonkodok
  • 20. Making ML accessible for all audiences - use cases and skills
      Audiences: Developer, SQL Analyst, Data Scientist
      TensorFlow and CloudML Engine
      ● Build and deploy state-of-the-art custom models
      ● Requires deep understanding of ML and programming
      BigQuery ML
      ● Build and deploy custom models using SQL
      ● Requires only basic understanding of ML
      AutoML and CloudML APIs
      ● Build and deploy Google-provided models for standard use cases
      ● Requires almost no ML knowledge
  • 21. Supported models in BigQuery ML
      ● Linear regression for forecasting
      ● Binary or multiclass logistic regression for classification (labels can have up to 50 unique values)
      ● K-means clustering for data segmentation (unsupervised learning; no labeled training data required)
      ● Matrix factorization (Alpha)
      ● Deep Neural Networks using TensorFlow (Alpha)
      ● Import TensorFlow models for prediction in BigQuery (Alpha)
      ● Feature pre-processing functions (Alpha)
      Alphas are whitelist only. Please contact your Google CE/Sales/TAM.
  • 22. Getting started with BigQuery ML
      In this tutorial, you use the sample Google Analytics dataset for BigQuery to create a model that predicts whether a website visitor will make a transaction.
      Objectives:
      ● Create a binary logistic regression model using the CREATE MODEL statement
      ● Use the ML.EVALUATE function to evaluate the ML model
      ● Use the ML.PREDICT function to make predictions using the ML model
      https://cloud.google.com/bigquery-ml/docs/bigqueryml-web-ui-start
  • 23. Create a binary logistic regression model
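The query shown on this slide is not part of the transcript. A minimal sketch of what such a statement can look like, following the Google Analytics sample tutorial referenced in the deck, is given here. The dataset name `bqml_tutorial`, the model name `sample_model`, and the choice of feature columns are assumptions made for illustration, and the dataset must already exist in your project.

```sql
-- Sketch only: bqml_tutorial.sample_model is an assumed name.
-- Trains a binary logistic regression on the public Google Analytics sample.
CREATE OR REPLACE MODEL `bqml_tutorial.sample_model`
OPTIONS (model_type = 'logistic_reg') AS
SELECT
  -- label: 1 if the session contained at least one transaction, else 0
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, '') AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, '') AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630';
```

The column aliased as `label` is picked up as the training target by default; every other selected column is treated as a feature.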
  • 24. Evaluate your model
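The slide's query is missing from the transcript. As a hedged sketch, evaluation could look like the following, assuming a logistic regression model named `bqml_tutorial.sample_model` (a hypothetical name) was trained on the Google Analytics sample with the features `os`, `is_mobile`, `country`, and `pageviews`. The evaluation rows come from a later date range than the assumed training range.

```sql
-- Sketch only: the model name and feature list are assumptions.
-- Evaluates the model on sessions that were not used for training.
SELECT
  *
FROM
  ML.EVALUATE(MODEL `bqml_tutorial.sample_model`, (
    SELECT
      IF(totals.transactions IS NULL, 0, 1) AS label,
      IFNULL(device.operatingSystem, '') AS os,
      device.isMobile AS is_mobile,
      IFNULL(geoNetwork.country, '') AS country,
      IFNULL(totals.pageviews, 0) AS pageviews
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'));
```

For a logistic regression model the result is a single row of classification metrics such as precision, recall, accuracy, and ROC AUC.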
  • 25. Predict
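The prediction query itself did not survive in the transcript. A minimal sketch, assuming the same hypothetical model `bqml_tutorial.sample_model` trained on the Google Analytics sample, sums predicted purchases per country:

```sql
-- Sketch only: the model name and feature list are assumptions.
-- ML.PREDICT adds a predicted_label column to every input row.
SELECT
  country,
  SUM(predicted_label) AS total_predicted_purchases
FROM
  ML.PREDICT(MODEL `bqml_tutorial.sample_model`, (
    SELECT
      IFNULL(device.operatingSystem, '') AS os,
      device.isMobile AS is_mobile,
      IFNULL(totals.pageviews, 0) AS pageviews,
      IFNULL(geoNetwork.country, '') AS country
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10;
```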
  • 26. Predict purchases per user
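A per-user variant can be sketched by passing `fullVisitorId` through the prediction and grouping on it. The model name `bqml_tutorial.sample_model` is again an assumption; columns that are not model features are simply carried through to the output.

```sql
-- Sketch only: the model name and feature list are assumptions.
-- fullVisitorId is not a feature; ML.PREDICT passes it through unchanged.
SELECT
  fullVisitorId,
  SUM(predicted_label) AS total_predicted_purchases
FROM
  ML.PREDICT(MODEL `bqml_tutorial.sample_model`, (
    SELECT
      IFNULL(device.operatingSystem, '') AS os,
      device.isMobile AS is_mobile,
      IFNULL(totals.pageviews, 0) AS pageviews,
      IFNULL(geoNetwork.country, '') AS country,
      fullVisitorId
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY fullVisitorId
ORDER BY total_predicted_purchases DESC
LIMIT 10;
```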
  • 27. K-means clustering
      Use cases:
      ● Customer segmentation
      ● Data quality
      Options and defaults
      ● Number of clusters: default is log10(num_rows)
      ● Distance type: Euclidean (default), Cosine
      ● Supports all major SQL data types including GIS
      CREATE MODEL yourmodel OPTIONS (model_type = 'kmeans') AS SELECT ..
      ● ML.PREDICT maps rows to closest clusters
      ● ML.CENTROIDS for cluster centroids
      ● ML.EVALUATE
      ● ML.TRAINING_INFO
      ● ML.FEATURE_INFO
  • 28. K-means clustering: Problem definition
      Available data:
      ● Encode yes/no features (e.g. has a microwave, has a kitchen, has a TV, has a bathroom)
      ● Can apply clustering on the encoded data
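To make the encoding step concrete, here is a minimal sketch that trains a k-means model on yes/no features cast to 0/1. The table `mydataset.listings`, its BOOL columns, the model name, and the choice of four clusters are all hypothetical and only illustrate the shape of the statement.

```sql
-- Sketch only: mydataset.listings and its columns are hypothetical.
-- Yes/no features are encoded as 0/1 before clustering.
CREATE OR REPLACE MODEL `mydataset.listing_clusters`
OPTIONS (
  model_type = 'kmeans',
  num_clusters = 4,              -- assumed value; omit to use the default
  distance_type = 'EUCLIDEAN'
) AS
SELECT
  CAST(has_microwave AS INT64) AS has_microwave,
  CAST(has_kitchen AS INT64) AS has_kitchen,
  CAST(has_tv AS INT64) AS has_tv,
  CAST(has_bathroom AS INT64) AS has_bathroom
FROM
  `mydataset.listings`;
```

No label column is selected because k-means is unsupervised; every selected column is used as a feature.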
  • 29. K-means clustering: Problem definition
      Premise: we can identify oddities (potential data quality issues) by grouping things together and separating outliers.
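One way to act on this premise, sketched under assumptions, is to assign every row to its nearest cluster and look at the smallest clusters first. The model `mydataset.listing_clusters` and the table `mydataset.listings` are hypothetical names, and treating small clusters as outlier candidates is a heuristic chosen for this sketch rather than something the deck prescribes.

```sql
-- Sketch only: model and table names are hypothetical.
-- Rows that land in very small clusters are candidates for a data quality review.
SELECT
  centroid_id,
  COUNT(*) AS rows_in_cluster
FROM
  ML.PREDICT(MODEL `mydataset.listing_clusters`, (
    SELECT
      CAST(has_microwave AS INT64) AS has_microwave,
      CAST(has_kitchen AS INT64) AS has_kitchen,
      CAST(has_tv AS INT64) AS has_tv,
      CAST(has_bathroom AS INT64) AS has_bathroom
    FROM
      `mydataset.listings`))
GROUP BY centroid_id
ORDER BY rows_in_cluster ASC;
```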
  • 30. Matrix Factorization (Alpha)
      Use cases:
      ● Product recommendation
      ● Marketing campaign target optimization tool
      Options and defaults
      ● Input: User, Item, Rating
      ● Can use L2 regularization
      ● Specify training-test split (default random 80-20)
      CREATE MODEL yourmodel OPTIONS (model_type = 'matrix_factorization') AS SELECT ..
      ● ML.PREDICT for user-item ratings
      ● ML.RECOMMEND for full user-item matrix
      ● ML.EVALUATE
      ● ML.WEIGHTS
      ● ML.TRAINING_INFO
      ● ML.FEATURE_INFO
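A hedged sketch of the user/item/rating workflow follows. The table `mydataset.ratings`, its columns, and the model name are hypothetical. The option names `user_col`, `item_col`, `rating_col`, and `l2_reg` follow the later public documentation and may not match the alpha release described on the slide.

```sql
-- Sketch only: table, columns, and model name are hypothetical;
-- option names are taken from later public docs, not from the alpha.
CREATE OR REPLACE MODEL `mydataset.recommender`
OPTIONS (
  model_type = 'matrix_factorization',
  user_col = 'user_id',
  item_col = 'item_id',
  rating_col = 'rating',
  l2_reg = 0.2                   -- assumed regularization strength
) AS
SELECT
  user_id,
  item_id,
  rating
FROM
  `mydataset.ratings`;

-- Predicted rating for every user-item pair, best items first per user.
SELECT
  user_id,
  item_id,
  predicted_rating
FROM
  ML.RECOMMEND(MODEL `mydataset.recommender`)
ORDER BY user_id, predicted_rating DESC;
```

Because the output is an ordinary result set, it can be joined with your own tables, for example to filter out items a user already owns.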
  • 31. Matrix Factorization: Problem definition
      Available data:
      ● User
      ● Item
      ● Rating
      Problem
      ● assigning values for previously unknown values (zeros in our case)
  • 32. Summary: Segment and recommend with BigQuery ML
      Segmentation
      ● Rating can be any metric (views, visits, purchases, edits, saves, etc.) or a combination of them.
      ● Try and play with different models based on different rating values.
      Recommendation
      ● assigning values for previously unknown values (zeros in our case)
      ● based on the recommendation results you can order by / display your results
      Marketing campaign
      ● who to target with an ad campaign? I have budget only for 1000 people.
      ● use as an optimization tool - which customers are likely to buy?
  • 33. Considerations
      Automation
      ● Run the process daily
      ● Determine hyperparameters
      ● Surface the results and route them somewhere for inspection and improvement
      Testing
      ● A/B test the impact of data quality on conversion and customer NPS (net promoter score)
      Improvements
      ● Determine and explore outliers
      ● Repeat, automate
  • 34. What is on the roadmap of BigQuery ML? Cloud Next 19 announcements
  • 35. New on BigQuery UI - Training tab charts
  • 36. New on BigQuery UI - Evaluation charts
  • 37. New on BigQuery UI - Confusion Matrix: percentage of actual labels that were classified correctly (blue) and incorrectly (grey)
  • 38. Deep Neural Networks using TensorFlow (Alpha)
      Use cases:
      ● Capture non-linear relationships between features and label for classification and regression
      Options and defaults
      ● Hidden units (optional)
      ● Hidden layers (optional)
      ● Drop_out (optional)
      ● Batch_size (optional)
      CREATE MODEL yourmodel OPTIONS (model_type = 'dnn_classifier') AS SELECT ..
      CREATE MODEL yourmodel OPTIONS (model_type = 'dnn_regressor') AS SELECT ..
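As an illustration of how these options fit into a statement, here is a sketch that trains a DNN classifier on the public Google Analytics sample. The model name, the layer sizes, and the other option values are assumptions. The option spellings `hidden_units`, `dropout`, and `batch_size` follow the later public documentation and may differ from the alpha names listed on the slide.

```sql
-- Sketch only: model name and option values are assumptions;
-- option names follow later public docs, not necessarily the alpha.
CREATE OR REPLACE MODEL `bqml_tutorial.sample_dnn_model`
OPTIONS (
  model_type = 'dnn_classifier',
  hidden_units = [64, 32],       -- two hidden layers with 64 and 32 units
  dropout = 0.2,
  batch_size = 128,
  input_label_cols = ['label']
) AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, '') AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, '') AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630';
```

Swapping `dnn_classifier` for `dnn_regressor` and selecting a numeric label would give the regression variant.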
  • 39. NCAA Basketball 3 point attempt prediction
  • 40. Import TensorFlow models for prediction (Alpha)
      Use cases:
      ● Easily add TensorFlow predictions to BigQuery (Airflow or Composer) pipelines
      ● Build unstructured data models in TensorFlow, predict in BigQuery
      Key alpha restrictions
      ● Model size limit of 250 MB
      CREATE MODEL yourmodel OPTIONS (model_type = 'tensorflow', model_path = 'gs://')
      ML.PREDICT()
      DEMO: Search 'QueryIt Smart' on GitHub to learn more.
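A minimal sketch of the import-then-predict flow is shown here. The bucket path, the model name, the input table, and the input column are all hypothetical placeholders; the input columns of a real query must match the input signature of the exported TensorFlow model.

```sql
-- Sketch only: the Cloud Storage path, names, and columns are hypothetical.
-- Imports an exported TensorFlow model so it can be called from SQL.
CREATE OR REPLACE MODEL `mydataset.imported_tf_model`
OPTIONS (
  model_type = 'tensorflow',
  model_path = 'gs://your-bucket/your-exported-model/*'
);

-- Column names must match the imported model's input signature.
SELECT
  *
FROM
  ML.PREDICT(MODEL `mydataset.imported_tf_model`, (
    SELECT
      title AS input
    FROM
      `mydataset.articles`));
```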
  • 41. Conclusion
  • 42. Pricing/quotas/limits of BigQuery ML
      ● 10 GB of data processed by queries that contain CREATE MODEL statements per month is free.
      ● Model creation: $250 per TB
      ● Evaluation, inspection, and prediction: $5 per TB
      ● Limited to 50 iterations
      ● You are limited to 1,000 CREATE MODEL queries per day per project
      ● BigQuery ML supports the same regions as BigQuery (US, EU, ASIA)
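The rates above can be turned into a rough monthly estimate. The sketch below uses made-up usage figures, assumes 1 TB = 1024 GB, and applies the 10 GB free allowance only to CREATE MODEL bytes. It ignores any other free tier or discount, so treat the result as an approximation.

```sql
-- Sketch only: the usage figures are invented; the rates come from the slide.
WITH monthly_usage AS (
  SELECT
    500.0 AS create_model_gb,    -- assumed GB processed by CREATE MODEL statements
    2048.0 AS predict_gb         -- assumed GB processed by evaluation and prediction
)
SELECT
  -- first 10 GB of CREATE MODEL processing are free, then $250 per TB
  GREATEST(create_model_gb - 10, 0) / 1024 * 250 AS create_model_usd,
  -- evaluation, inspection, and prediction at $5 per TB
  predict_gb / 1024 * 5 AS predict_usd
FROM
  monthly_usage;
```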
  • 43. Common mindset blockers
      ● "ML is hard, and we don’t have a dedicated team." With BigQuery ML you only need devs who have good SQL skills.
      ● Extending your current stack with ML is no longer a steep learning curve when using BigQuery ML
      ● Understand how to connect pieces of tabular data to fulfil a business requirement
      ● Start using the Cloud benefits and BigQuery ML as a complementary system
      ● Understand BigQuery ML to see that you don’t need a large budget to add ML product improvements
      #increase #innovation #work on #fun #stuff
  • 44. Benefits of BigQuery ML
      ● Democratizes the use of ML by empowering data analysts to build and run models using existing business intelligence tools and spreadsheets
      ● Generalist team. Models are trained using SQL. There is no need to program an ML solution using Python or Java.
      ● Increases the innovation and speed of model development by removing the need to export data from the data warehouse.
      ● A model serves a purpose. Easy to change/recycle.
  • 45. The possibilities are endless
      Marketing
      ● Predict customer value
      ● Predict funnel conversion
      ● Personalize ads, email, webpage content
      Retail
      ● Optimize inventory
      ● Forecast revenue
      ● Enable product recommendations
      ● Optimize staff promotions
      Industrial and IoT
      ● Forecast demand for parking, traffic utilities, personnel
      ● Prevent equipment downtime
      ● Predict maintenance needs
      Media/gaming
      ● Personalize content
      ● Predict game difficulty
      ● Predict player lifetime value
  • 46. Thank you. Slides available on: slideshare.net/martonkodok Reea.net - Integrated web solutions driven by creativity to deliver projects.