BigQuery Basics

Paris 2014

Who? Why?
Ido Green
Solutions Architect
plus.google.com/greenido

greenido.wordpress.com

Topics we cover in this lesson
● BigQuery Overview
● Typical Uses
● Project Hierarchy
● Access Control and Security
● Datasets and Tables
● Tools
● Demos

How does BigQuery fit in the analytics landscape?
● MapReduce based analysis can be slow for ad-hoc queries
● Managing data centers and tuning software takes time & money
● Analytics tools should be services

Why BigQuery?
● Generating big data reports requires expensive servers
and skilled database administrators
● Interacting with big data has been expensive, slow and
inefficient
● BigQuery changes all that
○ Reduces the time and expense of querying data

What's BigQuery?
● Service for interactive analysis of massive datasets (TBs)
○ Query billions of rows: seconds to write, seconds to return
○ Uses a SQL-style query syntax
○ It's a service, accessed by a RESTful API
● Reliable and secure
○ Replicated across multiple sites
○ Secured through Access Control Lists
● Scalable
○ Store hundreds of terabytes
○ Pay only for what you use
● Fast (really)
○ Run ad hoc queries on multi-terabyte data sets in seconds

Analyzing Large Amounts of Data
... at high speed

demobigquery.appspot.com
Uses

Typical Uses
Analyzing query results using a visualization library such as Google
Charts Tools API

Typical Uses
Another way to analyze query results is with Google Spreadsheets
○ greenido.wordpress.com/2013/12/16/big-query-and-google-spreadsheet-intergration/
○ greenido.wordpress.com/2013/07/24/big-query-power-with-javascript/

BigQuery Use Cases
● Log Analysis. Making sense of computer generated records
● Retailer. Using data to forecast product sales
● Ads Targeting. Targeting proper customer sections
● Sensor Data. Collect and visualize ambient data
● Data Mashup. Query terabytes of heterogeneous data

Some Customer Case Studies
● Uses BigQuery to hone ad targeting and gain insights into their business
● Dashboards using BigQuery to analyze booking and inventory data
● Use BigQuery to provide their customers ways to expand game engagement and find new channels for monetization
● Used BigQuery, App Engine and the Visualization API to build a business intelligence solution
BigQuery Basic Technical Details

Project Hierarchy
● Project. All data in BigQuery belongs inside a project
○ Set of users, APIs, authentication, billing information
● Dataset. Holds one or more tables
○ Lowest access control unit (to which ACLs are applied)
● Table. Row-column structure that contains actual data
● Job. Used to start potentially long running queries

Datasets and Tables
Table name is represented as follows:
● Current Project
<dataset>.<table name>
● Different Project
<project>:<dataset>.<table>

e.g. publicdata:samples.wikipedia
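
A minimal sketch of how the two naming forms above could be split into their parts. The names `TableRef`, `parse_table_name` and `default_project` are hypothetical helpers for illustration, not part of any BigQuery library.

```python
from typing import NamedTuple, Optional


class TableRef(NamedTuple):
    project: Optional[str]  # None when the name has no project and no default is given
    dataset: str
    table: str


def parse_table_name(name: str, default_project: Optional[str] = None) -> TableRef:
    """Split '<project>:<dataset>.<table>' or '<dataset>.<table>' into parts."""
    project = default_project
    rest = name
    if ":" in name:
        # Everything before the first colon is the project.
        project, rest = name.split(":", 1)
    dataset, sep, table = rest.partition(".")
    if not sep or not dataset or not table:
        raise ValueError(f"expected <dataset>.<table>, got {rest!r}")
    return TableRef(project, dataset, table)


print(parse_table_name("publicdata:samples.wikipedia"))
```

A name without a project prefix falls back to `default_project`, which mirrors the "Current Project" case on the slide.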

Schema Example
● Schema of a table holding name occurrence demographics
name:string,gender:string,count:integer
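
As an illustration, the inline schema string above can be expanded into a list of field descriptions. This is a sketch only; `parse_inline_schema` is a hypothetical helper, and it assumes every column is written as `name:type`.

```python
def parse_inline_schema(schema: str) -> list:
    """Turn 'col:type,col:type' into a list of {'name', 'type'} dicts."""
    fields = []
    for pair in schema.split(","):
        name, sep, type_ = pair.strip().partition(":")
        if not sep or not name or not type_:
            raise ValueError(f"expected column_name:datatype, got {pair!r}")
        fields.append({"name": name, "type": type_})
    return fields


for field in parse_inline_schema("name:string,gender:string,count:integer"):
    print(field)
```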

Data Types
● String
○ UTF-8 encoded, <64kB
● Integer
○ 64 bit signed
● Float
● Boolean
○ "true" or "false", case insensitive
● Timestamp
○ String format
■ YYYY-MM-DD HH:MM:SS[.sssss] [+/-][HH:MM]
○ Numeric format (seconds from UNIX epoch)
■ 1234567890, 1.234567890123456E9

(*) Max row size: 64kB
Date type is supported as timestamp
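
The two timestamp notations can be compared with a small local parser. This is a sketch, not BigQuery's own parsing logic: `parse_timestamp` is a hypothetical helper, and it assumes that a string without an offset is in UTC.

```python
from datetime import datetime, timezone

# String layouts matching YYYY-MM-DD HH:MM:SS[.sssss] [+/-][HH:MM]
_FORMATS = (
    "%Y-%m-%d %H:%M:%S.%f %z",
    "%Y-%m-%d %H:%M:%S %z",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
)


def parse_timestamp(value: str) -> datetime:
    """Parse the numeric or the string timestamp form into a UTC datetime."""
    try:
        # Numeric form: seconds from the UNIX epoch, e.g. 1234567890 or 1.23E9.
        return datetime.fromtimestamp(float(value), tz=timezone.utc)
    except ValueError:
        pass
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption: no offset means UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise ValueError(f"unrecognized timestamp: {value!r}")


print(parse_timestamp("1234567890"))
print(parse_timestamp("2009-02-14 00:31:30 +01:00"))
```

Both calls describe the same instant, once as epoch seconds and once as a string with an offset.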

Data Format
BigQuery supports the following formats for loading data:
1. Comma Separated Values (CSV)
2. JSON
a. BigQuery can load data faster if your data contains
embedded newlines.
b. Supports nested/repeated data fields

Repeated and Nested Fields

Loading data with repeated and nested fields is supported by
JSON data format only

Schema example
[
  {
    "fields": [
      {
        "mode": "nullable",
        "name": "country",
        "type": "string"
      },
      {
        "mode": "nullable",
        "name": "city",
        "type": "string"
      }
    ],
    "mode": "repeated",
    "name": "location",
    "type": "record"
  },
  ...........
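
To make the repeated `location` record concrete, here is a sketch that writes rows as newline-delimited JSON, one object per line. The sample rows, the extra `name` field and the helper `to_ndjson` are made up for illustration; only `location`, `country` and `city` come from the schema above.

```python
import json

# Hypothetical rows: "location" is a repeated record, so it is a list of objects.
rows = [
    {
        "name": "Alice",
        "location": [
            {"country": "FR", "city": "Paris"},
            {"country": "US", "city": "San Francisco"},
        ],
    },
    {"name": "Bob", "location": []},  # a repeated field may hold zero values
]


def to_ndjson(records) -> str:
    """Serialize each record as one JSON object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records) + "\n"


ndjson = to_ndjson(rows)
print(ndjson, end="")
```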

Accessing BigQuery
● BigQuery Web browser
○ Imports/exports data, runs queries
● bq command line tool
○ Performs operations from the command line
● Service API
○ RESTful API to access BigQuery programmatically
○ Requires authorization by OAuth2
○ Google client libraries for Python, Java, JavaScript, PHP, ...
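
A rough sketch of what a call to the Service API could look like when built by hand with the Python standard library. It only constructs the HTTP request and does not send it. The project ID and access token are placeholders, the helper `build_query_request` is hypothetical, and the endpoint and body fields follow the v2 `jobs.query` method as I understand it, so check them against the current API reference before relying on them.

```python
import json
import urllib.request

API_ROOT = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id, query, access_token, max_results=10):
    """Build (but do not send) a POST request that runs a query."""
    body = {
        "query": query,
        "maxResults": max_results,
        # Bracketed table names such as [project:dataset.table] are legacy SQL.
        "useLegacySql": True,
    }
    return urllib.request.Request(
        url=f"{API_ROOT}/projects/{project_id}/queries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # OAuth2 access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_query_request(
    "my-project",  # placeholder project ID
    "SELECT * FROM [publicdata:samples.wikipedia] LIMIT 10",
    "ACCESS_TOKEN_PLACEHOLDER",
)
print(request.get_method(), request.full_url)
```

In practice the Google client libraries listed above handle authorization and request building, so hand-built requests like this are mostly useful for understanding what goes over the wire.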

Third-party Tools
ETL tools for loading data into BigQuery

Visualization and Business Intelligence

Example of Visualization Tools
Using commercial visualization tools to graph the query results

Loading Data Using the Web Browser
● Upload from local disk or from Cloud Storage
● Start the Web browser
● Select Dataset
● Create table and follow the wizard steps

Loading Data Using bq Tool
"bq load" command
Syntax
bq load [--source_format=NEWLINE_DELIMITED_JSON|CSV] destination_table data_source_uri table_schema

● If not specified, the default file format is CSV (comma separated values)
● The files can also use newline delimited JSON format
● Schema
○ Either a filename or a comma-separated list of column_name:datatype
pairs that describe the file format.
● Data source may be on local machine or on Cloud Storage
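
The syntax above can be assembled programmatically, for example when loading many files from a script. This sketch only builds the argument list and prints it; it does not run `bq`. The helper `build_bq_load`, the table name and the bucket path are hypothetical.

```python
import shlex

ALLOWED_FORMATS = ("CSV", "NEWLINE_DELIMITED_JSON")


def build_bq_load(destination_table, data_source_uri, table_schema, source_format=None):
    """Return the argv list for a 'bq load' call following the syntax above."""
    args = ["bq", "load"]
    if source_format is not None:
        if source_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported source format: {source_format!r}")
        args.append(f"--source_format={source_format}")
    # Omitting --source_format leaves the default, which is CSV.
    args += [destination_table, data_source_uri, table_schema]
    return args


cmd = build_bq_load(
    "mydataset.names",            # hypothetical destination table
    "gs://my-bucket/names.csv",   # hypothetical Cloud Storage path
    "name:string,gender:string,count:integer",
)
print(shlex.join(cmd))
```

The list could be handed to `subprocess.run(cmd, check=True)` on a machine where the bq tool is installed and authorized.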

Load Limitations
● 1,000 import jobs per table per day
● 10,000 import jobs per project per day
● File size (for both CSV and JSON)
○ 1GB for compressed file
○ 1TB for uncompressed
■ 4GB for uncompressed CSV with newlines in strings
● 10,000 files per import job
● 1TB per import job
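
A pre-flight check against some of these limits might look like the sketch below. The figures are the ones quoted on this slide and may not match current quotas. The helper `check_load_job` is hypothetical, and it assumes binary units (1GB = 1024^3 bytes).

```python
GB = 1024 ** 3  # assumption: binary units
TB = 1024 ** 4

MAX_COMPRESSED_FILE = 1 * GB
MAX_UNCOMPRESSED_FILE = 1 * TB
MAX_FILES_PER_JOB = 10_000
MAX_BYTES_PER_JOB = 1 * TB


def check_load_job(files):
    """files: list of (size_bytes, is_compressed). Returns a list of problems."""
    problems = []
    if len(files) > MAX_FILES_PER_JOB:
        problems.append(f"{len(files)} files exceeds {MAX_FILES_PER_JOB} per job")
    total = 0
    for index, (size, is_compressed) in enumerate(files):
        limit = MAX_COMPRESSED_FILE if is_compressed else MAX_UNCOMPRESSED_FILE
        if size > limit:
            problems.append(f"file {index} is {size} bytes, limit is {limit}")
        total += size
    if total > MAX_BYTES_PER_JOB:
        problems.append(f"job total {total} bytes exceeds {MAX_BYTES_PER_JOB}")
    return problems


print(check_load_job([(500 * 1024 ** 2, True)]))  # within limits
print(check_load_job([(2 * GB, True)]))           # compressed file too large
```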

A Few Best Practices
CSV/JSON must be split into chunks less than 1TB
● "split" command with --line-bytes option
● Split to smaller files
○ Easier error recovery
○ Use a smaller data unit (day or month instead of year)
● Uploading to Cloud Storage is recommended
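
The idea behind splitting with a byte limit per chunk is to cut only at line boundaries, so no record is broken in two. A small Python sketch of that idea follows; it is an illustration and not a replacement for the `split` command. `split_lines` is a hypothetical helper and the tiny limit is chosen only to keep the example readable.

```python
def split_lines(lines, max_bytes):
    """Group whole lines into chunks whose total size stays within max_bytes."""
    chunk, size = [], 0
    for line in lines:
        if len(line) > max_bytes:
            raise ValueError("a single line is larger than max_bytes")
        if chunk and size + len(line) > max_bytes:
            # The next line would overflow the chunk, so close it first.
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield chunk


records = [b"a,1\n", b"b,2\n", b"c,3\n"]
for number, chunk in enumerate(split_lines(records, max_bytes=8)):
    print(number, b"".join(chunk))
```

Each chunk could then be written to its own file and uploaded to Cloud Storage.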

(Diagram: Cloud Storage, BigQuery)

A Few Best Practices
● Split Tables by Dates
○ Minimize cost of data scanned
○ Minimize query time
● Upload Multiple Files to Cloud Storage
○ Allows parallel upload into BigQuery
● Denormalize your data
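
One way to picture splitting tables by dates, the first practice above: keep one table per day and let a query name only the days it needs. The naming scheme `<prefix>YYYYMMDD` and the helper `daily_tables` are assumptions for illustration.

```python
from datetime import date, timedelta


def daily_tables(prefix, start, end):
    """List one table name per day from start to end, inclusive."""
    day_count = (end - start).days
    return [
        f"{prefix}{(start + timedelta(days=offset)).strftime('%Y%m%d')}"
        for offset in range(day_count + 1)
    ]


# Hypothetical dataset and table prefix.
tables = daily_tables("mydataset.logs_", date(2014, 1, 30), date(2014, 2, 1))
print(tables)
```

A query over three days then touches three small tables instead of scanning a full year of data.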

Exercise & Questions

Exercise
Work through Big Query Exercise 1 -- Basics
● Use the BigQuery UI
● Use the bq command line tool
● Upload a dataset
You will query the public sample GSOD (global summary of
day) weather dataset.
You will get and upload earthquake data.

Questions
● What are the different ways to load data into
BigQuery?
● What is the maximum size of data in a BigQuery
table?
● How can we import data into BigQuery?
○ What's the limitation?
○ What formats does BigQuery accept?

Google I/O Data Sensing
● Start the BigQuery Web browser
● Click on Display Project in the project chooser dialog window
● Enter data-sensing-lab when prompted
● In the dataset data-sensing-lab:io_sensor_data, select the table
moscone_io13
● In the New Query box, enter the following query:
SELECT * FROM [data-sensing-lab:io_sensor_data.moscone_io13] LIMIT 10

● Click Run Query button
● Scroll to see relevant results

Data Structure
● Define table schema when creating table
● Data is stored in per-column structure
● Each column is handled separately and only combined when
necessary
Advantage of this data structure:
● No need to set index in advance
● Load only the relevant Columns
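
A toy illustration of the per-column layout, using the names schema from earlier with made-up values. It is a simplification for teaching purposes and says nothing about BigQuery's actual storage format; `to_columns` is a hypothetical helper.

```python
# Made-up sample rows following the name:string,gender:string,count:integer schema.
rows = [
    {"name": "Mary", "gender": "F", "count": 3},
    {"name": "John", "gender": "M", "count": 5},
    {"name": "Anna", "gender": "F", "count": 2},
]


def to_columns(records):
    """Regroup row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# Summing "count" reads a single column and never touches "name" or "gender".
total = sum(columns["count"])
print(total)
```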

Thank you!
Questions?

