1© Cloudera, Inc. All rights reserved.
From Insight to Action - Using
Data Science to Transform
Your Organization
Rob Morrow, Chief Technologist US Government
Deploy on any cloud infrastructure
Cloudera Director: Management for IaaS-related and CDH cluster operations
Easy Administration
• Dynamic cluster lifecycle management
• ICD-503 Support
• Single pane of glass: multi-cluster view
• Consumption-based billing and metering
Enterprise-grade
• Integration across Cloudera Enterprise
• Management of CDH deployments at
scale
Flexible Deployments
• No cloud vendor lock-in: open plugin
framework for IaaS platforms
• Scaling of provisioned clusters
• Spot instance provisioning
Cloudera Director
Enterprise Data Science Topics
Open Source: Help Wanted.
It took Todd Lipcon 3 years to create Kudu, preceded by 10 years of
work learning and gaining trust in the open source community as a
committer.
Methods, not raw Data
Government of the future: value created through interesting methods.
Most problems are not really Data Science “Challenges”
If your organization is already good at the 5,000 Open Source
Algorithms (regression, etc.), you now need a Data Science Cadre.
Data Engineering and Data Science Workloads
Data Ingestion (Kafka, Navigator, Search)
Cloudera enables users to build real-time, end-to-end data pipelines
to power their business. Leadership in Apache Spark and Kafka has
made Cloudera a trusted resource for users who want to capture
real-time, streaming, and time series data without gaps in security.
Data Processing (Spark, Hive)
Cloudera is helping users accelerate their data pipelines with
leadership in technologies like Apache Spark. Data processing in
Cloudera Enterprise can help take processing windows from hours to
minutes and enable faster access to data for a variety of users and
skillsets.
Data Science (Spark MLlib)
Cloudera is bringing the most popular data
science languages/libraries to our platform
for easier collaboration, self-service
exploration, and implementation at
scale. Cloudera is advancing the state of
distributed machine learning at scale.
Cloudera enables exploratory data science
and the ability to deliver robust data
products.
Closing Gaps in Critical Skills Areas in the Govt
Data Science
High Value, Low Frequency
• Only a small set of problems requires
direct Data Science expertise (~5%)
• Domain-general, algorithm-specific
• Very high expertise
Characterized by
• Spark/Python Expertise
• Advanced Algorithms
• Hypothesis-testing
Automation/Workload
• Per-task/Algorithm automation
Data Analysis
High Frequency, Self-Service
• The “other” 95% of Problems
• More domain-specific
Characterized by
• Tools with UIs (DataRobot)
• “Exploratory” data investigation
Automation/Workload
• Easily automated
Data Science “Unicorns” are even more valuable in the Govt.
So how do you scale them out?
Two Data Science Use Cases
Improving decisions vs. improving products
Decision Science
(improving business decisions)
• User: Data scientists and analysts
• Data: New and changing; often sampled
• Environment: Local machine, sandbox cluster
• Tools: R, Python, SAS/SPSS, SQL; notebooks; data wrangling/discovery tools, …
• Goal: Understand data, develop and improve models, share results
• Production: Hosted/scheduled reports or dashboards
Data Products
(improving products for customers)
• User: Data engineers, developers, SREs
• Data: Known data; full scale
• Environment: Production clusters
• Tools: Java/Scala, C++; IDEs; continuous integration, source control, …
• Goal: Build and maintain applications, improve model performance, manage models in production
• Production: Online applications
Ingest
The Foundation of Hadoop’s Potential
Data can come from a variety of “siloed” sources
▪ Existing databases
▪ Sensor data
▪ Server logs
▪ Chat transcripts
Value of data is multiplied when combined and
correlated with other data
▪ “40% value improvement from combining data from multiple IoT sources” (McKinsey Global Institute)
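As a rough illustration of combining and correlating siloed sources, here is a minimal Python sketch. The record layouts, the field names (host, temp_c, level, errors), the function name combine_sources and all sample values are hypothetical and were invented for this example; they are not taken from the deck or from any Cloudera API.

```python
from collections import defaultdict

# Hypothetical records from two siloed sources, both keyed by host name.
sensor_readings = [
    {"host": "web-1", "temp_c": 71.0},
    {"host": "web-2", "temp_c": 48.5},
    {"host": "web-3", "temp_c": 52.0},
]
server_logs = [
    {"host": "web-1", "level": "ERROR"},
    {"host": "web-1", "level": "ERROR"},
    {"host": "web-2", "level": "INFO"},
    {"host": "web-1", "level": "INFO"},
]


def combine_sources(readings, logs):
    """Join each sensor reading with the error count for the same host."""
    error_counts = defaultdict(int)
    for entry in logs:
        if entry["level"] == "ERROR":
            error_counts[entry["host"]] += 1
    return [
        {
            "host": reading["host"],
            "temp_c": reading["temp_c"],
            # Hosts with no logged errors get a count of zero.
            "errors": error_counts.get(reading["host"], 0),
        }
        for reading in readings
    ]


combined = combine_sources(sensor_readings, server_logs)
```

Neither source alone shows that the hottest host is also the one logging errors; the joined records do, which is the sense in which combined data is worth more than its parts.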
Data Processing
Leverage the right processing for your job
Data may require unique processing characteristics
▪ Batch
▪ Streaming
▪ Real-time
Hadoop arose to address batch processing, and the ecosystem has since
evolved to handle streaming and real-time workloads.
▪ “We’re doubling down on Spark. We invested earliest, and we’ve invested most, in making Hadoop enterprise-grade” (Mike Olson)
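The difference between batch and streaming processing can be sketched in a few lines of plain Python. This is an illustration only, assuming a small in-memory list of hypothetical event names; the function names batch_counts and streaming_counts are invented here and do not correspond to Hadoop or Spark APIs.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

# Hypothetical event stream, invented for illustration.
events = ["login", "search", "login", "purchase", "search", "login"]


def batch_counts(all_events: Iterable[str]) -> Dict[str, int]:
    """Batch: process the complete data set in one pass after it has landed."""
    return dict(Counter(all_events))


def streaming_counts(stream: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming: update running state per event and emit a snapshot each time."""
    running: Counter = Counter()
    for event in stream:
        running[event] += 1
        yield dict(running)


final_batch = batch_counts(events)
snapshots = list(streaming_counts(events))
```

Both approaches arrive at the same final counts. The batch version answers only once all the data is available, while the streaming version has an up-to-date answer after every event.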
Data Science
A Unified Platform to Accelerate Data Science from Exploration to Production.
Data Scientists need to use data to…
▪ Explore
▪ Model
▪ Test
The field of data science blends math and statistics
knowledge with advanced computer knowledge.
▪ “Data Scientist: Person who is better at statistics than any software engineer and better at software engineering than any statistician” (Josh Wills)
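The explore, model, test loop named on this slide can be sketched with the Python standard library alone. The data points, the choice of a least-squares line as the model, and the names fit_line, train and held_out are all assumptions made for illustration; a real project would more likely use Spark or a statistics library.

```python
from statistics import mean

# Hypothetical observations: (x, y) pairs invented for illustration.
observations = [(1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0), (5, 11.0), (6, 13.0)]

# Explore: summarize the data before modeling.
summary = {
    "n": len(observations),
    "mean_x": mean(x for x, _ in observations),
    "mean_y": mean(y for _, y in observations),
}

# Model: fit y = slope * x + intercept by least squares on a training split.
train, held_out = observations[:4], observations[4:]


def fit_line(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    mean_x = mean(x for x, _ in points)
    mean_y = mean(y for _, y in points)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x, _ in points
    )
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)

# Test: measure mean absolute error on data the model has not seen.
test_error = mean(abs(y - (slope * x + intercept)) for x, y in held_out)
```

Holding out the last two points keeps the test step honest: the error is measured on observations that played no part in fitting the line.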
MLlib
Collection of mainstream machine learning algorithms built on Spark
Including:
• Classifiers: logistic regression, boosted trees, random forests, etc.
• Clustering: k-means, Latent Dirichlet Allocation (LDA)
• Recommender Systems: Alternating Least Squares
• Dimensionality Reduction: Principal Component Analysis (PCA) and Singular Value Decomposition (SVD)
• Feature Engineering & Selection: TF-IDF, Word2Vec, Normalizer, etc.
• Statistical Functions: Chi-Squared Test, Pearson Correlation, etc.
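To make one of the listed statistical functions concrete, here is a plain-Python sketch of Pearson correlation. It is not MLlib's API or implementation, which runs distributed on Spark; the function name pearson_correlation and the sample inputs are hypothetical and only show what the statistic computes.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of centered products and centered squares (unnormalized).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical inputs: a perfectly linear pair and a weaker relationship.
perfect = pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8])
partial = pearson_correlation([1, 2, 3], [1, 3, 2])
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship.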
Data Science Track Info
Data Science Location: Severn
Matrix Decomposition at Scale
Juliet Hougland, Data Scientist, Cloudera
Large-scale Agent-Based Modeling and Simulation on High-Performance Computers
Dr. Robert Axtell, George Mason University
Random Decision Forests at Scale
Todd Boetticher, Solutions Consultant, Cloudera
Recommended Training for Data Engineering
Apache Spark and Hadoop
Learn how to identify which tool is the right one to use in a given
situation, and gain hands-on experience using those tools.
Data Science on Hadoop
Cloudera University’s three-day course helps participants understand
what data scientists do, the problems they solve, and the tools and
techniques they use.
Cloudera Search
Learn how to increase the ROI from big data investments by delivering
faster time to insight for your organization.
Thank you
