Agile Deployment of Predictive Analytics on Hadoop

Faster Insights through Open Standards
Hadoop Summit 2012



     © 2012 Datameer, Inc. All rights reserved.

Today's Session

    Ulrich Rueckert, Data Scientist, Datameer
    Michael Zeller, CEO, Zementis



    After this session, you will be able to…

    1.  Effectively deliver predictive solutions combining:
             a.  R, KNIME & Others               [Model Development]
             b.  Zementis Universal PMML Plug-in [Model Deployment & Execution]
             c.  Datameer                        [Scalable Hadoop Infrastructure]

    2.  Identify PMML as a vendor-neutral & open standard to:
             a.  Incorporate predictive models from virtually any commercial vendor or open source tool
             b.  Apply such models on Big Data

    3.  Leverage a lightweight, agile deployment process for predictive analytics to:
             a.  Accelerate time-to-market
             b.  Lower cost and complexity
             c.  Reuse existing predictive assets

Who is Datameer?

     §  “Business Intelligence on top of Hadoop”
     §  Established 2009 by Hadoop and enterprise software veterans
     §  Offices in Silicon Valley, New York and Germany




     §  Some customers:




Who is Zementis?

     §  Focus on Operational Predictive Analytics
     §  Offices in San Diego and Hong Kong
     §  Predictive Analytics Software Technology:
              •    ADAPA® Decision Engine (Predictive Models and Rules)
              •    ADAPA Add-in for Excel
              •    PMML Converter
              •    Universal PMML Plug-in (UPPI)


     §  Global Partner Network




Big Data and Analytics


        §  People and Sensor Data
             •  Transaction records
             •  Social media
             •  Climate information
             •  Mobile GPS signals
             •  Healthcare
             •  Smart Grid

        90% of the data today was created in the last 2 years.

        §  Benefits from Analytics
             •  Descriptive Analytics answers "What happened?"
             •  Predictive Analytics answers "What will happen next?"


Operational Predictive Analytics

    [Figure: Score Distribution, 1st Lien Stand-Alone Loans. Series: Goods and Bads, each with a polynomial trend line. X-axis: Score (50 to 1000). Y-axis: % Within Class (0% to 14%).]

    [Figure: % of Delinquent Loans per Month. One series per score: 700, 750, 800, 850, 900, 950. X-axis: Months (Jan to Nov). Y-axis: % of Delinquent Loans (0 to 90).]




From Model Building to Deployment

    [Diagram: Model Building -> PMML -> Model Deployment, Integration / Execution. The Datameer Server holds the PMML models and UPPI.]


                                                          Simple Deployment & Execution
                                                          1.  Upload PMML file(s) in DAS
                                                          2.  PMML turns into custom function
                                                          3.  Seamlessly score data in Datameer
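
The three steps above can be pictured with a small Python sketch. It is not Datameer's or UPPI's actual API: the registry, the function names (`make_scoring_function`, `register`, `add_score_column`), the model name, and the coefficients are all hypothetical, chosen only to show how an uploaded model can become a named function that is then applied to each row of a data set.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
ScoringFunction = Callable[[Row], float]

# Hypothetical registry standing in for "custom functions" known to the platform.
FUNCTIONS: Dict[str, ScoringFunction] = {}


def make_scoring_function(intercept: float, coefficients: Dict[str, float]) -> ScoringFunction:
    """Step 2: wrap model parameters (as they might be read from PMML) in a callable."""
    def score(row: Row) -> float:
        return intercept + sum(weight * row[field] for field, weight in coefficients.items())
    return score


def register(name: str, function: ScoringFunction) -> None:
    """Step 1 (simplified): make the uploaded model available under a name."""
    FUNCTIONS[name] = function


def add_score_column(rows: List[Row], function_name: str, output_column: str) -> List[Row]:
    """Step 3: apply the named function to every row, returning new rows with the score."""
    function = FUNCTIONS[function_name]
    return [{**row, output_column: function(row)} for row in rows]


# Illustrative model parameters; not taken from the talk.
register("campaign_model", make_scoring_function(1.0, {"recency": -0.5, "purchases": 2.0}))

rows = [
    {"recency": 2.0, "purchases": 3.0},
    {"recency": 4.0, "purchases": 0.0},
]
scored = add_score_column(rows, "campaign_model", "response_score")
for row in scored:
    print(row)
```

Because the scoring function only needs one row at a time, the same call can in principle be run independently on each partition of a large data set, which is what makes this pattern a natural fit for Hadoop.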

PMML
Predictive Model Markup Language



    •  PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.

    •  Mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.

    •  Supported by all leading data mining tools, commercial and open-source.

    •  Allows for the clear separation of tasks: Model development vs. model deployment.

    •  Eliminates the need for custom code and proprietary model deployment solutions.

    •  Uniform deployment platform ensures scalability and reliability of model execution.

    PMML book available on Amazon.com
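
To make "XML-based language" concrete, here is a minimal sketch in Python that reads a small PMML regression model and scores one record. The PMML document is hand-written for illustration (field names and coefficients are invented), and the parser handles only a single `RegressionTable` with `NumericPredictor` elements; a real consumer such as UPPI or ADAPA covers far more of the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.1 document; the fields and numbers are made up for this sketch.
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="demo" functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# PMML elements live in a versioned namespace, so lookups need a prefix mapping.
NS = {"p": "http://www.dmg.org/PMML-4_1"}


def load_regression(pmml_text):
    """Extract the intercept and per-field coefficients from the regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Linear regression: intercept plus the weighted sum of the input fields."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


intercept, coefficients = load_regression(PMML_DOC)
prediction = score({"age": 40.0, "income": 10.0}, intercept, coefficients)
print(prediction)  # 1.5 + 0.5 * 40 + 0.25 * 10 = 24.0
```

The point of the sketch is the separation of tasks named above: the tool that wrote the XML and the code that scores with it share nothing except the PMML file.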
PMML: Predictive Model Management
  Integrating across all systems and processes



    [Diagram: PMML at the center, connecting Business Process; Applications (CRM, ERP, Excel, etc.); and cloud platforms (IBM SmartCloud, Amazon EC2).]


PMML: One Standard, One Process


    [Diagram: PMML at the center, shared among Divisions, Service Providers, External Vendors, and Applications.]
Demo Setup

    §  End-to-end Model Development Lifecycle
    §  PMML Standard as the Glue

    [Diagram: a four-stage cycle around the Universal PMML Plug-In]
         •  Data Analysis: Understand Client's Data
         •  Model Design: Build Model(s) to Unlock Hidden Value
         •  Development and Test: Demonstrate Model Performance
         •  Model Deployment: Real-time Process Improvement and ROI


Demo: Annual Marketing Campaign

   §  Which customers should we target?
   §  Split 2011 results into a training set and a test set
   §  Learn model on training set
   §  Apply model on test set
   §  Fine-tune model until evaluation shows success
   §  Apply final model on 2012 customer list

   [Diagram: the 2011 Campaign Results are split into a Subset for Training and a Subset for Testing. The training subset yields a Prediction Model, which goes through Model Evaluation against the testing subset and becomes the Fine-Tuned Prediction Model. Applied to the 2012 Customer List, it produces the Campaign Candidates.]
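
The demo workflow can be sketched end to end in plain Python. Everything here is an assumption made for illustration: the toy data, the single `spend` feature, and the simple threshold "model" stand in for the real campaign data and for whatever model type was built in the demo.

```python
import random
from statistics import mean


def split(records, test_fraction, seed):
    """Shuffle a copy of the records and split it into (training, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def learn_threshold(training):
    """Toy model: midpoint between the mean spend of responders and non-responders."""
    responders = [r["spend"] for r in training if r["responded"]]
    others = [r["spend"] for r in training if not r["responded"]]
    return (mean(responders) + mean(others)) / 2


def predict(threshold, record):
    """Predict a response when spend exceeds the learned threshold."""
    return record["spend"] > threshold


def accuracy(threshold, records):
    """Share of records whose predicted response matches the observed one."""
    hits = sum(predict(threshold, r) == r["responded"] for r in records)
    return hits / len(records)


# Hypothetical 2011 campaign results: 8 responders, 8 non-responders.
results_2011 = (
    [{"spend": 200 + 10 * i, "responded": True} for i in range(8)]
    + [{"spend": 10 + 10 * i, "responded": False} for i in range(8)]
)

training, test = split(results_2011, test_fraction=0.25, seed=0)
threshold = learn_threshold(training)
test_accuracy = accuracy(threshold, test)
print("accuracy on held-out 2011 data:", test_accuracy)

# Hypothetical 2012 customer list; the final model picks the campaign candidates.
customers_2012 = [
    {"id": "A", "spend": 30},
    {"id": "B", "spend": 240},
    {"id": "C", "spend": 60},
    {"id": "D", "spend": 210},
]
candidates = [c["id"] for c in customers_2012 if predict(threshold, c)]
print("campaign candidates:", candidates)
```

The toy classes are deliberately well separated, so the held-out accuracy is perfect and customers B and D are selected. With real campaign data the evaluation step is where the fine-tuning loop would repeat until the accuracy is acceptable.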




Summary


Avoid Vendor Lock-in
•      Open Standards vs. Proprietary Code
•      Best-of-Breed Tool Set

Hadoop-based Scoring Paradigm
•      Minimize Data Movement
•      Massively Parallel Execution
•      Scale with Business Demand

Ease of Use, Fast ROI
•      Leverage Datameer UI
•      Deploy in Minutes vs. Months
•      No Coding Skills Required
Online Resources




 §  Learn More About PMML
 §     Data Mining Group website                                 http://www.dmg.org
 §     Join LinkedIn PMML Discussion Group                       http://www.linkedin.com/groupRegistration?gid=2328634
 §     Articles, on-line videos, blogs                           http://www.zementis.com/community.htm

 §  Product Info
 §     On Demand Webinar                    http://data.datameer.com/power-of-big-data-insights-of-predictive-analytics/
 §     UPPI for Datameer                    http://www.zementis.com/DAS-plugin.htm



