Big Data Use Cases*
DevNexus Conference
2/18/2013

*Fully buzzword-compliant title

1
whoami
• Brad Anderson
• Solutions Architect at MapR (Atlanta)
• ATLHUG co-chair
• NoSQL East Conference 2009
• “boorad” most places (twitter, github)
• banderson@maprtech.com
2
Mobile

Virtualization

Social Media

B2B

Application Service Provider

Cloud
Client/Server
Web 2.0

Service Bureau

Software-as-a-Service
3
BIG DATA
4
5
Business Value
6
Business Value
7
Big Data is not new!
but the tools are.

8
Ship the Function to the Data

[Diagram, two panels. "Distributed Computing": each function sits next to the data it processes. "Traditional Architecture": functions are separate from the data, which is held in an RDBMS and on SAN/NAS.]

9
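A minimal sketch of the idea on this slide, assuming a toy in-memory "cluster". The node names, the record contents, and both function names are hypothetical, and "moved" is only a count of items that would cross the network, not a real measurement.

```python
from collections import Counter

# Hypothetical cluster: each "node" holds one partition of the records.
NODES = {
    "node-a": ["ok"] * 1000 + ["error"] * 3,
    "node-b": ["ok"] * 2000 + ["error"] * 1,
    "node-c": ["ok"] * 500,
}


def pull_data_then_compute(nodes):
    """Traditional style: copy every record to one place, then run the function."""
    moved = [rec for part in nodes.values() for rec in part]
    return Counter(moved), len(moved)


def ship_function_to_data(nodes, local_fn, combine):
    """Distributed style: run local_fn beside each partition, move only the results."""
    partials = [local_fn(part) for part in nodes.values()]
    moved = sum(len(p) for p in partials)  # only the small summaries travel
    total = partials[0]
    for partial in partials[1:]:
        total = combine(total, partial)
    return total, moved


if __name__ == "__main__":
    print(pull_data_then_compute(NODES))
    print(ship_function_to_data(NODES, Counter, lambda a, b: a + b))
```

Both calls produce the same totals; the difference is how many items have to leave the node that stores them.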
Variation: Multiple MapReduces
Example: Fraud Detection in User Transactions
[Diagram of chained MapReduce stages, with these labels:]
• Transaction data
• LDA training
• LDA scoring
• G2 score
• 95th-percentile LDA anomaly
• HBase / MapR M7 Edition
• Candidate events for analyst review
http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
10
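The staged flow on this slide (train, score, keep the anomalous tail) can be sketched as three chained jobs. This is an illustration under stated assumptions, not the system from the talk: `map_reduce` is a tiny in-memory stand-in for a Hadoop job, a per-user category frequency model is swapped in for LDA, the record fields (`id`, `user`, `category`) are hypothetical, and the model is trained on the same records it scores only to keep the example short.

```python
import math
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reducer(key, values))
    return out


# Stage 1 (training): learn each user's normal mix of purchase categories.
def train_mapper(txn):
    yield txn["user"], txn["category"]


def train_reducer(user, categories):
    counts = defaultdict(int)
    for category in categories:
        counts[category] += 1
    yield user, {c: n / len(categories) for c, n in counts.items()}


# Stage 2 (scoring): rare categories for a user get a high score.
def make_score_mapper(model):
    def score_mapper(txn):
        p = model[txn["user"]].get(txn["category"], 1e-6)
        yield txn["id"], -math.log(p)
    return score_mapper


def identity_reducer(key, values):
    for value in values:
        yield key, value


# Stage 3 (filter): keep only scores above the 95th percentile.
def percentile_mapper(scored):
    yield "all", scored  # single key, so one reducer sees every score


def top_tail_reducer(_key, scored):
    ordered = sorted(score for _txn_id, score in scored)
    rank = (95 * len(ordered) + 99) // 100  # nearest-rank 95th percentile
    threshold = ordered[rank - 1]
    for txn_id, score in scored:
        if score > threshold:
            yield txn_id, score


def detect(transactions):
    model = dict(map_reduce(transactions, train_mapper, train_reducer))
    scored = map_reduce(transactions, make_score_mapper(model), identity_reducer)
    return map_reduce(scored, percentile_mapper, top_tail_reducer)
```

Each call to `map_reduce` corresponds to one box in the diagram; the output of one job is simply the input of the next.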
MapR Distribution for Apache Hadoop
• Complete Hadoop distribution
• Comprehensive management suite
• Industry-standard interfaces
• Enterprise-grade dependability
• Higher performance
11
Big Data Ecosystem

12
Use Case
• Company
• Data Source(s)
• Technique(s)
• Business Value
13
Proactive Monitoring
14
Data Sources
• Server Telemetry
• Monitoring Logs
• Network Flow
15
Techniques
• Pattern Recognition
• Proactive Monitoring
• Early Alert Delivery
16
Business Value

17
Telecommunications Giant

ETL Offload
18
Telecommunications

Data Sources
• Customer Records
• Contract Data
• Purchase Orders
• Call Center
19
Telecommunications

Techniques
Analytics

ETL

20
Telecommunications

Techniques
ETL (Hadoop) + Analytics (Teradata)
21
Telecommunications

Business Value

22
Credit Card Issuer

Data Sources
• Customer Purchase History
• Merchant Designations
• Merchant Special Offers
23
Credit Card Issuer

Techniques
[Architecture diagram, with these labels:]
• Hadoop: Purchase History, Merchant Information, Merchant Offers
• Import (4 hrs)
• Recommendation Engine Results (Mahout)
• Export (4 hrs)
• Presentation Data Store (DB2)
• Apps
24
Credit Card Issuer

Techniques
[Revised architecture diagram, with these labels:]
• Hadoop: Purchase History, Merchant Information, Merchant Offers
• Recommendation Engine Results (Mahout)
• Index Update (2 min)
• Recommendation Search Index (Solr)
• Apps
25
Credit Card Issuer

Business Value

26
Waste & Recycling Leader

Idle Alerts
27
Data Sources
• Truck Geolocation Data
  – 20,000 trucks
  – 5 sec interval
• Landfill Geographic Boundaries
28
Techniques
[Diagram, with these labels:]
• Truck Geolocation Data
• Realtime Stream Computation (Storm): Immediate Alerts
• Hadoop Storage
• Batch Computation (MapReduce): Tax Reduction Reporting
• Shortest Path Graph Algorithm: Route Optimization
29
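A rough sketch of what the realtime path on this slide might compute per truck. Everything here is an assumption for illustration: the `Ping` fields, the 300-second and 25-metre thresholds, and the use of a point-in-polygon test against a landfill boundary are not taken from the talk, and a real deployment would run this logic inside a stream processor such as Storm rather than in a plain Python loop.

```python
import math
from dataclasses import dataclass


@dataclass
class Ping:
    truck_id: str
    ts: int      # seconds since epoch
    lat: float
    lon: float


def distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation; adequate for the short distances involved."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6_371_000.0 * math.hypot(x, y)


def inside(polygon, lat, lon):
    """Ray-casting point-in-polygon test; polygon is a list of (lat, lon) vertices."""
    hit = False
    for i in range(len(polygon)):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % len(polygon)]
        if (lon1 > lon) != (lon2 > lon):  # edge straddles the point's longitude
            cross_lat = lat1 + (lon - lon1) * (lat2 - lat1) / (lon2 - lon1)
            if lat < cross_lat:
                hit = not hit
    return hit


class IdleDetector:
    """Emits one alert per stop once a truck has stayed put for idle_secs."""

    def __init__(self, idle_secs=300, min_move_m=25.0):
        self.idle_secs = idle_secs
        self.min_move_m = min_move_m
        self._anchor = {}      # truck_id -> (ts, lat, lon) of the last real movement
        self._alerted = set()  # trucks already alerted for the current stop

    def process(self, ping):
        anchor = self._anchor.get(ping.truck_id)
        moved = anchor is None or distance_m(
            anchor[1], anchor[2], ping.lat, ping.lon) >= self.min_move_m
        if moved:
            self._anchor[ping.truck_id] = (ping.ts, ping.lat, ping.lon)
            self._alerted.discard(ping.truck_id)
            return None
        idle_for = ping.ts - anchor[0]
        if idle_for >= self.idle_secs and ping.truck_id not in self._alerted:
            self._alerted.add(ping.truck_id)
            return (ping.truck_id, idle_for)
        return None
```

A caller would feed each 5-second ping to `process` and, when an alert comes back, could use `inside` with a boundary polygon to tag whether the stop happened at a landfill.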
Business Value

30
Fraud Detection
Data Lake
31
Data Sources
• Anti-Money Laundering
• Consumer Transactions
32
Techniques
• Anti-Money Laundering System
• Consumer Transactions System
33
Techniques
[Diagram, with these labels:]
• AML
• Consumer Transactions
• Data Lake (Hadoop)
• Suspicious Events
• Analyst
• Latent Dirichlet Allocation, Bayesian Learning Neural Network, Peer Group Analysis
34
Business Value

35
Machine Learning
Search Relevance
DNA Matching
36
Data Sources
• Birth, Death, Census, Military, Immigration records
• Search Behavior Activity
• DNA SNP (snips)
37
Techniques
• Record Linking
• Search Relevance
• Clickstream Behavior
• Security Forensics
• DNA Matching
38
Business Value

39
Traffic Analytics
40
Data Sources
• Inrix Road Segment Data
  – Avg Speed / minute / segment
  – Reference Speeds
• Road Segment Geolocation Data
41
Techniques
• Bottleneck Detection Algorithm
• Time Offset Correlations
  – Alternate Routes
• Predictive Congestion Analysis
  – Growth & Term Assumptions
42
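The deck does not say how its bottleneck detection works, so the following is only one plausible reading of the data sources listed earlier (average speed per minute per segment, plus a reference speed). It flags a segment when the average speed stays below a fraction of the reference speed for several consecutive minutes. The function name, the 0.6 ratio, and the 5-minute minimum are all hypothetical.

```python
def find_bottlenecks(speeds, reference, ratio=0.6, min_minutes=5):
    """Return (segment, first_minute, last_minute) for each sustained slowdown.

    speeds:    {segment_id: [average speed for minute 0, 1, 2, ...]}
    reference: {segment_id: reference (free-flow) speed}
    """
    found = []
    for segment, series in speeds.items():
        limit = ratio * reference[segment]
        start = None
        # The infinite sentinel closes a slow run that lasts to the end of the series.
        for minute, speed in enumerate(list(series) + [float("inf")]):
            if speed < limit:
                if start is None:
                    start = minute
            elif start is not None:
                if minute - start >= min_minutes:
                    found.append((segment, start, minute - 1))
                start = None
    return found
```

Comparing the start minutes of bottlenecks on neighbouring segments is one way the "time offset correlations" bullet could build on this output, though that step is not shown here.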
43
44
Business Value

45
Similar Characteristics
• Lots of Data
• Structured, Semi-Structured, Unstructured
• Varied Systems Interoperating
  – Hadoop, Storm, Solr, MPP, Visualizations
• Increase Revenue
• Decrease Costs
46
Thank You

47

Editor's Notes

  • #11: SCRIPT: You can see from the Word Count example that MapReduce is a low-level construct. Typical applications require more complex processing, which is accomplished by performing multiple stages of MapReduce. Here is an example of a Hadoop system that uses machine learning models to detect account fraud after a security breach. (*) Each step is its own MapReduce program. We’ll return to this example in more detail later. [DON’T do any explanation of the algorithm here. Just twinkle the MR stages.] (*) User transaction data is loaded into a distributed datastore for massive tables, such as HBase running on Hadoop, or native tables available with MapR’s M7 distribution. (*) There’s a training phase, to teach the system what normal transactions look like. (*) Later, individual user transactions are scored against the “normal behavior” pattern. (*) Then, transactions with highly anomalous behavior are singled out as candidate events to be manually reviewed by analysts for potential fraud. In your data flow, any place you have a group-by, join, filter, or count-occurrences step, it typically equates to one or more MapReduce jobs.
  • #12: MapR provides a complete distribution for Apache Hadoop. MapR has integrated, tested, and hardened a broad array of packages as part of this distribution: Hive, Pig, Oozie, Sqoop, plus additional packages such as Cascading. We have spent more than two years in a well-funded effort to make deep architectural improvements and create the next-generation distribution for Hadoop. MapR has made significant updates, combined with a dozen open source packages. All of the innovations MapR has delivered maintain 100% compatibility with the Apache Hadoop APIs. This is in stark contrast with the alternative distributions from Cloudera, Hortonworks, and Apache, which are all equivalent.