Understanding Big Data
Applications and Architectures
1st JTC 1 SGBD Meeting
SDSC San Diego March 19 2014
Geoffrey Fox
Judy Qiu
Shantenu Jha (Rutgers)
gcf@indiana.edu
http://www.infomall.org
School of Informatics and Computing
Digital Science Center
Indiana University Bloomington
51 Detailed Use Cases: Contributed July-September 2013
Covers goals, data features such as 3 V’s, software, hardware
• http://bigdatawg.nist.gov/usecases.php
• https://bigdatacoursespring2014.appspot.com/course (Section 5)
• Government Operation(4): National Archives and Records Administration, Census Bureau
• Commercial(8): Finance in Cloud, Cloud Backup, Mendeley (Citations), Netflix, Web Search,
Digital Materials, Cargo shipping (as in UPS)
• Defense(3): Sensors, Image surveillance, Situation Assessment
• Healthcare and Life Sciences(10): Medical records, Graph and Probabilistic analysis,
Pathology, Bioimaging, Genomics, Epidemiology, People Activity models, Biodiversity
• Deep Learning and Social Media(6): Driving Car, Geolocate images/cameras, Twitter, Crowd
Sourcing, Network Science, NIST benchmark datasets
• The Ecosystem for Research(4): Metadata, Collaboration, Language Translation, Light source
experiments
• Astronomy and Physics(5): Sky Surveys including comparison to simulation, Large Hadron
Collider at CERN, Belle Accelerator II in Japan
• Earth, Environmental and Polar Science(10): Radar Scattering in Atmosphere, Earthquake,
Ocean, Earth Observation, Ice sheet Radar scattering, Earth radar mapping, Climate
simulation datasets, Atmospheric turbulence identification, Subsurface Biogeochemistry
(microbes to watersheds), AmeriFlux and FLUXNET gas sensors
• Energy(1): Smart grid
26 Features for each use case
Would like to capture “essence of
these use cases”
“small” kernels, mini-apps
Or Classify applications into patterns
Do it from HPC background not database viewpoint
i.e. focus on cases with detailed analytics
What are “mini-Applications”?
• Use for benchmarks of computers and software (is my
parallel compiler any good?)
• In parallel computing, this is well established
– Linpack for measuring performance to rank machines in Top500
(changing?)
– NAS Parallel Benchmarks (originally a pencil and paper
specification to allow optimal implementations; then MPI library)
– Other specialized Benchmark sets keep changing and are used to
guide procurements
• Last 2 NSF hardware solicitations had NO preset benchmarks –
perhaps as no agreement on key applications for clouds and
data intensive applications
– Berkeley dwarfs capture different structures that any approach
to parallel computing must address
– Templates used to capture parallel computing patterns
• I’ll let experts comment on database benchmarks like TPC
HPC Benchmark Classics
• Linpack or HPL: Parallel LU factorization for solution of
linear equations
• NPB version 1: Mainly classic HPC solver kernels
– MG: Multigrid
– CG: Conjugate Gradient
– FT: Fast Fourier Transform
– IS: Integer sort
– EP: Embarrassingly Parallel
– BT: Block Tridiagonal
– SP: Scalar Pentadiagonal
– LU: Lower-Upper symmetric Gauss-Seidel
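The Linpack/HPL kernel named above, LU factorization for solving linear equations, can be sketched serially as below. This is a minimal illustration, not HPL itself: HPL distributes a blocked version of this factorization across many nodes, and the function name `lu_solve` is an assumption made for this example.

```python
def lu_solve(a, b):
    """Solve a x = b by LU factorization with partial pivoting (serial sketch)."""
    n = len(a)
    lu = [row[:] for row in a]   # working copy; ends up holding L (below diagonal) and U
    perm = list(range(n))        # perm[i] = original index of the row now at position i

    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        # Eliminate below the pivot, storing the multipliers in place of the zeros.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]

    # Forward substitution: L y = P b (L has an implicit unit diagonal).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))

    # Back substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / lu[i][i]
    return x
```

The factorization costs on the order of n³ operations against n² data, which is why it stresses floating-point throughput more than memory or network bandwidth.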
7 Original Berkeley Dwarfs (Colella)
1. Structured Grids (including locally structured
grids, e.g. Adaptive Mesh Refinement)
2. Unstructured Grids
3. Fast Fourier Transform
4. Dense Linear Algebra
5. Sparse Linear Algebra
6. Particles
7. Monte Carlo
Note: “vaguer” than NPB
13 Berkeley Dwarfs
• Dense Linear Algebra
• Sparse Linear Algebra
• Spectral Methods
• N-Body Methods
• Structured Grids
• Unstructured Grids
• MapReduce
• Combinational Logic
• Graph Traversal
• Dynamic Programming
• Backtrack and Branch-and-Bound
• Graphical Models
• Finite State Machines
• First 6 of these correspond to Colella’s original
• Monte Carlo dropped
• N-body methods are a subset of Particle
• Note a little inconsistent in that MapReduce is a programming
model and spectral method is a numerical method
• Need multiple facets!
Distributed Computing MetaPatterns I
Jha, Cole, Katz, Parashar, Rana, Weissman
Distributed Computing MetaPatterns II
Jha, Cole, Katz, Parashar, Rana, Weissman
Distributed Computing MetaPatterns III
Jha, Cole, Katz, Parashar, Rana, Weissman
Problem Architecture Facet of Ogres (Meta or
MacroPattern)
i. Pleasingly Parallel – as in Blast, Protein docking, imagery
ii. Local Analytics or Machine Learning – ML or filtering
pleasingly parallel as in bio-imagery, radar images (really
just pleasingly parallel but sophisticated local analytics)
iii. Global Analytics or Machine Learning seen in LDA,
Clustering etc. with parallel ML over nodes of system
iv. SPMD (Single Program Multiple Data)
v. Bulk Synchronous Processing: well-defined
compute-communication phases
vi. Fusion: Knowledge discovery often involves fusion of
multiple methods.
vii. Workflow (often used in fusion)
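The difference between the pleasingly parallel items (i, ii) and the global, bulk synchronous items (iii, v) can be sketched as below. This is a toy illustration under stated assumptions: threads stand in for the nodes of a system, the per-block function and the 1-D k-means example are invented for this sketch, and none of the names come from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def local_analytics(block):
    """Independent per-block work: needs no data from any other block."""
    return sum(x * x for x in block)


def pleasingly_parallel(blocks, workers=4):
    """Items i-ii: map a local function over blocks with no communication."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(local_analytics, blocks))


def partial_sums(block, centers):
    """Compute phase: per-block sums and counts of points nearest each center."""
    sums = [0.0] * len(centers)
    counts = [0] * len(centers)
    for x in block:
        k = min(range(len(centers)), key=lambda c: abs(x - centers[c]))
        sums[k] += x
        counts[k] += 1
    return sums, counts


def kmeans_1d_bsp(blocks, centers, supersteps=10, workers=4):
    """Items iii and v: global clustering as alternating compute/communication phases."""
    centers = list(centers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(supersteps):
            # Compute phase: every block is processed independently.
            partials = list(pool.map(lambda b: partial_sums(b, centers), blocks))
            # Communication phase: global reduction, then all workers see new centers.
            for k in range(len(centers)):
                total = sum(p[0][k] for p in partials)
                count = sum(p[1][k] for p in partials)
                if count:
                    centers[k] = total / count
    return centers
```

The first function never synchronizes; the second needs a global reduction at the end of every superstep, which is what makes interconnect performance matter for global analytics.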
Core Analytics Facet of Ogres (microPattern)
i. Search/Query
ii. Local Machine Learning – pleasingly parallel
iii. Summarizing statistics
iv. Recommender Systems (Collaborative Filtering)
v. Outlier Detection (iORCA)
vi. Clustering (many methods),
vii. LDA (Latent Dirichlet Allocation) or variants like PLSI (Probabilistic
Latent Semantic Indexing),
viii. SVM and Linear Classifiers (Bayes, Random Forests),
ix. PageRank, (Find leading eigenvector of sparse matrix)
x. SVD (Singular Value Decomposition),
xi. Learning Neural Networks (Deep Learning),
xii. MDS (Multidimensional Scaling),
xiii. Graph Structure Algorithms (seen in search of RDF Triple stores),
xiv. Network Dynamics - Graph simulation Algorithms (epidemiology)
[Side labels on slide: Matrix Algebra; Global Optimization]
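Item ix describes PageRank as finding the leading eigenvector of a sparse matrix. A minimal power-iteration sketch follows. The damping factor of 0.85, the uniform treatment of pages without outgoing links, and the function name `pagerank` are conventional choices assumed for this example, not details given in the slides.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-12):
    """Power iteration on a link graph; links maps a page to the pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives the same small "teleport" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Sparse matrix-vector product: only existing links are visited.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                # Page with no outgoing links: spread its rank uniformly.
                share = damping * rank[p] / n
                for q in pages:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank
```

Each iteration is one sparse matrix-vector multiply, so in terms of the dwarfs the kernel falls under sparse linear algebra, and in a parallel setting each iteration ends with a global exchange of the rank vector.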
Analytics Features Facet of Ogres
• These core analytics/kernels can be classified by features
like
• (a) Flops per byte;
• (b) Communication Interconnect requirements;
• (c) Is application (graph) constant or dynamic
• (d) Is communication BSP or Asynchronous
• (e) Are algorithms Iterative or not?
• (f) Are data points in metric or non-metric spaces
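Feature (a), flops per byte, can be estimated on paper for simple kernels. The sketch below does so for a dot product and a dense matrix multiply. It assumes 8-byte words and an idealized model in which each operand is moved exactly once; the function names are invented for this example, and real measured values depend on caches and implementation.

```python
def flops_per_byte(flops, bytes_moved):
    """Arithmetic intensity: floating-point operations per byte of data moved."""
    return flops / bytes_moved


def dot_product_intensity(n, word=8):
    """n multiplies plus n adds over two length-n vectors, each read once."""
    return flops_per_byte(2 * n, 2 * n * word)


def matmul_intensity(n, word=8):
    """2 n^3 flops over three n x n matrices (read A and B, write C once each)."""
    return flops_per_byte(2 * n**3, 3 * n * n * word)
```

Under these assumptions the dot product stays at 0.125 flops per byte whatever the vector length, while the matrix multiply grows as n/12, which is one way to see why dense linear algebra benchmarks favor compute-rich machines and many data analytics kernels do not.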
Application Class Facet of Ogres
• (a) Search and query
• (b) Maximum Likelihood,
• (c) 2 minimizations,
• (d) Expectation Maximization (often Steepest descent)
• (e) Global Optimization (Variational Bayes)
• (f) Agents, as in epidemiology (swarm approaches)
• (g) GIS (Geographical Information Systems).
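Reading item (c) as chi-squared (χ²) minimization, the simplest instance is a weighted straight-line fit, sketched below. It minimizes χ² = Σ ((y_i − a − b·x_i) / σ_i)² using the closed-form solution; the function name and the choice of a linear model are assumptions made for this illustration.

```python
def fit_line_chi2(xs, ys, sigmas):
    """Fit y = a + b x by minimizing chi-squared with per-point errors sigmas."""
    w = [1.0 / (s * s) for s in sigmas]          # weights 1 / sigma_i^2
    S = sum(w)
    Sx = sum(wi * x for wi, x in zip(w, xs))
    Sy = sum(wi * y for wi, y in zip(w, ys))
    Sxx = sum(wi * x * x for wi, x in zip(w, xs))
    Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))

    delta = S * Sxx - Sx * Sx
    if delta == 0.0:
        raise ValueError("need at least two distinct x values")

    # Setting the derivatives of chi-squared with respect to a and b to zero
    # gives two linear equations whose solution is:
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    chi2 = sum(wi * (y - a - b * x) ** 2 for wi, x, y in zip(w, xs, ys))
    return a, b, chi2
```

The sums are simple reductions over the data points, so this kernel parallelizes as local partial sums followed by one global reduction; nonlinear models need an iterative minimizer instead of the closed form.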
Data Source Facet of Ogres
• (i) SQL,
• (ii) NOSQL based,
• (iii) Other Enterprise data systems (10 examples from Bob Marcus)
• (iv) Set of Files (as managed in iRODS),
• (v) Internet of Things,
• (vi) Streaming and
• (vii) HPC simulations.
• Before data gets to compute system, there is often an initial data
gathering phase which is characterized by a block size and timing. Block
size varies from month (Remote Sensing, Seismic) to day (genomic) to
seconds or lower (Real time control, streaming)
• There are storage/compute system styles: Shared, Dedicated,
Permanent, Transient
• Other characteristics are need for permanent auxiliary/comparison
datasets and these could be interdisciplinary implying nontrivial data
movement/replication
Lessons / Insights
• Ogres classify Big Data applications by multiple
facets – each with several exemplars and features
– Guide to breadth and depth of Big Data
– Does your architecture/software support all the ogres?
• Add database exemplars
• In parallel computing, the simple analytic kernels
dominate mindshare even though they are agreed to be limited
• Section 5 of my class
https://bigdatacoursespring2014.appspot.com/preview
classifies 51 use cases with these facets
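The question "does your architecture/software support all the ogres?" suggests a simple data structure: tag each ogre with values for the five facets and compare against what a platform offers. The sketch below is hypothetical throughout. The class name `Ogre`, the facet keys, and the `unsupported` helper are assumptions for illustration and do not reproduce the classification used in the class.

```python
from dataclasses import dataclass, field

# The five facets presented in the slides, as illustrative dictionary keys.
FACETS = (
    "problem_architecture",
    "core_analytics",
    "analytics_features",
    "application_class",
    "data_source",
)


@dataclass
class Ogre:
    name: str
    facets: dict = field(default_factory=dict)  # facet name -> set of values

    def __post_init__(self):
        unknown = set(self.facets) - set(FACETS)
        if unknown:
            raise ValueError(f"unknown facets: {sorted(unknown)}")


def unsupported(ogre, platform):
    """Return {facet: values the platform lacks} for one ogre."""
    gaps = {}
    for facet, values in ogre.facets.items():
        missing = set(values) - set(platform.get(facet, ()))
        if missing:
            gaps[facet] = missing
    return gaps
```

An empty result for every ogre would mean the platform covers the whole classification; any non-empty entry points at a facet value, such as a data source or a problem architecture, that the platform does not yet handle.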

Classification of Big Data Use Cases by different Facets

Editor's Notes

  • #12: Big dwarfs are Ogres. Implement Ogres in ABDS+
  • #16: Big dwarfs are Ogres. Implement Ogres in ABDS+