Liferay & Big Data 
Getting value from your data 
Miguel Ángel Pastor Olivar 
miguel.pastor@liferay.com
Who am I?
• Some random guy
• Member of the Liferay core infrastructure team
• Disclaimer: not a computer scientist
• @miguelinlas3
What are we going to talk about?
• Big Data: what is this about?
• Simple architecture proposal
• Use cases
• Questions (and hopefully answers)
Big Data?
• The data is so big that regular solutions are:
  – Extremely slow
  – Too small
  – Really expensive
• It is also about how we use all the data we already own
• Volume
  – Transactions, data streaming from social media, …
• Velocity
  – Torrents of data in real time
• Variety
  – Numerical data, text, email, video, audio, …
Popular usages
• Recommender systems
• Predicting the future
  – Netflix autoscales based on past network traffic data
• Churn models
  – Big telco companies build social networks to reduce churn
• Sentiment analysis
  – Are people talking about you on the Internet?
• Real-time bidding
  – Optimise advertising
• Health care
  – Improve patients’ health while reducing costs
  – Improve the quality of life of multiple sclerosis patients
Terminology
• Storage models
  • How to store the relevant information
• Computation models
  • How to process and transform all the information
• Analytics
  • How we can take action based on the previous steps
Big Data Architectures
Data storage
Hadoop Distributed File System (HDFS)
• Java-based file system
• Scalable, fault-tolerant, distributed storage
• Designed to run on commodity hardware
• Closely related to MapReduce
Source: http://hortonworks.com/
NoSQL storage
• Semi-structured data
• Focused on
  • Horizontal scalability
  • Availability
• Different trade-offs: CAP, BASE, …
NewSQL storage
• Modern relational databases
• Same scalable performance as NoSQL for OLTP
• Maintain ACID guarantees
• A few alternatives: VoltDB, Google Spanner, FoundationDB, …
Computation and analytics
Apache Hadoop
Apache Hadoop MapReduce
• Distributed processing
• Large datasets
• Clusters of computers
• Simple programming model
• Verbose and hard-to-use API
#LRNAS2014
[Diagram: MapReduce word count]
Input text: “Liferay projects is the best Open Source project”
Map input: records of the form (index, “…”)
Sort and shuffle output: (best, [1]), (is, [1]), (Liferay, [1]), (Open, [1]), (project, [1,1]), (Source, [1]), (the, [1])
Reduce output: best: 1, is: 1, Liferay: 1, Open: 1, project: 2, Source: 1, the: 1
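The word-count flow in the diagram can be sketched in a few lines of plain Python. This is only an illustration of the map, sort-and-shuffle and reduce phases, not the Hadoop API; the function names and the way the sentence is split into two records are assumptions made for the example, and "project" is used twice so that the result matches the count shown on the slide.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word in every (index, line) record."""
    for _index, line in records:
        for word in line.split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Group the emitted values by key and sort the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Sum the list of ones collected for each word."""
    return {key: sum(values) for key, values in grouped}


# Hypothetical input records in the (index, "...") shape used by the diagram.
records = [(0, "Liferay project is"), (1, "the best Open Source project")]

grouped = shuffle_phase(map_phase(records))
counts = reduce_phase(grouped)
print(counts)
```

In a real cluster each phase runs in parallel on many machines; here everything runs in one process so that the data flow is easy to follow.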
• Batch model for data crunching
• Not so good at event stream processing
• But many algorithms are hard to implement using MapReduce
• Cascading, Scalding, Cascalog, Impala, …
Apache Storm
• Distributed real-time computation system
• Makes it easy to reliably process unbounded streams of data
• Multi-language support
• Use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, …
[Diagram: a Storm topology with two spouts and three bolts]
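In Storm, spouts are the sources of a stream and bolts consume and transform it. The sketch below mimics that shape with plain Python generators in a single process; it does not use the Storm API, and the spout and bolt names are hypothetical. The spout is finite here only so that the example terminates.

```python
def sentence_spout():
    """Spout: the source of the stream."""
    yield from ["big data", "big value"]


def split_bolt(stream):
    """Bolt: split each incoming sentence into words."""
    for sentence in stream:
        yield from sentence.split()


def count_bolt(stream):
    """Bolt: emit a running count for each incoming word."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]


# Wire the topology: spout -> split bolt -> count bolt.
emitted = list(count_bolt(split_bolt(sentence_spout())))
print(emitted)
```

Each stage handles one item at a time as it arrives, which is the main contrast with the batch model of MapReduce.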
Apache Spark
• Fast and general-purpose cluster computing
• Developed by the Berkeley AMPLab
• High-level APIs (not MapReduce)
• Optimised engine:
  • Supports general execution graphs
• Higher-level tools:
  • Spark SQL, MLlib, Spark Streaming, GraphX
Apache Mahout
• Scalable machine learning library
• Built on top of Hadoop
• Some algorithms don’t require Hadoop at all
R language
• Focused on:
  • Data visualisation
  • Statistical computations
  • Analysis of data
• Tons of built-in packages
• Connects to Hadoop through Hadoop Streaming
• Not a fast language
Reference Architecture
[Architecture diagram. Components: RDBMS, Event Broker, Hadoop, User Tracking, NoSQL Storage, System Events, Search, Data Logs, Monitoring, Data Warehouse, Streaming, Social Graph]
Datasources
• System events
• User tracking (client side)
  • Clicks, navigation, activities, …
• Monitoring (transactions, page load times, …)
• Models (message boards, blogs, wikis, …)
• Custom developments, …
Event broker
[Diagram: a data source writes to the end of a log with entries numbered 0 to 9; System A and System B each read from the log]
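The log in the diagram can be modelled as an append-only list in which every record is addressed by its offset and every reading system keeps its own position. The following is a minimal in-memory sketch of that idea, not the Kafka client API; the class and method names are assumptions made for illustration.

```python
class CommitLog:
    """Append-only sequence of records addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Reader:
    """A consuming system that remembers its own position in the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = CommitLog()
for i in range(10):                      # the data source writes entries 0..9
    log.append(f"event-{i}")

system_a = Reader(log)
system_b = Reader(log)
first_a = system_a.poll(max_records=4)   # System A reads slowly
all_b = system_b.poll()                  # System B reads everything at once
rest_a = system_a.poll()                 # System A resumes where it stopped
```

Because reading does not remove anything from the log, a slow reader never affects a fast one, and a new system can be attached later and replay the log from offset 0.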
Apache Kafka
• Publish-subscribe messaging rethought as a distributed commit log
• Fast
• Scalable
• Durable
• Distributed by design
[Diagram: a producer and a consumer connected to a Kafka cluster of three brokers (A, B and C), with ZooKeeper alongside]
Computation and analytics
Batch processing?
Real-time processing?
Machine learning algorithms?
Graph analysis?
Unified programming model?
Liferay & Big Data Dev Con 2014
• Apache Spark: a fast and general engine for large-scale data processing
• Write your apps in Java, Scala or Python
• Runs on the YARN cluster manager
• Can read any existing Hadoop data (HDFS)
• In memory or on disk
Apache Spark Main Components
[Diagram: Spark SQL, Spark Streaming, MLlib and GraphX on top of Apache Spark]
Spark Core
• A driver program runs the main function and executes various parallel operations on a cluster
• Resilient Distributed Datasets (RDDs), created from:
  • A file in HDFS (or any Hadoop-supported file system)
  • A Scala collection
• Second abstraction: shared variables
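One way to get a feel for the RDD abstraction is a toy, single-process stand-in written in plain Python: transformations such as map and filter are only recorded, and nothing is computed until an action such as collect is called. This is not the Spark API and it leaves out distribution and fault recovery entirely; the ToyDataset class is a hypothetical name used only for this sketch.

```python
class ToyDataset:
    """Single-process stand-in for an RDD: lazy transformations, eager actions."""

    def __init__(self, source, steps=()):
        self._source = source        # a plain Python collection
        self._steps = tuple(steps)   # recorded transformations, in order

    def map(self, fn):
        """Transformation: record the step and return a new dataset."""
        return ToyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        """Transformation: record the step and return a new dataset."""
        return ToyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        """Action: replay the recorded steps over the source data."""
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


numbers = ToyDataset(range(1, 11))   # built from an in-memory collection
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
result = even_squares.collect()      # computation happens only here
print(result)
```

Keeping the list of recorded steps is also what lets a real RDD recompute a lost partition from its source instead of replicating the data.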
Spark SQL
• Mix SQL queries with Spark programs
• Unified data access
• Hive compatibility
• Standard JDBC or ODBC connectivity
• Same engine for both interactive and long-running queries
Spark Streaming
• Build your apps using high-level operators
• Fault tolerance: exactly-once semantics out of the box
• Combine streaming with batch and interactive queries
• Can read from HDFS, Flume, Kafka, Twitter and ZeroMQ
• Define your own custom data sources
Spark MLlib
• Basic statistics
  • Summary statistics
  • Correlations
  • …
• Classification and regression
  • Linear models
  • Decision trees
  • Naive Bayes
• Clustering
  • K-means
• Collaborative filtering
  • Alternating least squares
• Dimensionality reduction
  • Singular value decomposition
  • Principal component analysis
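To make one entry of the list concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional points in plain Python. It is not the MLlib implementation; the function name, the sample points and the starting centroids are assumptions chosen so that the two clusters are obvious.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D points, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:   # converged: no centroid moved
            break
        centroids = updated
    return centroids, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 2.0])
print(centroids, clusters)
```

A distributed version follows the same two steps, with the assignment done in parallel over partitions of the data and the centroid update done by aggregating per-partition sums.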
Spark GraphX
• Graph API and graph-parallel computation
• Graphs are growing in scale and importance
  • From social networks to language modelling
• Directed multigraph with properties attached to each vertex and edge
• Growing collection of graph algorithms and builders
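The data model named above, a directed multigraph with properties on every vertex and edge, can be sketched as a small Python data structure. This is an in-memory illustration only, not the GraphX API; the PropertyGraph class and the sample vertices and relations are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Directed multigraph: parallel edges allowed, properties on vertices and edges."""
    vertices: dict = field(default_factory=dict)   # vertex id -> properties
    edges: list = field(default_factory=list)      # (src, dst, properties)

    def add_vertex(self, vertex_id, **props):
        self.vertices[vertex_id] = props

    def add_edge(self, src, dst, **props):
        # A list (rather than a set keyed by endpoints) keeps parallel edges.
        self.edges.append((src, dst, props))

    def out_degrees(self):
        """Count outgoing edges per source vertex."""
        return Counter(src for src, _dst, _props in self.edges)


graph = PropertyGraph()
graph.add_vertex(1, name="alice")
graph.add_vertex(2, name="bob")
graph.add_edge(1, 2, relation="follows")
graph.add_edge(1, 2, relation="messaged")   # second edge between the same pair
graph.add_edge(2, 1, relation="follows")
degrees = graph.out_degrees()
print(degrees)
```

Allowing several edges between the same pair of vertices is what makes it a multigraph: two users can be linked by more than one kind of relationship at once.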
Live demo!
Building a message classifier
Takeaways
• It is not about data size, but how you use it
• You already own tons of data; you just need to get value from it
• There is no silver bullet: you have plenty of alternatives
• JVM-based Big Data technologies are usually a great choice
• Try it yourself!
References
• Apache Kafka
• Apache Spark
• Apache Storm
• Apache Hadoop
• Big Data definition at Wikipedia
• Liferay Kafka Bridge
• What every software engineer should know about a log
Thank you!
Questions (and hopefully answers)
