LinkedIn Segmentation & Targeting Platform: A Big Data Application
Hadoop Summit, June 2013
Hien Luu, Sid Anand
©2013 LinkedIn Corporation. All Rights Reserved.
About Us
Hien Luu, Sid Anand
Our mission
Connect the world’s professionals to make them more productive and successful
Over 200M members and counting
The world’s largest professional network
Growing at more than 2 members/sec

LinkedIn Members (Millions)
Year      2004  2005  2006  2007  2008  2009  2010  2011  2012
Members      2     4     8    17    32    55    90   145  200+

Source: http://press.linkedin.com/about
The world’s largest professional network
Over 64% of members are now international
•  >88% of Fortune 100 companies use LinkedIn Talent Solutions to hire
•  >2.9M Company Pages
•  >5.7B professional searches in 2012
•  19 languages
•  >30M students and NCGs, the fastest growing demographic
Source: http://press.linkedin.com/about
Other Company Facts
•  Headquartered in Mountain View, Calif., with offices around the world!
•  As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world
Source: http://press.linkedin.com/about
Agenda
✓  Company Overview
•  Big Data @ LinkedIn
•  The Segmentation & Targeting Problem
•  Solution : LinkedIn Segmentation & Targeting Platform
•  Q & A
 
Big Data @ LinkedIn
LinkedIn : Big Data Story
Our Big Data Story depends on Infrastructure!
•  On-line Data Infrastructure
•  Near-line Data Infrastructure
•  Off-line Data Infrastructure
[Diagram: Web Serving, Updates, Oracle or Espresso, Data Streams, and Teradata, arranged across On-line, Near-line, and Off-line tiers]
Big Data Story : On-line Data
On-line Data Infrastructure
•  Supports typical OLTP requirements
   •  Highly concurrent R/W access
   •  Transactional guarantees
   •  Back-up & Recovery
•  Supports a central LinkedIn Data Principle!
   •  “All data everywhere”
   •  All OLTP databases need to provide a timeline-consistent change stream
•  For this, we developed and open-sourced Databus!
[Diagram: On-line tier, with Web Serving sending Updates to Oracle or Espresso]
Big Data Story : On-line Data
[Diagram: Updates are written to Oracle or Espresso; Data Change Events flow to Standardization, Search Index, Graph Index, and Read Replicas]
A user updates the company, title, & school on his profile. He also accepts a connection.
The write is made to an Oracle or Espresso Master and Databus replicates it:
•  the profile change is applied to the Standardization service
   ➢  E.g. the many forms of "IBM" are canonicalized for search-friendliness
•  ... and to the Search Index
   ➢  Recruiters can find the user immediately by the new keywords
•  the connection change is applied to the Graph Index service
   ➢  The user can now start receiving feed updates from his new connections
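The replication flow above can be sketched as a small in-memory change stream. This is an illustration only, not Databus's actual API: `ChangeEvent`, `ChangeStream`, `Consumer`, the `scn` sequence-number field, and the consumer names are all hypothetical. The sketch assumes a single append-only log written by the primary store, with each consumer applying the events it cares about in commit order and remembering how far it has read.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    scn: int                  # hypothetical commit sequence number; defines timeline order
    source: str               # logical source of the change, e.g. "profile"
    member_id: int
    payload: Dict[str, str]


@dataclass
class ChangeStream:
    """Append-only log; consumers read events strictly in scn order."""
    events: List[ChangeEvent] = field(default_factory=list)

    def append(self, source: str, member_id: int, payload: Dict[str, str]) -> ChangeEvent:
        event = ChangeEvent(len(self.events) + 1, source, member_id, payload)
        self.events.append(event)
        return event

    def read_since(self, last_scn: int) -> List[ChangeEvent]:
        # scn values are 1-based and dense, so slicing preserves commit order.
        return self.events[last_scn:]


@dataclass
class Consumer:
    name: str
    sources: frozenset
    apply: Callable[[ChangeEvent], None]
    checkpoint: int = 0       # scn of the last event this consumer has seen

    def catch_up(self, stream: ChangeStream) -> None:
        for event in stream.read_since(self.checkpoint):
            if event.source in self.sources:
                self.apply(event)
            self.checkpoint = event.scn


stream = ChangeStream()
stream.append("profile", 42, {"company": "I.B.M.", "title": "Engineer"})
stream.append("connection", 42, {"connected_to": "7"})

# Stand-ins for the real services: each just records which events it applied.
search_index: List[int] = []
graph_index: List[int] = []
consumers = [
    Consumer("search", frozenset({"profile"}), lambda e: search_index.append(e.scn)),
    Consumer("graph", frozenset({"connection"}), lambda e: graph_index.append(e.scn)),
]
for consumer in consumers:
    consumer.catch_up(stream)
```

Because every consumer reads the same ordered log from its own checkpoint, a consumer that falls behind can catch up later without the writer knowing about it.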
Big Data Story : On-line Data
Databus streams also update Hadoop!
[Diagram: Updates are written to Oracle or Espresso; Data Change Events flow to Standardization, Search Index, Graph Index, and Read Replica]
Big Data Story : Near-line & Off-line Data
2 Main Sources of Data @ LinkedIn
•  User-provided data
   •  e.g. Member Profile data (e.g. employment, education history, endorsements)
•  Tracking data via web site instrumentation
   •  e.g. pages viewed, email opened/sent, social gestures: posts/likes/shares
[Diagram labels: Web Servers, Updates, Oracle or Espresso, Databus, Teradata]
The Segmentation & Targeting Problem
Segmentation & Targeting
Segmentation & Targeting Attribute types
Bhaskar Ghosh
Segmentation & Targeting
Step 1 : Take some information about users

Member ID   Join Date    Country   Responded to Promotion X1
1           01/01/2013   FR        F
2           01/02/2013   BE        F
3           01/03/2013   FR        F
4           02/01/2013   FR        T

Step 2 : Provide some targeting criteria for a new promotion
Pick members where
•  Join Date between('01/01/2013', '01/31/2013') and
•  Country="FR" and
•  Responded to Promotion X1="F"

→ Members 1 & 3

Step 3 : Target them for a different email campaign (promotion_X2)
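The three steps can be reproduced in a few lines of code. This is a minimal sketch, not the platform's implementation: the field names (`member_id`, `join_date`, `country`, `responded_x1`) and the function `segment_definition` are hypothetical, and the dates are assumed to be in MM/DD/YYYY form, which the `01/31/2013` bound suggests.

```python
from datetime import date

# The four example members from Step 1, keyed by (hypothetical) attribute name.
members = [
    {"member_id": 1, "join_date": date(2013, 1, 1), "country": "FR", "responded_x1": False},
    {"member_id": 2, "join_date": date(2013, 1, 2), "country": "BE", "responded_x1": False},
    {"member_id": 3, "join_date": date(2013, 1, 3), "country": "FR", "responded_x1": False},
    {"member_id": 4, "join_date": date(2013, 2, 1), "country": "FR", "responded_x1": True},
]


def segment_definition(member):
    """Step 2 criteria: joined in January 2013, in France, did not respond to X1."""
    return (
        date(2013, 1, 1) <= member["join_date"] <= date(2013, 1, 31)
        and member["country"] == "FR"
        and not member["responded_x1"]
    )


# Step 3 would hand this list of member IDs to the promotion_X2 email campaign.
segment = [m["member_id"] for m in members if segment_definition(m)]
print(segment)  # [1, 3]
```

Member 2 is excluded by country, and member 4 by both join date and the earlier response, which leaves members 1 and 3 as on the slide.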
  
Segmentation & Targeting
The same example, annotated with the platform's terminology:
•  Attributes: the member information in Step 1
•  Segment Definition: the targeting criteria in Step 2
•  Segment: the resulting members (Members 1 & 3)
Segmentation & Targeting
Problem Definition
•  The business wants to launch new campaigns often
•  The business wants to specify targeting criteria (segment definitions) using an arbitrary set of attributes
•  The attributes often need to be computed to fulfill the targeting criteria
•  This data resides on Hadoop or Teradata (TD)
•  The business is most comfortable with SQL-like languages
Segmentation & Targeting Solution
Segmentation & Targeting
•  Attribute Computation Engine
•  Attribute Serving Engine
Segmentation & Targeting
Attribute Computation Engine
•  Self-service
•  Support various data sources
•  Attribute consolidation
•  Attribute availability
Segmentation & Targeting
Attribute computation
[Diagram labels: ~225M, ~240, and data sources at PB, TB, and TB scale]
LinkedIn Segmentation & Targeting Platform
[Diagram: Attribute Portal Web Application; Attribute & Definition Metadata]
LinkedIn Segmentation & Targeting Platform
[Diagram: Attribute & Definition Metadata connected over REST to three executors: TD Executor, Hive Executor, and Pig Executor]
LinkedIn Segmentation & Targeting Platform
Attribute consolidation & availability
[Diagram: M/R Stitcher combines /path/dataset1, /path/dataset2, /path/dataset3, and /path/dataset4 into /path/lnkd_big_table; Data Loader]
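The consolidation step can be pictured as a join of several per-attribute datasets on the member ID. The slides do not describe how the M/R Stitcher is implemented, so the following is only an in-memory stand-in for that kind of reduce-side join; the function `stitch`, the `member_id` key, and the sample attribute names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, object]


def stitch(datasets: Iterable[List[Row]], key: str = "member_id") -> List[Row]:
    """Merge per-attribute datasets into one wide row per member."""
    merged: Dict[object, Row] = defaultdict(dict)
    for dataset in datasets:
        for row in dataset:
            # Rows sharing the same key are folded into a single wide row.
            merged[row[key]].update(row)
    return [merged[k] for k in sorted(merged)]


dataset1 = [{"member_id": 1, "country": "FR"}, {"member_id": 2, "country": "BE"}]
dataset2 = [{"member_id": 1, "responded_x1": False}]

big_table = stitch([dataset1, dataset2])
```

A member that is missing from one dataset simply lacks that attribute in its wide row, so new attribute datasets can be added without touching the existing ones.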
LinkedIn Segmentation & Targeting Platform
LinkedIn big table, the most sought-after data
•  Segmentation
•  Propensity Model
•  Ad hoc analysis
Segmentation & Targeting
Attribute Serving Engine
•  Self-service
•  Attribute predicate expression
•  Build segments
•  Build lists
Segmentation & Targeting
Serving Engine
[Diagram: count, filter, sum, and complex expressions evaluated over the LinkedIn big table; figure labels ~225M and ~240]
LinkedIn Segmentation & Targeting Platform
[Diagram: M/R Indexer builds Inverted Indexes from the LinkedIn big table, with Attribute & Definition Metadata]
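An inverted index over the big table maps each attribute value to the set of members that have it, which is what makes counting and filtering fast. The sketch below shows the idea in plain Python rather than as a MapReduce job; `build_inverted_index`, the `member_id` key, and the sample rows are hypothetical, and the real M/R Indexer is not described in the slides.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Posting = Dict[Tuple[str, Hashable], Set[int]]


def build_inverted_index(big_table: List[dict], key: str = "member_id") -> Posting:
    """Map every (attribute, value) pair to the set of member IDs that carry it."""
    index: Posting = defaultdict(set)
    for row in big_table:
        member_id = row[key]
        for attribute, value in row.items():
            if attribute != key:
                index[(attribute, value)].add(member_id)
    return dict(index)


rows = [
    {"member_id": 1, "country": "FR", "responded_x1": "F"},
    {"member_id": 2, "country": "BE", "responded_x1": "F"},
    {"member_id": 3, "country": "FR", "responded_x1": "F"},
    {"member_id": 4, "country": "FR", "responded_x1": "T"},
]
index = build_inverted_index(rows)

# Filtering becomes a set intersection instead of a scan over every row.
french_non_responders = index[("country", "FR")] & index[("responded_x1", "F")]
```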
LinkedIn Segmentation & Targeting Platform
•  Who are North American recruiters that don’t work for a competitor?
•  Who are the LinkedIn Talent Solution prospects in Europe?
•  Who are the job seekers?
LinkedIn Segmentation & Targeting Platform
[Diagram: JSON Predicate Expression → JSON Lucene Query Parser → Inverted Indexes → Segment & List]
LinkedIn Segmentation & Targeting Platform
Complex tree-like attribute predicate expressions
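A tree-like predicate expression can be evaluated recursively against an inverted index using set operations. The slides do not show the platform's JSON schema or its Lucene query parser, so everything here is an assumption for illustration: the `op`/`args`/`attr`/`value` keys, the `evaluate` function, and the sample index are hypothetical, and plain Python sets stand in for Lucene.

```python
import json
from typing import Dict, Set, Tuple

Index = Dict[Tuple[str, str], Set[int]]


def evaluate(node: dict, index: Index, universe: Set[int]) -> Set[int]:
    """Return the member IDs matching a predicate tree (assumes non-empty 'args')."""
    op = node["op"]
    if op == "eq":
        # Leaf: look up the posting set for one attribute value.
        return set(index.get((node["attr"], node["value"]), set()))
    child_sets = [evaluate(child, index, universe) for child in node["args"]]
    if op == "and":
        return set.intersection(*child_sets)
    if op == "or":
        return set.union(*child_sets)
    if op == "not":
        return universe - child_sets[0]
    raise ValueError(f"unknown operator: {op}")


index: Index = {
    ("region", "NA"): {1, 2, 5},
    ("role", "recruiter"): {1, 2, 3},
    ("employer", "competitor"): {2},
}
universe = {1, 2, 3, 4, 5}

# "North American recruiters that don't work for a competitor"
expression = json.loads("""
{"op": "and", "args": [
    {"op": "eq", "attr": "region", "value": "NA"},
    {"op": "eq", "attr": "role", "value": "recruiter"},
    {"op": "not", "args": [
        {"op": "eq", "attr": "employer", "value": "competitor"}]}
]}
""")

segment = evaluate(expression, index, universe)
print(sorted(segment))  # [1]
```

Keeping the expression as data means a web form can build it without the user writing a query, and the same tree can be translated to another query engine later.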
LinkedIn Segmentation & Targeting Platform
A marketing campaign is represented by a list
Conclusion
Move at business speed and scale at LinkedIn scale
•  Segmentation & Targeting Platform
   –  Self-service
   –  Multiple data sources & massive data volume
   –  Support complex expression evaluation in seconds
   –  Attribute availability at business speed
Engineering Team
•  Jessica Ho
•  Swetha Karthik
•  Raj Rangaswamy
•  Tony Tong
•  Ajinkya Harkare
•  Hien Luu
•  Sid Anand
Questions?
More info: data.linkedin.com

