Better Together: How Graph database enables easy data integration with Spark and Kafka in the Cloud
September 30th 2020
| GRAPHAIWORLD.COM | #GRAPHAIWORLD |
Today's Speakers
Emma Liu
Product Manager
● BS in Engineering from Harvey Mudd College, MS in Engineering Systems from MIT
● Prior work experience at Oracle and MarkLogic
● Focus: Cloud, Containers, Enterprise Infra, Monitoring, Management, Connectors
Rayees Pasha
Product Manager
● MS in Computer Science from University of Memphis
● Prior Lead PM and ENG positions at Workday, Hitachi and HP
● Expertise in Database Management and Big Data Technologies
Today’s Outline
1. TigerGraph Architecture and Data Ingestion Overview
2. TigerGraph and Spark Data Pipeline
3. TigerGraph and Kafka Data Pipeline
SYSTEM ARCHITECTURE OVERVIEW
The TigerGraph Difference
Feature: Real-Time Deep-Link Querying (5 to 10+ hops deep)
Design Difference:
● Native Graph design
● C++ engine, for high performance
● Storage Architecture
Benefit:
● Uncovers hard-to-find patterns
● Operational, real-time
● HTAP: Transactions+Analytics

Feature: Handling Massive Scale
Design Difference:
● Distributed DB architecture
● Massively parallel processing
● Compressed storage reduces footprint and messaging
Benefit:
● Integrates all your data
● Automatic partitioning
● Elastic scaling of resource usage

Feature: In-Database Analytics
Design Difference:
● GSQL: High-level yet Turing-complete language
● User-extensible graph algorithm library, runs in-DB
● ACID (OLTP) and Accumulators (OLAP)
Benefit:
● Avoids transferring data
● Richer graph context
● In-DB machine learning
TigerGraph Architecture
Data Ingestion
Step 1
Ingest user source data in any of the following ways:
● Bulk load of data files or a Kafka stream in CSV or JSON format
● HTTP POSTs via REST services (JSON)
● GSQL INSERT commands
Step 2
The dispatcher takes in the data ingestion requests in the form of updates to the database:
1. Query the IDS to get internal IDs
2. Convert the data to internal format
3. Send the data to one or more corresponding GPEs
Step 3
Each GPE consumes its partial data updates, processes them, and writes them to disk.
Loading jobs and POST use UPSERT semantics:
● If the vertex/edge doesn't yet exist, create it.
● If the vertex/edge already exists, update it.
● Idempotent
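The three ingestion steps can be sketched as a toy in-memory simulation. This is only an illustration under stated assumptions, not TigerGraph's implementation: `ToyIdService`, `ToyGpe`, `dispatch`, and the modulo routing rule are all hypothetical names and choices made for this example.

```python
from typing import Dict, List, Tuple


class ToyIdService:
    """Hypothetical stand-in for the IDS: maps external IDs to internal IDs."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def internal_id(self, external_id: str) -> int:
        # Assign the next integer the first time an external ID is seen.
        return self._ids.setdefault(external_id, len(self._ids))


class ToyGpe:
    """Hypothetical stand-in for one GPE holding a partition of the vertices."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Dict[str, object]] = {}

    def upsert(self, vid: int, attrs: Dict[str, object]) -> None:
        # UPSERT: create the vertex if it is missing, otherwise update it.
        self.vertices.setdefault(vid, {}).update(attrs)


def dispatch(
    updates: List[Tuple[str, Dict[str, object]]],
    ids: ToyIdService,
    gpes: List[ToyGpe],
) -> None:
    """Translate IDs, route each update to one GPE, and let the GPE apply it."""
    for external_id, attrs in updates:
        vid = ids.internal_id(external_id)   # Step 2.1: get the internal ID
        gpe = gpes[vid % len(gpes)]          # Step 2.3: toy routing rule (assumption)
        gpe.upsert(vid, attrs)               # Step 3: the GPE applies the update


ids = ToyIdService()
gpes = [ToyGpe(), ToyGpe()]
batch = [("alice", {"age": 30}), ("bob", {"age": 25}), ("alice", {"age": 31})]
dispatch(batch, ids, gpes)
dispatch(batch, ids, gpes)  # replaying the same batch leaves the same final state
```

Replaying the batch shows why idempotent upserts matter for loading: a retried request does not create duplicate vertices.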
Data Ingestion
[Diagram: incremental data (CSV/JSON; insert/update/delete of vertices and edges) arrives through Nginx and RESTPP, and GSE (IDS) performs ID translation. A Kafka cluster spans Server 1, Server 2, and Server 3; the GPE on each server listens to its corresponding topic for new messages, keeps an in-memory copy of the data, and synchronizes data to disk. Arrows are labeled Incoming, Outgoing, and Acknowledge Response.]
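As a rough sketch of the JSON that an insert or update request might carry, the snippet below builds an upsert payload for one vertex. The nested `{"value": ...}` attribute layout is an assumption based on TigerGraph's REST++ upsert format and should be checked against the documentation for your version; `build_upsert_payload`, the `Person` vertex type, and the `age` attribute are hypothetical.

```python
import json
from typing import Any, Dict


def attr_values(attrs: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Wrap each attribute as {"value": ...} (assumed REST++ upsert layout)."""
    return {name: {"value": value} for name, value in attrs.items()}


def build_upsert_payload(
    vertex_type: str, vertex_id: str, attrs: Dict[str, Any]
) -> Dict[str, Any]:
    """Build a JSON-serializable upsert payload for a single vertex."""
    return {"vertices": {vertex_type: {vertex_id: attr_values(attrs)}}}


payload = build_upsert_payload("Person", "alice", {"age": 31})
body = json.dumps(payload)  # request body to send as an HTTP POST
```

Because the request uses UPSERT semantics, sending the same body twice should leave the vertex in the same state.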
Spark and TigerGraph
Spark + TigerGraph Data Pipeline
Typical Spark + TigerGraph Integration
● Data Preparation and Integration (TigerGraph/Spark)
● Unsupervised Learning (TigerGraph)
● Feature Extraction for Supervised Learning (TigerGraph/Spark)
● Model Training (Spark)
● Validate and Apply Model (TigerGraph)
● Visualize and Explore Interconnected Data (TigerGraph)
Spark and TigerGraph Data Pipeline
[Diagram labels: Static Data Sources, Streaming Data Sources, TigerGraph JDBC Driver]
JDBC Driver
● Type 4 driver
● Supports bidirectional (read and write) data flow with TigerGraph
● Read: converts a ResultSet to a DataFrame
● Write: loads DataFrames and files to vertices/edges in TigerGraph
● Supports REST endpoints of built-in, compiled, and interpreted GSQL queries from TigerGraph
● Open source: https://github.com/tigergraph/ecosys/tree/master/tools/etl/tg-jdbc-driver
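A minimal sketch of how a Spark read through the JDBC driver might be configured. The option names (`driver`, `url`, `graph`, `dbtable`, `limit`), the driver class name, and port 14240 follow my recollection of the driver's README and are assumptions to verify against the repository; `tigergraph_read_options` is a hypothetical helper.

```python
from typing import Dict


def tigergraph_read_options(
    host: str,
    graph: str,
    vertex_type: str,
    username: str,
    password: str,
    limit: int = 10,
) -> Dict[str, str]:
    """Collect Spark JDBC options for reading one vertex type (assumed names)."""
    return {
        "driver": "com.tigergraph.jdbc.Driver",  # assumed driver class
        "url": f"jdbc:tg:http://{host}:14240",   # assumed URL scheme and port
        "username": username,
        "password": password,
        "graph": graph,
        "dbtable": f"vertex {vertex_type}",
        "limit": str(limit),
    }


opts = tigergraph_read_options("127.0.0.1", "social", "Person", "user", "secret")

# With PySpark installed and the driver JAR on the classpath, the read would be:
#   df = spark.read.format("jdbc").options(**opts).load()
```

Keeping the options in a plain dictionary lets the same settings be reused for both reads and writes.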
Supervised ML with TigerGraph - Detecting Phone-Based Fraud by Analyzing Network or Graph Relationship Features at China Mobile
Download the solution brief at https://info.tigergraph.com/MachineLearning
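To make "graph relationship features" concrete, here is a small sketch that derives per-phone features from a list of calls. The three features and the function `call_graph_features` are illustrative assumptions, not the feature set used in the China Mobile solution.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def call_graph_features(
    calls: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Compute simple per-phone features from (caller, callee) pairs."""
    callees: Dict[str, Set[str]] = defaultdict(set)
    callers: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in calls:
        callees[src].add(dst)
        callers[dst].add(src)

    features: Dict[str, Dict[str, float]] = {}
    for phone in set(callees) | set(callers):
        out_degree = len(callees[phone])
        # Callees who also called this phone back.
        reciprocal = len(callees[phone] & callers[phone])
        features[phone] = {
            "out_degree": float(out_degree),
            "in_degree": float(len(callers[phone])),
            "callback_rate": reciprocal / out_degree if out_degree else 0.0,
        }
    return features


calls = [("a", "b"), ("b", "a"), ("a", "c"), ("s", "a"), ("s", "b"), ("s", "c")]
features = call_graph_features(calls)
```

In this toy data, phone `s` calls many numbers and is never called back, the kind of pattern a supervised model could learn from once such features are exported for training.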
DEMO
Kafka and TigerGraph
Kafka and TigerGraph Data Pipeline
[Diagram labels: Static Data Sources, Streaming Data Sources, Kafka Loader]
Kafka Loader - Speed to Value from Real-time Streaming Data
• Reduce Data Availability Gap and Accelerate Time to Value
• Native Integration with Real-time Streaming Data and Batch Data
• Enables Real-time Graph Feature Updates with Streaming Data in Machine Learning Use Cases
• Decrease Learning Curve With Familiar Syntax
• GSQL Support with Consistent Data Loading Syntax
• Maintain Separation of Control for Data Loading
• Designed with Built-in MultiGraph Support
Kafka Loader: Three Steps
Consistent with GSQL Data Loading Steps
Step 1: Define the Data Source
Step 2: Create a Loading Job
Step 3: Run the Loading Job
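The three steps might map to GSQL statements roughly as rendered below. The GSQL syntax shown follows my recollection of the TigerGraph 3.0 Kafka Loader documentation and is an assumption to verify; `kafka_loading_statements`, the file alias `f1`, the two-column `VALUES ($0, $1)` mapping, and all names and paths are hypothetical.

```python
from typing import List


def kafka_loading_statements(
    graph: str,
    source: str,
    job: str,
    cluster_cfg: str,
    topic_cfg: str,
    vertex_type: str,
) -> List[str]:
    """Render one GSQL statement per step (assumed syntax, for illustration)."""
    # Step 1: define the data source from a Kafka cluster config file.
    define_source = (
        f'CREATE DATA_SOURCE KAFKA {source} = "{cluster_cfg}" FOR GRAPH {graph}'
    )
    # Step 2: create a loading job that reads from a topic config file.
    create_job = (
        f"CREATE LOADING JOB {job} FOR GRAPH {graph} {{\n"
        f'  DEFINE FILENAME f1 = "${source}:{topic_cfg}";\n'
        f'  LOAD f1 TO VERTEX {vertex_type} VALUES ($0, $1) USING SEPARATOR=",";\n'
        "}"
    )
    # Step 3: run the loading job.
    run_job = f"RUN LOADING JOB {job}"
    return [define_source, create_job, run_job]


statements = kafka_loading_statements(
    graph="social",
    source="k1",
    job="load_person",
    cluster_cfg="/home/tigergraph/kafka.conf",
    topic_cfg="/home/tigergraph/topic.conf",
    vertex_type="Person",
)
```

The rendered job has the same shape as a file-based loading job, with the Kafka data source standing in for the file path.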
Kafka Loader High Level Architecture
● Connect to External Kafka Cluster
● User Commands Through GSQL Server
● Configuration Settings:
○ Config 1: Kafka Cluster Configuration
○ Config 2: Topic/Partition/Offset Info
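A sketch of what the two configuration files might contain, written as Python dictionaries serialized to JSON. The key names (`broker`, `topic`, `partition_list`, `start_offset`, `default_start_offset`) are assumptions based on my recollection of the Kafka Loader documentation; the helper functions, broker address, topic name, and offset values are hypothetical.

```python
import json
from typing import Any, Dict


def kafka_cluster_config(broker: str) -> Dict[str, Any]:
    """Config 1: where the external Kafka cluster lives (assumed key name)."""
    return {"broker": broker}


def topic_config(
    topic: str, partition_offsets: Dict[int, int], default_start_offset: int
) -> Dict[str, Any]:
    """Config 2: topic, per-partition start offsets, and a default offset."""
    return {
        "topic": topic,
        "partition_list": [
            {"partition": partition, "start_offset": offset}
            for partition, offset in sorted(partition_offsets.items())
        ],
        "default_start_offset": default_start_offset,
    }


cluster_json = json.dumps(kafka_cluster_config("localhost:9092"))
topic_json = json.dumps(topic_config("person_updates", {1: 20, 0: 10}, 0), indent=2)
```

Splitting cluster settings from topic settings lets several loading jobs share one data source while each reads its own topic and offsets.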
DEMO
TigerGraph Architecture + Spark + Kafka
Get Started for Free
● Try TigerGraph Cloud (tgcloud.io)
● Download TigerGraph’s Developer Edition
● Take a Test Drive - Online Demo
● Get TigerGraph Certified
● Join the Community
@TigerGraphDB /tigergraph /TigerGraphDB /company/TigerGraph