Congressional PageRank:
Graph Analytics Of US Congress
William Lyon
Graph Day - Austin, TX
January 2016
About me
Software Developer @Neo4j
william.lyon@neo4j.com
@lyonwj
lyonwj.com
William Lyon
Agenda
• Brief intro to Neo4j graph database
• Modeling US Congress as a graph
• Exploring the 114th Congress
• Finding influential legislators
Neo4j – Key Features
• Native Graph Storage: Ensures data consistency and performance
• Native Graph Processing: Millions of hops per second, in real time
• “Whiteboard Friendly” Data Modeling: Model data as it naturally occurs
• High Data Integrity: Fully ACID transactions
• Powerful, Expressive Query Language: Requires 10x to 100x less code than SQL
• Scalability and High Availability: Vertical and horizontal scaling optimized for graphs
• Built-in ETL: Seamless import from other databases
• Integration: Drivers and APIs for popular languages
Property Graph Model
The Whiteboard Model Is the Physical Model
Relational Versus Graph Models
[Figure: a relational model (Person, Person-Friend, and Friend tables) beside a graph model in which ANDREAS, TOBIAS, MICA, and DELIA are linked by KNOWS relationships]
Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Figure: two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”), connected by LOVES, LIVES WITH, DRIVES, and OWNS relationships; one relationship carries the property since: Jan 10, 2011]
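The components listed above can be sketched in a few lines of Python. This is only an illustration of the property graph idea, not how Neo4j stores data; the class names, the `outgoing` helper, and the choice of which nodes each relationship connects are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # name-value pairs


@dataclass
class Relationship:
    start: Node                                      # relationships are directed
    type: str                                        # e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

# Endpoints and directions below are assumptions made for illustration.
relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
]


def outgoing(node, rel_type):
    """Return the nodes reached from `node` over relationships of `rel_type`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])  # ['Ann']
```

Both nodes and relationships carry their own property maps, which is what distinguishes a property graph from a plain edge list.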
Cypher Query Language
Cypher: Powerful and Expressive Query Language
CREATE (:Person { name:"Dan"} ) -[:LOVES]-> (:Person { name:"Ann"} )
[Figure: the statement annotated as NODE (LABEL, PROPERTY), the LOVES relationship, and NODE (LABEL, PROPERTY), drawn as Dan -LOVES-> Ann]
Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, up to 3 levels down
Cypher Query
MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total
[Figure: the equivalent SQL query]
http://www.opencypher.org/
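To make the variable-length patterns in the query above concrete, here is a small Python sketch of the same traversal. The `MANAGES` data, the names in it, and both helper functions are hypothetical; the sketch also assumes the hierarchy is a tree, so that counting reachable people matches the path counting that Cypher performs.

```python
from collections import deque

# Hypothetical MANAGES edges: manager -> direct reports (assumed to be a tree).
MANAGES = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cid", "Dee"],
    "Cid": ["Eve"],
}


def within(start, min_hops, max_hops):
    """People reachable from `start` in min_hops..max_hops MANAGES hops."""
    found, queue = [], deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth >= min_hops:
            found.append(person)
        if depth < max_hops:
            queue.extend((r, depth + 1) for r in MANAGES.get(person, []))
    return found


def report_totals(boss):
    """Mirror the query: subs at 0..3 hops, reports at 1..3 hops below each sub."""
    totals = {}
    for sub in within(boss, 0, 3):
        reports = within(sub, 1, 3)
        if reports:  # a plain MATCH drops subordinates that manage nobody
            totals[sub] = len(reports)
    return totals


print(report_totals("John Doe"))  # {'John Doe': 5, 'Ann': 3, 'Cid': 1}
```

Because the first pattern starts at zero hops, the boss appears in the result as well, just as in the Cypher version.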
Getting Data into Neo4j
Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader
neo4j-import
• For initial database population
• For loads with 10B+ records
• Up to 1M records per second
4.58 million things
and their relationships…
Loads in 100 seconds!
Neo4j
Graph Database
• Property graph data model
• Nodes and relationships
• Native graph processing
• Cypher query language
Graphing US Congress
https://github.com/legis-graph/legis-graph
LOAD CSV WITH HEADERS
FROM "file:///legislators.csv" AS line
MERGE (l:Legislator {thomasID: line.thomasID})
SET l = line
MERGE (s:State {code: line.state})<-[:REPRESENTS]-(l)
…
US Congress
Which legislators represent Texas?
MATCH (s:State {code: "TX"})<-[:REPRESENTS]-(l:Legislator)
RETURN l,s;
…include congressional body and party
MATCH (s:State {code: "TX"})<-[:REPRESENTS]-(l:Legislator)
MATCH (p:Party)<-[:IS_MEMBER_OF]-(l)-[:ELECTED_TO]->(b:Body)
RETURN b,l,s,p;
Congressional PageRank: Graph Analytics of US Congress With Neo4j
How to find influential legislators?
Bill Sponsorship
Bill Cosponsorship
Degree centrality
Bill Cosponsorship
• Cosponsors are “influenced by” bill sponsors
• Add INFLUENCED_BY relationships
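The two bullets above, together with degree centrality, can be illustrated with a short Python sketch. The bill records and legislator names are placeholders, and both helper functions are hypothetical; in the talk the relationships are added inside Neo4j rather than in application code.

```python
from collections import Counter

# Hypothetical bills: (bill id, sponsor, cosponsors). Names are placeholders.
BILLS = [
    ("hr1", "A", ["B", "C"]),
    ("hr2", "A", ["C"]),
    ("s1", "B", ["C"]),
]


def influenced_by_edges(bills):
    """One (cosponsor, sponsor) edge per distinct cosponsor/sponsor pair."""
    return sorted({(co, sponsor)
                   for _, sponsor, cosponsors in bills
                   for co in cosponsors if co != sponsor})


def in_degree(edges):
    """Degree centrality here: incoming INFLUENCED_BY edges per legislator."""
    return Counter(target for _, target in edges)


edges = influenced_by_edges(BILLS)
print(edges)                           # [('B', 'A'), ('C', 'A'), ('C', 'B')]
print(in_degree(edges).most_common())  # [('A', 2), ('B', 1)]
```

In-degree only counts direct cosponsors, which is why the later slides move on to measures that take the wider graph into account.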
Betweenness centrality
The number of times a node acts as a bridge along the shortest path between two other nodes.
https://en.wikipedia.org/wiki/Betweenness_centrality
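As a rough illustration of that definition, the sketch below computes betweenness on a tiny graph by counting shortest paths with breadth-first search. It assumes an undirected, unweighted graph given as an adjacency dict; the graph and function names are made up for the example, and the pairwise loop is fine for a handful of nodes but far too slow for a real network.

```python
from collections import deque
from itertools import permutations


def bfs_counts(graph, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:              # first visit fixes the distance
                dist[nxt] = dist[node] + 1
                sigma[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:  # node precedes nxt on a shortest path
                sigma[nxt] += sigma[node]
    return dist, sigma


def betweenness(graph):
    """Sum, over node pairs, of the share of shortest paths passing through each node."""
    info = {n: bfs_counts(graph, n) for n in graph}
    score = {n: 0.0 for n in graph}
    for s, t in permutations(graph, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue                         # t is unreachable from s
        for v in graph:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    # Ordered pairs count every undirected pair twice, so halve the totals.
    return {n: total / 2 for n, total in score.items()}


# Hypothetical path graph a - b - c - d: only b and c sit between other nodes.
PATH = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(betweenness(PATH))  # {'a': 0.0, 'b': 2.0, 'c': 2.0, 'd': 0.0}
```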
image credit: https://en.wikipedia.org/wiki/PageRank
PageRank
Cypher approximation
UNWIND range(1,10) AS round
MATCH (l:Legislator)
WHERE rand() < 0.1
MATCH (l:Legislator)-[:INFLUENCED_BY]->(o:Legislator)
SET o.rank = coalesce(o.rank,0) + 1;
http://neo4j.com/blog/using-neo4j-hr-analytics/
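One way to read the Cypher approximation above is as repeated random sampling: in each round roughly 10% of legislators are picked, and every legislator they are influenced by gains one point. The Python sketch below mimics that reading; the function name, its parameters, and the sample edges are assumptions, and it is a simulation of the idea rather than a reimplementation of the query planner's behavior.

```python
import random


def approximate_rank(edges, nodes, rounds=10, sample_rate=0.1, rng=None):
    """Sampling-based score: +1 for the target of every edge leaving a sampled node."""
    rng = rng or random.Random()
    rank = {n: 0 for n in nodes}
    for _ in range(rounds):
        # Counterpart of: MATCH (l:Legislator) WHERE rand() < 0.1
        sampled = {n for n in nodes if rng.random() < sample_rate}
        # Counterpart of: MATCH (l)-[:INFLUENCED_BY]->(o) SET o.rank = o.rank + 1
        for source, target in edges:
            if source in sampled:
                rank[target] += 1
    return rank


# Hypothetical INFLUENCED_BY edges as (influenced, influencer) pairs.
EDGES = [("B", "A"), ("C", "A"), ("C", "B")]
print(approximate_rank(EDGES, ["A", "B", "C"], rng=random.Random(42)))
```

With `sample_rate=1.0` every node is sampled in every round, so the score reduces to the number of rounds times the in-degree, which shows that this approximation is essentially a noisy degree count rather than a true PageRank.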
Neo4j server extensions with Java
PageRank
Graph processing server extension
https://github.com/maxdemarzi/graph_processing
curl http://localhost:7474/service/v1/pagerank/Person/KNOWS
PageRank
neo4j-noderank
https://github.com/graphaware/neo4j-noderank
Two issues
• Local vs global
• Iterative algorithms and graph complexity
Local vs global
Local: OLTP / realtime
Global: Offline / batch
For iterative algorithms like PageRank, it’s all about complexity of the graph
Lots of paths. Lots of iterations
Graph complexity
PageRank
Graph global!
Iterative!
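A textbook power-iteration version of PageRank shows why the algorithm is both global and iterative: every round reads the rank of every node and pushes it along every edge. The sketch below is a generic formulation with a damping factor of 0.85, not the implementation used by any of the tools in this talk; the function name, parameters, and sample edges are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Power-iteration PageRank over (source, target) edges; ranks sum to 1."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        # Every node passes an equal share of its rank along each outgoing edge.
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical INFLUENCED_BY edges as (influenced, influencer) pairs.
EDGES = [("B", "A"), ("C", "A"), ("C", "B")]
ranks = pagerank(EDGES)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['A', 'B', 'C']
```

Each iteration touches the whole edge set, so the cost grows with the number of relationships times the number of iterations, which is the motivation for moving the computation to a batch engine.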
• Efficient in-memory data processing and machine learning platform
• Graph analytics with GraphX
• In-memory message passing algorithm
Apache Spark is a fast and general engine for large-scale data processing.
http://spark.apache.org/
PageRank
Spark with Neo4j - Scala
https://github.com/AnormCypher/AnormCypher
import org.anormcypher._
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib._
val total = 100000000
val batch = total/1000000  // 100 pages of up to 1,000,000 relationships each
// Each partition opens its own connection and pages through the INFLUENCED_BY pairs
val links = sc.range(0,batch).repartition(batch).mapPartitionsWithIndex( (i,p) => {
   val dbConn = Neo4jREST("localhost", 9474, "/db/data/", "neo4j", "test")
   val q = "MATCH (l1:Legislator)-[:INFLUENCED_BY]->(l2:Legislator) RETURN id(l1) as from, id(l2) as to skip {skip} limit 1000000"
   p.flatMap( skip => {
      Cypher(q).on("skip"->skip*1000000).apply()(dbConn).map(row =>
            (row[Int]("from").toLong,row[Int]("to").toLong)
        )
   })
})
links.cache
links.count  // action: forces the load and fills the cache
// Build a GraphX graph from the (from, to) pairs and run 5 PageRank iterations
val edges = links.map( l => Edge(l._1,l._2, None))
val g = Graph.fromEdges(edges,"none")
val v = PageRank.run(g, 5).vertices
Extract subgraph. Run PageRank using Spark GraphX.
val res = v.repartition(total/100000).mapPartitions( part => {
  val localConn = Neo4jREST("localhost", 9474, "/db/data/", "neo4j", "test")
  val updateStmt = Cypher("UNWIND {updates} as update MATCH (p) where id(p) = update.id SET p.pagerank = update.rank")
  // Materialize the partition first: an Iterator can only be traversed once,
  // so calling part.size after the write would always report 0
  val updates = part.map( v => Map("id"->v._1.toLong, "rank" -> v._2.toDouble)).toList
  val count = updateStmt.on("updates"->updates).execute()(localConn)
  Iterator(updates.size)
})
Write back to graph
PageRank
Mazerunner
http://www.kennybastani.com/2014/11/using-apache-spark-and-neo4j-for-big.html
• Enables two-way ETL between Spark and Neo4j
• Run GraphX jobs from data in Neo4j
• Write results back to Neo4j
• Support for:
  • PageRank
  • Closeness Centrality
  • Betweenness Centrality
  • Triangle Counting
  • Connected Components
  • Strongly Connected Components
https://github.com/neo4j-contrib/neo4j-mazerunner
curl http://localhost:7474/service/mazerunner/analysis/pagerank/INFLUENCED_BY
Who are the influential legislators?
Influential legislators by topic
graphdatabases.com
http://graphgist.neo4j.com/
http://portal.graphgist.org/challenge/index.html
Links
• http://www.lyonwj.com/2015/09/20/legis-graph-congressional-data-using-neo4j/
• http://www.lyonwj.com/2015/10/11/congressional-pagerank/
• https://github.com/legis-graph/legis-graph
• https://github.com/neo4j-contrib/neo4j-mazerunner
• http://www.kennybastani.com/2014/11/graph-analytics-docker-spark-neo4j.html
• http://www.kennybastani.com/2015/03/spark-neo4j-tutorial-docker.html
