Learning to Classify Users in Online Interaction Networks
Georgios Rizos, Symeon Papadopoulos, and Yiannis Kompatsiaris
Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)
ICCSS 2015, June 10, 2015, Helsinki, Finland
User Classification
#2
Twitter Handle | Labels
@nytimes | usa, press, new york
@HuffPostBiz | finance
@BBCBreaking | press, journalist, tv
@StKonrath | journalist
Examples from SNOW 2014 dataset
User Classification in (and outside) OSNs
#3
[Figure: OSN online activities; log files, APIs; behaviour observation; profiling/classification]
Network-based User Classification
• People with similar interests tend to connect (homophily).
• Knowing about a user's connections could reveal information about that user.
• Knowing about the whole network structure could reveal even more…
#4
Related Work: User Classification
Graph-based semi-supervised learning:
• Label propagation (Zhu and Ghahramani, 2002)
• Local and global consistency (Zhou et al., 2004)
• Empirical evaluation of many graph kernels (Fouss et al., 2012)
Other approaches to user classification:
• Hybrid feature engineering for inferring user behaviors (Pennacchiotti and Popescu, 2011; Wagner et al., 2013)
• Crowdsourcing Twitter list keywords for popular users (Ghosh et al., 2012)
• Content-based, graph-regularized NMF for spammer detection (Hu et al., 2013)
#5
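Label propagation, the first graph-based semi-supervised baseline listed above, can be sketched in a few lines. This is a minimal illustration assuming a small dense NumPy adjacency matrix and one class per labelled user; the function name `label_propagation` and its parameters are hypothetical. It follows the generic iterate-and-clamp scheme of Zhu and Ghahramani (2002), not any code from this work.

```python
import numpy as np

def label_propagation(A, labels, n_classes, n_iter=100):
    """Iterate-and-clamp label propagation on a dense adjacency matrix.

    A: symmetric (n x n) non-negative adjacency matrix.
    labels: length-n integer array; class index for labelled nodes, -1 otherwise.
    Returns an (n x n_classes) array of propagated class scores.
    """
    n = A.shape[0]
    deg = A.sum(axis=1).astype(float)
    deg[deg == 0] = 1.0                      # isolated nodes: avoid division by zero
    T = A / deg[:, None]                     # row-stochastic transition matrix D^-1 A
    labelled = labels >= 0
    Y0 = np.zeros((n, n_classes))
    Y0[labelled, labels[labelled]] = 1.0     # one-hot seeds for the known labels
    F = Y0.copy()
    for _ in range(n_iter):
        F = T @ F                            # each node averages its neighbours' scores
        F[labelled] = Y0[labelled]           # clamp the labelled nodes to their labels
    return F
```

On a 5-node path whose two end nodes carry different labels, the interior scores converge to the harmonic solution, e.g. `F[1]` approaches `[0.75, 0.25]`.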
Related Work: Graph Feature Extraction
First attempts at using community detection:
• EdgeCluster: edge-centric k-means (Tang and Liu, 2009)
• MROC: binary-tree community hierarchy (Wang et al., 2013)
Low-rank matrix representation methods:
• Laplacian Eigenmaps: the k eigenvectors of the graph Laplacian with the smallest non-trivial eigenvalues (Belkin and Niyogi, 2003; Tang and Liu, 2011)
• Random-Walk Modularity Maximization: does not suffer from the resolution limit of modularity maximization (ModMax) (Devooght et al., 2014)
• DeepWalk: deep representation learning (Perozzi et al., 2014)
#6
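As an illustration of the low-rank family above, Laplacian Eigenmaps features can be sketched for a small graph as follows. This is a minimal dense sketch, assuming an undirected NumPy adjacency matrix with no isolated vertices; the function name is hypothetical, and graphs of the sizes evaluated here would need a sparse eigensolver instead of `numpy.linalg.eigh`.

```python
import numpy as np

def laplacian_eigenmaps(A, k):
    """Return k Laplacian-eigenmap coordinates per vertex (dense, small graphs).

    A: symmetric (n x n) adjacency matrix with no isolated vertices.
    Solves L y = lambda D y via the symmetric normalized Laplacian.
    """
    deg = A.sum(axis=1).astype(float)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # L_sym = I - D^{-1/2} A D^{-1/2}
    L_sym = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    # Drop the trivial eigenvector (eigenvalue 0 on a connected graph) and map
    # eigenvectors x of L_sym back to generalized eigenvectors y = D^{-1/2} x.
    return d_inv_sqrt[:, None] * eigvecs[:, 1:k + 1]
```

On two triangles joined by a single edge, the sign of the first coordinate separates the two triangles, which is the kind of community-aligned signal these features feed to a classifier.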
Overview of Framework
#7
[Figure: framework pipeline. Components: online social interactions (retweets, mentions, etc.); social interaction user graph; ARCTE; unsupervised graph feature representation; partial/sparse annotation; feature weighting; supervised graph feature representation; user label learning; classified users]
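The first step of the pipeline turns raw interactions into a user graph. The sketch below assumes each retweet or mention arrives as a (source, target) pair and that the graph is undirected with edges weighted by interaction counts; these choices and the name `build_interaction_graph` are illustrative assumptions, not details stated on the slide.

```python
from collections import defaultdict

def build_interaction_graph(interactions):
    """Aggregate (source_user, target_user) interaction pairs, e.g. one per
    retweet or mention, into an undirected graph weighted by interaction count.
    Returns a dict mapping each user to a dict {neighbour: weight}."""
    graph = defaultdict(lambda: defaultdict(float))
    for src, dst in interactions:
        if src == dst:
            continue                      # drop self-interactions
        graph[src][dst] += 1.0            # symmetric update: direction is ignored
        graph[dst][src] += 1.0
    return {user: dict(nbrs) for user, nbrs in graph.items()}
```

Ignoring direction is a simplification; a directed or interaction-type-specific weighting would be an equally plausible design.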
Network Features using ARCTE
• Based on user-centric community detection.
• For each user, we extract two types of user-centric communities.
• Base user-centric community: $c_v = N(v) \cup \{v\}$
• Extended user-centric community: consider a vector $p_v$ that contains the similarity values between the seed user $v$ and all other users.
– By truncating $p_v$ appropriately, we keep a community of the users most similar to the seed $v$.
– We keep the fewest possible users such that the community still includes all of the seed user's direct neighbors.
• Denote by $C$ the set of detected communities. We form the feature matrix $X$ as follows:
$x_{vi} = \begin{cases} 1, & \text{if } v \in c_i \\ 0, & \text{otherwise} \end{cases} \quad \forall c_i \in C$
#8
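The two community types and the binary feature matrix can be sketched as below. This is a minimal illustration, assuming an adjacency dict `adj` and a similarity dict standing in for $p_v$; the function names are hypothetical, and including the seed itself in the extended community is an assumption carried over from the base definition.

```python
import numpy as np

def base_community(adj, v):
    """c_v = N(v) ∪ {v} for an adjacency dict adj: user -> iterable of neighbours."""
    return set(adj[v]) | {v}

def extended_community(adj, v, similarity):
    """Shortest prefix of users, ranked by similarity to the seed v, that still
    contains all of v's direct neighbours (and v itself, an assumption here).
    similarity: dict user -> score, a stand-in for the vector p_v."""
    required = set(adj[v]) | {v}
    missing = set(required)
    community = set()
    for u in sorted(similarity, key=similarity.get, reverse=True):
        community.add(u)
        missing.discard(u)
        if not missing:
            break                          # every required user is covered: truncate
    return community | required            # guard: neighbours with no score are kept

def feature_matrix(users, communities):
    """Binary matrix X with X[row of v, i] = 1 if user v belongs to community c_i."""
    row = {u: r for r, u in enumerate(users)}
    X = np.zeros((len(users), len(communities)))
    for i, community in enumerate(communities):
        for u in community:
            X[row[u], i] = 1.0
    return X
```

In practice $X$ would be stored as a sparse matrix; the dense NumPy array only keeps the example short.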
ARCTE: Toy Example
#9
Fast Approximate User-centric PageRank
• Given a seed user $v$, we calculate the user-centric PageRank vector (i.e. the stationary distribution of a random walk with restarts whose restart distribution places probability 1 on $v$).
• The vector is localized and sparse: we neither propagate nor store negligible values.
• Instead of approximating the PageRank vector directly, we approximate cumulative PageRank differences, which gives a better approximation in fewer iterations.
• We alternate between two update rules:
– Cumulative PR difference: $p^{(t+1)} = p^{(t)} + (1-\rho)\, r^{(t-1)} W_u$
(instead of the PR update $p^{(t+1)} = p^{(t)} + r^{(t)} I_u$ of Andersen et al., 2006)
– Residual distribution: $r^{(t+1)} = r^{(t)} - r^{(t)} I_u + (1-\rho)\, r^{(t)} W_u$
where $\rho$ is the restart probability, $W_u$ is the $u$-th row of $W = D^{-1}A$, and $I_u$ is the $u$-th row of the identity matrix $I$.
• Finally, we divide each element of $p$ by the degree of the corresponding user to get approximate, user-centric, regularized commute times.
#10
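For reference, the push scheme that this slide contrasts against can be sketched as below. This is a non-lazy variant of the local push procedure of Andersen et al. (2006), not ARCTE's cumulative-difference update; it assumes an unweighted, undirected graph given as an adjacency dict without isolated vertices or self-loops, and the names `approximate_ppr` and `degree_normalized` as well as the default `rho` and `eps` are hypothetical. The final helper mirrors the division by degree described above.

```python
from collections import deque

def approximate_ppr(adj, seed, rho=0.15, eps=1e-4):
    """Push-style approximate personalized PageRank (non-lazy variant).

    adj: dict user -> list of neighbours; rho: restart probability.
    Returns (p, r): sparse dicts of approximate scores and leftover residuals.
    """
    p, r = {}, {seed: 1.0}
    queue, in_queue = deque([seed]), {seed}
    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        ru, deg_u = r.get(u, 0.0), len(adj[u])
        if ru < eps * deg_u:
            continue                          # residual too small to be worth pushing
        p[u] = p.get(u, 0.0) + rho * ru       # keep the restart share at u
        r[u] = 0.0
        share = (1.0 - rho) * ru / deg_u      # spread the rest along W_u = A_u / d_u
        for w in adj[u]:
            r[w] = r.get(w, 0.0) + share
            if r[w] >= eps * len(adj[w]) and w not in in_queue:
                queue.append(w)
                in_queue.add(w)
    return p, r

def degree_normalized(p, adj):
    """Divide each approximate score by the degree of its user."""
    return {u: score / len(adj[u]) for u, score in p.items()}
```

Each push moves a $\rho$ share of the residual at $u$ into $p$ and spreads the remaining $(1-\rho)$ share along $W_u$, so the total mass of $p$ plus $r$ stays equal to 1, and only users whose residual exceeds `eps` times their degree are ever touched, which keeps both dicts sparse.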
Community Weighting
• We perform a supervised community weighting step to boost the importance of highly predictive communities.
• For each community $c_i$ we calculate a weight:
$w_i = \chi^2_i \times \mathrm{ivf}(i)$
• The first factor is based on supervised chi-squared weighting, which quantifies the correlation between each feature (community) and each label.
– PSNR-style aggregation across labels:
$\chi^2_i = \dfrac{\max_l \chi^2_{i,l} - \min_l \chi^2_{i,l}}{\text{within-label variability}}$
• The second factor is the unsupervised inverse vertex frequency.
– Consider idf with vertices as terms and communities as documents.
• We multiply each column of $X$ by the corresponding weight.
#11
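A sketch of the per-community weight (aggregated chi-squared factor times inverse vertex frequency), assuming binary community memberships and binary labels. The 2x2 contingency chi-squared statistic is a standard choice, but the slide does not say which variant is used. The slide also does not define the within-label variability term, so it is left as an optional, hypothetical `variability` argument that defaults to no normalization; likewise, $\mathrm{ivf}(i) = \log(|V| / |c_i|)$ is an assumption chosen so that the factor is defined per community, as the column-wise weighting requires. Function names are illustrative.

```python
import numpy as np

def chi2_binary(x, y):
    """Chi-squared statistic of the 2x2 contingency table of a binary feature x
    against a binary label y (1-D 0/1 arrays of equal length)."""
    observed = np.array([[np.sum((x == a) & (y == b)) for b in (0, 1)]
                         for a in (0, 1)], dtype=float)
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / observed.sum())
    nz = expected > 0                                   # skip empty rows/columns
    return float(np.sum((observed[nz] - expected[nz]) ** 2 / expected[nz]))

def community_weights(X, labelled, Y, variability=None):
    """X: (n_users x n_communities) 0/1 matrix; labelled: row indices of annotated
    users; Y: (len(labelled) x n_labels) 0/1 label matrix; variability: optional
    per-community denominator (hypothetical stand-in for within-label variability)."""
    Xl = X[labelled]
    chi = np.array([[chi2_binary(Xl[:, i], Y[:, l]) for l in range(Y.shape[1])]
                    for i in range(X.shape[1])])        # chi2_{i,l}
    spread = chi.max(axis=1) - chi.min(axis=1)          # max_l minus min_l
    if variability is None:
        variability = np.ones(X.shape[1])               # placeholder: no normalization
    chi_agg = spread / variability
    sizes = X.sum(axis=0)                               # |c_i|
    ivf = np.log(X.shape[0] / np.maximum(sizes, 1))     # assumed idf-style ivf(i)
    return chi_agg * ivf
```

Weighting is then a column-wise product, `X * w[None, :]`. A community containing every user gets $\mathrm{ivf} = 0$ and therefore zero weight, whatever its label correlation.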
Evaluation: Dataset Description
#12
Dataset | Labels | Vertices | Vertex type | Edges | Edge type
SNOW2014 Graph (Papadopoulos et al., 2014) | 90 | 533,874 | Twitter account | 949,661 | Mentions + retweets
IRMV-PoliticsUK (Greene and Cunningham, 2013) | 5 | 419 | Twitter account | 11,349 | Mentions + retweets
ASU-YouTube (Mislove et al., 2007) | 47 | 1,134,890 | YouTube channel | 2,987,624 | Subscriptions
ASU-Flickr (Tang and Liu, 2009) | 195 | 80,513 | Flickr account | 5,899,882 | Contacts
Ground truth generation:
• SNOW2014 Graph: Twitter list aggregation & post-processing
• IRMV-PoliticsUK: Manual annotation
• ASU-YouTube: user membership in groups
• ASU-Flickr: user subscription to interest groups
Evaluation: SNOW 2014 dataset
#13
SNOW2014 Graph (534K vertices, 950K edges): Twitter mentions + retweets;
ground truth based on Twitter list processing
Evaluation: Insight Politics UK
#14
Insight-Multiview-PoliticsUK (419 vertices, 11K edges): mentions + retweets;
ground truth based on manual annotation
Evaluation: ASU-YouTube
#15
ASU-YouTube (1.1M vertices, 3M edges): YouTube subscriptions;
ground truth based on membership in groups
Evaluation: ASU-Flickr
#16
ASU-Flickr (80K vertices, 5.9M edges): Flickr contacts;
ground truth based on membership in Flickr groups
Evaluation: Community Weighting
#17
Conclusion
• Key ideas:
– new user feature representation based on user-centric communities
– community weighting based on sparse annotations
– consistently good performance on both interaction (mention/retweet) and affiliation (follow/subscribe) graphs
• Future work:
– integration of additional signals (content)
– investigating applicability to other classification problems, e.g. spammer detection
#18
Thank you!
• Resources:
Slides: http://www.slideshare.net/sympapadopoulos/learning-to-classify-users-in-online-interaction-networks
Code: https://github.com/MKLab-ITI/reveal-user-classification
https://github.com/MKLab-ITI/reveal-user-annotation
• Get in touch:
@sympapadopoulos / papadop@iti.gr
@georgios_rizos / georgerizos@iti.gr
#19
References (1/3)
• Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction
and data representation. Neural computation, 15(6), 1373-1396.
• Tang, L., & Liu, H. (2011). Leveraging social media networks for classification. Data
Mining and Knowledge Discovery, 23(3), 447-478.
• Devooght, R., Mantrach, A., Kivimäki, I., Bersini, H., Jaimes, A., & Saerens, M.
(2014, April). Random walks based modularity: application to semi-supervised
learning. In Proceedings of the 23rd international conference on World wide web
(pp. 213-224). International World Wide Web Conferences Steering Committee.
• Perozzi, B., Al-Rfou, R., & Skiena, S. (2014, August). DeepWalk: Online learning of
social representations. In Proceedings of the 20th ACM SIGKDD international
conference on Knowledge discovery and data mining (pp. 701-710). ACM.
• Tang, L., & Liu, H. (2009, November). Scalable learning of collective behavior based
on sparse social dimensions. In Proceedings of the 18th ACM conference on
Information and knowledge management (pp. 1107-1116). ACM.
• Wang, X., Tang, L., Liu, H., & Wang, L. (2013). Learning with multi-resolution
overlapping communities. Knowledge and information systems, 36(2), 517-535.
#20
References (2/3)
• Zhu, X., & Ghahramani, Z. (2002). Learning from labeled and unlabeled data with label
propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University.
• Zhou, D., Bousquet, O., Lal, T. N., Weston, J., & Schölkopf, B. (2004). Learning with local and
global consistency. Advances in neural information processing systems, 16(16), 321-328.
• Fouss, F., Francoisse, K., Yen, L., Pirotte, A., & Saerens, M. (2012). An experimental
investigation of kernels on graphs for collaborative recommendation and semisupervised
classification. Neural Networks, 31, 53-72.
• Pennacchiotti, M., & Popescu, A. M. (2011, August). Democrats, republicans and starbucks
afficionados: user classification in twitter. In Proceedings of the 17th ACM SIGKDD
international conference on Knowledge discovery and data mining (pp. 430-438). ACM.
• Ghosh, S., Sharma, N., Benevenuto, F., Ganguly, N., & Gummadi, K. (2012, August). Cognos:
crowdsourcing search for topic experts in microblogs. In Proceedings of the 35th
international ACM SIGIR conference on Research and development in information retrieval
(pp. 575-590). ACM.
• Hu, X., Tang, J., Zhang, Y., & Liu, H. (2013, August). Social spammer detection in
microblogging. In Proceedings of the Twenty-Third international joint conference on Artificial
Intelligence (pp. 2633-2639). AAAI Press.
• Wagner, C., Asur, S., & Hailpern, J. (2013, September). Religious politicians and creative
photographers: Automatic user categorization in twitter. In Social Computing (SocialCom),
2013 International Conference on (pp. 303-310). IEEE.
#21
References (3/3)
• Andersen, R., Chung, F., & Lang, K. (2006, October). Local graph
partitioning using pagerank vectors. In Foundations of Computer Science,
2006. FOCS'06. 47th Annual IEEE Symposium on (pp. 475-486). IEEE.
• Papadopoulos, S., Corney, D., & Aiello, L. M. (2014). SNOW 2014 Data
Challenge: Assessing the Performance of News Topic Detection Methods
in Social Media. In SNOW-DC@ WWW (pp. 1-8).
• Greene, D., & Cunningham, P. (2013, May). Producing a unified graph
representation from multiple social network views. In Proceedings of the
5th Annual ACM Web Science Conference (pp. 118-121). ACM.
• Mislove, A., Marcon, M., Gummadi, K. P., Druschel, P., & Bhattacharjee, B.
(2007, October). Measurement and analysis of online social networks. In
Proceedings of the 7th ACM SIGCOMM conference on Internet
measurement (pp. 29-42). ACM.
• Tang, L., & Liu, H. (2009, June). Relational learning via latent social
dimensions. In Proceedings of the 15th ACM SIGKDD international
conference on Knowledge discovery and data mining (pp. 817-826). ACM.
#22
Auxiliary Slides
#23
Classifying Users using Network Structure
• We apply user-centric community detection to the problem of graph-based user classification; we name our approach ARCTE.
• Improved approximate, user-centric PageRank calculation for better local graph exploration.
• Supervised community weighting step that boosts the importance of highly predictive communities in the feature representation.
• Extensive comparative study of numerous state-of-the-art network feature extraction methods on several social interaction datasets.
#24


Editor's Notes

  • #3: Topics: political/social attitudes, news stories, geographical area, user types/roles. Useful for news search/discovery. Potential privacy issues.
  • #4: Different kinds of user classification: topic-oriented (e.g., interest/expertise), role-based/behavioral (e.g., bot/spammer), geographical location. Useful for advertising, user recommendation, expert search, etc. For personal accounts, user classification raises privacy concerns. Challenges: multilinguality, brevity, informal language.
  • #19: http://irevolution.net/2014/04/03/using-aidr-to-collect-and-analyze-tweets-from-chile-earthquake/