Relaxing Join and Selection Queries - VLDB 2006 Slides

  • 1. Relaxing Join and Selection Queries. Rares Vernica, UC Irvine, USA. Joint work with Nick Koudas, Chen Li, and Anthony K. H. Tung.
  • 2. Query Example
      SELECT *
      FROM Jobs J, Candidates C
      WHERE J.Salary <= 95
        AND J.Zipcode = C.Zipcode
        AND C.WorkExp >= 5;

      Jobs
      ID   Zipcode  Salary  Company
      J1   92047    80      Broadcom
      J2   93652    95      Intel
      J3   82632    120     Microsoft
      J4   90391    130     IBM
      …    …        …       …

      Candidates
      ID   ExpSalary  WorkExp  Zipcode
      C1   120        3        93652
      C2   130        6        92612
      C3   100        5        82632
      C4   150        1        90391
      …    …          …        …
  • 3. What if the query answer is empty?
      SELECT *
      FROM Jobs J, Candidates C
      WHERE J.Salary <= 95
        AND J.Zipcode = C.Zipcode
        AND C.WorkExp >= 5;

      Adjust the conditions:
      • What conditions to adjust?
      • How to adjust them?
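The empty answer on slide 3 can be reproduced with a small sketch. It loads the four sample rows from slide 2 into an in-memory SQLite database and runs the slide's query with the two selection thresholds as parameters. The table layout, the `run_query` helper, and the choice of SQLite are assumptions made for illustration; the deck does not say how the query was executed.

```python
import sqlite3

# Sample rows read off the slide-2 tables (column placement is an assumption).
JOBS = [("J1", 92047, 80), ("J2", 93652, 95), ("J3", 82632, 120), ("J4", 90391, 130)]
CANDIDATES = [("C1", 93652, 120, 3), ("C2", 92612, 130, 6),
              ("C3", 82632, 100, 5), ("C4", 90391, 150, 1)]

QUERY = """
SELECT J.ID, C.ID
FROM Jobs J, Candidates C
WHERE J.Salary <= ?
  AND J.Zipcode = C.Zipcode
  AND C.WorkExp >= ?
"""

def run_query(max_salary, min_work_exp):
    """Hypothetical helper: run the slide's query with adjustable thresholds."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Jobs (ID TEXT, Zipcode INTEGER, Salary INTEGER)")
    conn.execute("CREATE TABLE Candidates "
                 "(ID TEXT, Zipcode INTEGER, ExpSalary INTEGER, WorkExp INTEGER)")
    conn.executemany("INSERT INTO Jobs VALUES (?, ?, ?)", JOBS)
    conn.executemany("INSERT INTO Candidates VALUES (?, ?, ?, ?)", CANDIDATES)
    rows = conn.execute(QUERY, (max_salary, min_work_exp)).fetchall()
    conn.close()
    return sorted(rows)

print(run_query(95, 5))    # original conditions: []
print(run_query(95, 3))    # WorkExp relaxed by hand: [('J2', 'C1')]
print(run_query(120, 5))   # Salary relaxed by hand: [('J3', 'C3')]
```

Relaxing either selection by hand produces a different answer, which is the "what to adjust, and by how much" question the slide raises.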
  • 4. Example Percentages of Empty-Result Queries
      • Customer Relationship Management (CRM) application developed by IBM: 18.07% (3,396 empty-result queries in 18,793 queries)
      • Real estate application developed by IBM: 5.75%
      • Digital library application [JCM+00]: 10.53%
      • Bioinformatics application [RCP+98]: 38%
      Source: Gang Luo (IBM T.J. Watson Research Center, USA), "Efficient Detection of Empty-Result Queries", VLDB 2006, p. 1015.
  • 5. Observations
      • Different ways to adjust the conditions: select vs. join
      • How much to adjust each condition? Salary <= 100 vs. Salary <= 120
      • Adjust the join vs. adjust both selections
      • Running example: the selections Salary <= 95 and WorkExp >= 5 over the Jobs and Candidates tables from slide 2
  • 6. Contributions
      • Query relaxation framework for selections and joins
      • Lattice-based approach for query relaxation
      • Efficient relaxation algorithms
  • 7. Overview: Motivation, Query Relaxation, Lattice-based Relaxation, Relaxation Algorithms, Variations, Experiments
  • 8. Query Relaxation
      • Top-k / nearest neighbor: a weight for each condition
      • Skyline: no weights are needed; conditions are not considered equal; return the non-dominated points
  • 9. Query Relaxation: Skyline. Reference: Stephan Börzsönyi, Donald Kossmann, Konrad Stocker: The Skyline Operator. ICDE 2001
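The "non-dominated points" of slides 8 and 9 can be made concrete with a minimal sketch. Each point is a tuple of relaxation amounts, one per condition, where smaller means closer to the original query. The quadratic scan and the sample points are illustrative assumptions, not the algorithm or data from the talk.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (smaller relaxation is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline(points):
    """Naive O(n^2) Skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (salary relaxation, work-experience relaxation) pairs.
points = [(0, 2), (25, 0), (35, 4), (10, 1)]
print(skyline(points))  # [(0, 2), (25, 0), (10, 1)]
```

No weights appear anywhere: (35, 4) is dropped because (0, 2) beats it on both conditions, while the remaining points are incomparable and are all returned.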
  • 10. Overview: Motivation, Query Relaxation, Lattice-based Relaxation, Relaxation Algorithms, Variations, Experiments
  • 11. Lattice-based Relaxation
      • Selections: Salary <= 95 and WorkExp >= 5
      • R: selection on Jobs
      • J: join condition
      • S: selection on Candidates
      • Jobs and Candidates tables as on slide 2
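One plausible reading of the lattice on slide 11 is that each node is a subset of the conditions R, J, S chosen for relaxation, ordered by inclusion. The sketch below enumerates those subsets level by level and, as a deliberate simplification, treats "relaxed" as "dropped entirely". The node encoding and the `answers` helper are assumptions; the slide names only R, J, and S.

```python
from itertools import combinations

JOBS = [{"id": "J1", "zip": 92047, "salary": 80}, {"id": "J2", "zip": 93652, "salary": 95},
        {"id": "J3", "zip": 82632, "salary": 120}, {"id": "J4", "zip": 90391, "salary": 130}]
CANDIDATES = [{"id": "C1", "zip": 93652, "work_exp": 3}, {"id": "C2", "zip": 92612, "work_exp": 6},
              {"id": "C3", "zip": 82632, "work_exp": 5}, {"id": "C4", "zip": 90391, "work_exp": 1}]

CONDITIONS = {
    "R": lambda j, c: j["salary"] <= 95,     # selection on Jobs
    "J": lambda j, c: j["zip"] == c["zip"],  # join condition
    "S": lambda j, c: c["work_exp"] >= 5,    # selection on Candidates
}

def lattice(names):
    """Yield the sets of conditions to relax, from none to all, level by level."""
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            yield frozenset(subset)

def answers(relaxed):
    """Pairs that satisfy every condition NOT in `relaxed`
    (simplification: a relaxed condition is dropped completely)."""
    kept = [test for name, test in CONDITIONS.items() if name not in relaxed]
    return [(j["id"], c["id"]) for j in JOBS for c in CANDIDATES
            if all(test(j, c) for test in kept)]

for node in lattice(["R", "J", "S"]):
    print(sorted(node), answers(node))
```

On the sample data the bottom node (nothing relaxed) is empty, while each single-condition node already returns answers, so the three level-one nodes are the smallest relaxations that help.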
  • 12. Overview: Motivation, Query Relaxation, Lattice-based Relaxation, Relaxation Algorithms, Variations, Experiments
  • 13. Relaxing Selection Conditions
      Algorithm (INCORRECT):
      • Compute the Skyline on Jobs
      • Compute the Skyline on Candidates
      • Join the Skylines
      With Salary <= 95 and WorkExp >= 5 on the tables from slide 2, joining the two Skylines gives an empty result.
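Why the algorithm on slide 13 is marked incorrect can be checked on the sample rows. With one relaxed attribute per table, the per-table Skyline is the set of rows needing the least relaxation; this sketch computes both and joins them. The relaxation measures (distance past the threshold) and the helper names are assumptions.

```python
JOBS = [("J1", 92047, 80), ("J2", 93652, 95),
        ("J3", 82632, 120), ("J4", 90391, 130)]        # (ID, Zipcode, Salary)
CANDIDATES = [("C1", 93652, 3), ("C2", 92612, 6),
              ("C3", 82632, 5), ("C4", 90391, 1)]      # (ID, Zipcode, WorkExp)

def salary_relaxation(job):
    """How far Salary <= 95 must be loosened to admit this job."""
    return max(0, job[2] - 95)

def work_exp_relaxation(cand):
    """How far WorkExp >= 5 must be loosened to admit this candidate."""
    return max(0, 5 - cand[2])

def local_skyline(rows, relaxation):
    """One-dimensional Skyline: the rows with minimal relaxation."""
    best = min(relaxation(r) for r in rows)
    return [r for r in rows if relaxation(r) == best]

def join_on_zipcode(jobs, cands):
    return [(j[0], c[0]) for j in jobs for c in cands if j[1] == c[1]]

job_sky = local_skyline(JOBS, salary_relaxation)          # J1, J2
cand_sky = local_skyline(CANDIDATES, work_exp_relaxation)  # C2, C3
print(join_on_zipcode(job_sky, cand_sky))   # []
print(join_on_zipcode(JOBS, CANDIDATES))    # [('J2', 'C1'), ('J3', 'C3'), ('J4', 'C4')]
```

The per-table Skylines keep only rows that already satisfy their selection, and none of those rows join, so every joinable pair has been thrown away before the join runs.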
  • 14. Relaxing Selection Conditions: Join First
      Algorithm:
      • Compute the join (disregarding the selections)
      • Compute the Skyline on the join results
      Running example: Salary <= 95 and WorkExp >= 5 on the tables from slide 2.
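The two steps of Join First on slide 14 can be sketched on the sample rows as follows. The join ignores both selections, and each joined pair is then scored by how far each selection would have to be loosened to admit it. The relaxation measure and the nested-loop join are illustrative assumptions.

```python
JOBS = [("J1", 92047, 80), ("J2", 93652, 95),
        ("J3", 82632, 120), ("J4", 90391, 130)]        # (ID, Zipcode, Salary)
CANDIDATES = [("C1", 93652, 3), ("C2", 92612, 6),
              ("C3", 82632, 5), ("C4", 90391, 1)]      # (ID, Zipcode, WorkExp)

def dominates(a, b):
    """a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def join_first(jobs, cands, max_salary=95, min_work_exp=5):
    # Step 1: compute the join, disregarding the selections.
    joined = []
    for jid, jzip, salary in jobs:
        for cid, czip, work_exp in cands:
            if jzip == czip:
                relax = (max(0, salary - max_salary),
                         max(0, min_work_exp - work_exp))
                joined.append(((jid, cid), relax))
    # Step 2: compute the Skyline over the relaxation vectors.
    return [(pair, r) for pair, r in joined
            if not any(dominates(q, r) for _, q in joined)]

print(join_first(JOBS, CANDIDATES))
# [(('J2', 'C1'), (0, 2)), (('J3', 'C3'), (25, 0))]
```

The pair (J4, C4) needs relaxation (35, 4) and is dominated by both survivors, so it is the only join result left out.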
  • 15. Relaxing Selection Conditions: Variations
      • Pruning Join: build the Skyline during the join
      • Pruning Join+: Pruning Join, plus build the local Skyline before the join
      • Sorted Access Join: Fagin's Top-k; sort the columns on relaxation, then compute the join Skyline
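Of the variations on slide 15, the first one (Pruning Join) can be sketched as a join that maintains the Skyline incrementally instead of materializing all join results first. The hash join on Zipcode and the relaxation measure are assumptions; Pruning Join+ and Sorted Access Join are not covered by this sketch.

```python
from collections import defaultdict

JOBS = [("J1", 92047, 80), ("J2", 93652, 95),
        ("J3", 82632, 120), ("J4", 90391, 130)]        # (ID, Zipcode, Salary)
CANDIDATES = [("C1", 93652, 3), ("C2", 92612, 6),
              ("C3", 82632, 5), ("C4", 90391, 1)]      # (ID, Zipcode, WorkExp)

def dominates(a, b):
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pruning_join(jobs, cands, max_salary=95, min_work_exp=5):
    # Build a hash table on the join attribute.
    by_zip = defaultdict(list)
    for cand in cands:
        by_zip[cand[1]].append(cand)

    skyline = []  # current Skyline of the join results seen so far
    for jid, jzip, salary in jobs:
        for cid, _, work_exp in by_zip.get(jzip, []):
            relax = (max(0, salary - max_salary),
                     max(0, min_work_exp - work_exp))
            # Discard a new pair as soon as an existing Skyline point dominates it.
            if any(dominates(q, relax) for _, q in skyline):
                continue
            # Otherwise evict the points it dominates and keep it.
            skyline = [(p, q) for p, q in skyline if not dominates(relax, q)]
            skyline.append(((jid, cid), relax))
    return skyline

print(pruning_join(JOBS, CANDIDATES))
# [(('J2', 'C1'), (0, 2)), (('J3', 'C3'), (25, 0))]
```

The result matches Join First regardless of input order, but dominated pairs such as (J4, C4) are dropped or evicted during the join rather than stored until the end.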
  • 16. Relaxing All Conditions
      Multi-Dim.-Index-based-Relaxation algorithm:
      • Traverse the index structure top-down
      • Form pairs of nodes or records
      • Build the Skyline
      Data structures shown: Skyline, Queue
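The answer that relaxing all conditions (slide 16) should produce can be illustrated with a brute-force baseline. This is not the index-based algorithm from the slide: it scans every Jobs/Candidates pair instead of traversing an R-tree. It also assumes that the join relaxation is the absolute Zipcode difference, in line with the editor's note that closer zip codes mean closer areas.

```python
JOBS = [("J1", 92047, 80), ("J2", 93652, 95),
        ("J3", 82632, 120), ("J4", 90391, 130)]        # (ID, Zipcode, Salary)
CANDIDATES = [("C1", 93652, 3), ("C2", 92612, 6),
              ("C3", 82632, 5), ("C4", 90391, 1)]      # (ID, Zipcode, WorkExp)

def dominates(a, b):
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def relaxation(job, cand, max_salary=95, min_work_exp=5):
    """(selection on Jobs, join condition, selection on Candidates)."""
    return (max(0, job[2] - max_salary),
            abs(job[1] - cand[1]),          # assumed distance between zip codes
            max(0, min_work_exp - cand[2]))

def relax_all(jobs, cands):
    """Brute force: score every pair, then keep the non-dominated ones."""
    pairs = [((j[0], c[0]), relaxation(j, c)) for j in jobs for c in cands]
    return [(ids, r) for ids, r in pairs
            if not any(dominates(q, r) for _, q in pairs)]

for ids, r in relax_all(JOBS, CANDIDATES):
    print(ids, r)
# ('J1', 'C2') (0, 565, 0)
# ('J2', 'C1') (0, 0, 2)
# ('J3', 'C3') (25, 0, 0)
```

Each surviving pair relaxes exactly one of the three conditions, which is the "adjust join vs. adjust selections" trade-off from slide 5. An index-based traversal would aim to reach the same Skyline while pruning most of the 16 pairs.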
  • 17. Overview: Motivation, Query Relaxation, Lattice-based Relaxation, Relaxation Algorithms, Variations, Experiments
  • 18. Variations
      • Computing Top-k over the Skyline: a weight for each condition
      • Queries with multiple joins
      • Conditions on nonnumeric attributes: dominance checking function
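The Top-k over Skyline variation on slide 18 can be sketched as a ranking step applied after the Skyline is known. The weighted sum used as the score, the weights themselves, and the two sample Skyline entries are assumptions; the slide states only that each condition receives a weight.

```python
import heapq

# Hypothetical Skyline entries: (pair, (salary relaxation, work-exp relaxation)).
SKYLINE = [(("J2", "C1"), (0, 2)), (("J3", "C3"), (25, 0))]

def top_k_over_skyline(skyline, weights, k):
    """Rank Skyline entries by weighted total relaxation; smaller is better."""
    def score(entry):
        _, relax = entry
        return sum(w * r for w, r in zip(weights, relax))
    return heapq.nsmallest(k, skyline, key=score)

# One salary unit costs 1; one year of missing experience costs 10 or 20.
print(top_k_over_skyline(SKYLINE, weights=(1, 10), k=1))  # [(('J2', 'C1'), (0, 2))]
print(top_k_over_skyline(SKYLINE, weights=(1, 20), k=1))  # [(('J3', 'C3'), (25, 0))]
```

Changing the weights flips the winner, but only among Skyline entries, so a dominated pair can never be returned.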
  • 19. Overview: Motivation, Query Relaxation, Lattice-based Relaxation, Relaxation Algorithms, Variations, Experiments
  • 20. Experimental Setting
      • Datasets
          • Real: Internet Movie Database (IMDB), with Movies (120k) and ActorInMovies (1.2m); Census-Income from the UCI KDD Repository, with Census (200k)
          • Synthetic: Independent, Correlated, and Anticorrelated
      • Implementation: GNU C++; Spatial Index Library (R-tree); Linux, AMD Opteron 240, 1GB RAM
  • 21. Different algorithms, different behaviors (IMDB dataset)
  • 22. Different datasets, different behaviors (Correlated, Anticorrelated, and Independent datasets)
  • 23. How big is the Skyline?
  • 24. Relaxing join takes time (self-join on the Census dataset)
  • 25. Top-k over Skyline (IMDB dataset)
  • 26. Related Work
      • Muslea et al.: alternate forms of conjunctive expressions
      • Efficient Skyline algorithms: selection queries
      • Efficient Top-k algorithms: require weights for conditions
  • 27. Conclusions
      • Query relaxation framework for selections and joins
      • Lattice-based approach for query relaxation
      • Efficient relaxation algorithms
  • 28. Future Work
      • Optimum use of the lattice structure
      • Relax conditions on string attributes
      • Algorithms applicable outside databases
  • 29. Questions?
  • 31. Skyline vs. Top-k
  • 32. Skyline vs. Top-k over Skyline

Editor's Notes

  • #3: Make a “real” story; companies are very eager to find the people they want. We assume that closer zip codes mean closer areas.
  • #4: Queries can return nothing. It is important to have results. Automatically do relaxation!
  • #7: Efficiency is a big issue
  • #9: Skyline is not the only way. Skyline does not care which one is more important. We are not comparing apples with oranges.
  • #10: In this diagram we are not relaxing join conditions. Each point is a join pair.
  • #12: Sometimes we might not want to relax the join (e.g., when the attribute is an ID). Relaxation is done automatically by the system.
  • #14: Skyline as a relational algebra operator with various properties
  • #16: Main idea of the algorithms. For more details see the paper.
  • #17: An index exists, e.g., an R-tree; the algorithm works with other types of multi-dimensional indices. Children Queue: Enqueue, Dequeue.
  • #19: Explain the Top-k over Skyline
  • #22: We present just a few of our results; for more details, see the paper.
  • #24: Skyline size depends on cardinality, number of selections, and data size.
  • #27: Muslea deals primarily with expressibility issues, without paying attention to the data management issues involved. We relax queries with selection and join conditions. Other studies assume that the attributes and the ordering of their values are predetermined in a single table; our work requires us to compute the skyline dynamically for a set of tables that are to be joined and whose attribute values must be determined on the fly. Our work considers both the selection and join conditions for relaxation.
  • #28: Efficiency is a big issue