Efficient Execution of top-k SPARQL queries
  
Sara Magliacane (VU University Amsterdam)
Alessandro Bozzon (Politecnico di Milano)
Emanuele Della Valle (Politecnico di Milano)
  
Outline
•  Introduction
  •  What are top-k queries?
  •  Why do we need to optimize them?
•  Our approach:
  •  A rank-aware SPARQL algebra
  •  A rank-aware execution model
  •  Three planning strategies
•  Evaluation
  
What is a top-k query?
•  A query that returns
   1.  a limited number of results k
   2.  ordered by a scoring function that combines several criteria
  
Rankings, rankings everywhere…
  
  
Why do we need to optimize them?
A very intuitive and simplified example:
•  Top 3 largest countries (by both area and population)
  
The standard way: materialize-then-sort scheme
[Figure: query plan, listed root first]
   Fetch 3 best results
   Sort all the 242 join combinations
   Compute all the 242 join combinations
   Inputs: Countries by area (242), Countries by population (242)
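
The scheme above can be sketched in a few lines of Python. This is only an illustration, assuming each criterion is already normalized to [0, 1] and the scoring function is a plain sum; the country labels and scores are made-up placeholders, not real figures.

```python
def materialize_then_sort(by_area, by_population, k):
    """Join every entry on the country name, sort all of them, keep k."""
    # 1. Compute all join combinations (one per country present in both inputs).
    joined = [
        (name, area_score + by_population[name])
        for name, area_score in by_area.items()
        if name in by_population
    ]
    # 2. Sort the complete join result by the combined score.
    joined.sort(key=lambda pair: pair[1], reverse=True)
    # 3. Only now fetch the k best results.
    return joined[:k]


# Hypothetical normalized scores in [0, 1]; not real country data.
AREA = {"A": 0.9, "B": 0.7, "C": 0.5, "D": 0.2}
POPULATION = {"A": 0.3, "B": 0.8, "C": 0.9, "D": 0.1}

top3 = materialize_then_sort(AREA, POPULATION, 3)
print([name for name, _ in top3])
```

Every entry is joined and sorted even though only three results are requested, which is the cost the rest of the talk tries to avoid.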
  
Can we make it more efficient?
Can we exploit the available sorted access by area and by population?
[Figure: query plan, listed root first]
   Fetch 3 best results
   Order incrementally the combinations using partial orders
   Inputs: Countries by area (7), Countries by population (9)
  
The split-and-interleave scheme
•  The intuition of the previous example can be formalized with the split-and-interleave scheme from RDBMS [Li2005, Hwang2007, Ilyas2004, Ilyas2008]
   1.  Split the evaluation of the scoring function into single criteria
   2.  Interleave them with other operators
   3.  Use partial orders to construct incrementally the final order
•  Standard assumptions:
   •  Monotone scoring function
   •  Each criterion is evaluated as a [0,1] number (normalization)
•  Optimized for the case of fast sorted access for each criterion
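
To see why the two standard assumptions matter, here is a minimal sketch of a threshold-style top-k evaluation (in the spirit of Fagin's Threshold Algorithm, which is not named on the slides and is used here only as an illustration). It assumes a sum as the monotone scoring function, scores in [0, 1], and both sorted and random access to every criterion; all names and data are hypothetical.

```python
import heapq


def threshold_top_k(sorted_lists, k):
    """sorted_lists: one list of (item, score) per criterion, best score first.

    Returns (top-k list of (item, total score), number of sorted accesses).
    """
    lookup = [dict(entries) for entries in sorted_lists]  # random access
    seen = {}
    accesses = 0
    depth_limit = max(len(entries) for entries in sorted_lists)
    for depth in range(depth_limit):
        threshold = 0.0  # best total any still-unseen item could reach
        for entries in sorted_lists:
            if depth >= len(entries):
                continue  # exhausted list: unseen items score 0 here
            item, score = entries[depth]
            accesses += 1
            threshold += score
            if item not in seen:
                seen[item] = sum(table.get(item, 0.0) for table in lookup)
        best = heapq.nlargest(k, seen.items(), key=lambda pair: pair[1])
        # Monotonicity guarantees no unseen item can beat the threshold.
        if len(best) == k and best[-1][1] >= threshold:
            return best, accesses
    return heapq.nlargest(k, seen.items(), key=lambda pair: pair[1]), accesses


# Hypothetical normalized scores, each list sorted by its own criterion.
BY_AREA = [("A", 0.9), ("B", 0.7), ("C", 0.5), ("D", 0.2)]
BY_POPULATION = [("C", 0.9), ("B", 0.8), ("A", 0.3), ("D", 0.1)]

best, accesses = threshold_top_k([BY_AREA, BY_POPULATION], 3)
print([name for name, _ in best], accesses)
```

The evaluation stops as soon as the k-th best total reaches the threshold, so it reads only a prefix of each sorted input instead of materializing the full join.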
  
No free lunch…
[Figure: run time against the number of desired results k, comparing the Standard and the Top-k (split-and-interleave) schemes. The chart marks an "Improvement" region, where the gap is orders of magnitude, and a "Too much overhead" region.]
Users are interested in 1 <= k <= 100 (search engines)
Top-k queries in SPARQL 1.1
Example query on BSBM [Bizer2009]:
•  The top 10 offers ordered by the product ratings and offer price:

          SELECT ?product ?offer
                 (norm1(?avgRat1) + norm2(?avgRat2) + norm3(?price) AS ?score)
          WHERE {
               ?product hasAvgRat1 ?avgRat1 .
               ?product hasAvgRat2 ?avgRat2 .
               ?product hasName ?name .
               ?product hasOffers ?offer .
               ?offer hasPrice ?price
          }
          ORDER BY DESC(?score)
          LIMIT 10

Tens of seconds on 5M triples (could be improved to milliseconds)
Split-and-interleave in SPARQL? Related work
•  A possible solution [Straccia2010, Bozzon2011]:
       •  Rewrite SPARQL into SQL
       •  Use existing optimized RDBMS (e.g. RankSQL [Li2005])
•  Disadvantages:
       •  Works only if the data are already in an RDBMS
•  What about native SPARQL optimizations?
       •  Federated queries over Linked Data [Wagner2012]: complementary to our approach
Challenges for native SPARQL split-and-interleave solutions
[Figure: query processing pipeline. Query -> Algebra generator -> Algebraic tree -> Planner -> Query plan. The algebra generator depends on the Algebra; the planner depends on the Physical operators and the Planning strategies.]

       Differences with SQL and RDBMS          | Proposed solution
       Different algebra                       | STEP 1: New algebra (algebraic operators and algebraic equivalences)
       Different cost of data access in native | STEP 2: New algorithms for physical operators, possibly using
       RDF triplestores (sorted access is slow)|         less sorted access
       Additional optimization dimensions      | STEP 3: New planning strategies
Step 1: a rank-aware algebra
•  SPARQL-Rank algebra [Bozzon2011]
  •  Extends the standard SPARQL algebra [Perez2009]
  •  Ranked set of mappings: set of mappings augmented with an order relation

    Extended OPERATORS          New EQUIVALENCES
  
The SPARQL-Rank algebraic operators
New operator: rank
[Figure: three query plans, each projecting ?pr, ?of, ?score under SLICE [0,10]
   (a) rank g3(?p1) over rank g1(?a1) over a seqScan of
       { ?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasO ?of . ?of hasP1 ?p1 }
   (b) rank g3(?p1) over an orderScan_a1 of
       { ?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasO ?of . ?of hasP1 ?p1 }
   (c) Sequence on ?pr = ?pr of two inputs:
       rank g3(?p1) over rank g1(?a1) over a seqScan of { ?pr hasA1 ?a1 . ?pr hasO ?of . ?of hasP1 ?p1 },
       and a seqScan of { ?pr hasN ?n }]
  
The Rank Operator

Ω:
            ?x    ?y    ?p1    ?p2
      µ1    1     8     0.8    0.8
      µ2    3     3     0.3    0.6
      µ3    3     4     0.4    0.6

ρp1(Ω):
            ?x    ?y    ?p1    Fp1
      µ1    1     8     0.8    1.8
      µ3    3     4     0.4    1.4
      µ2    3     3     0.3    1.3
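
The numbers in the table are consistent with reading Fp1 as an upper bound on the final score: the evaluated criterion ?p1 plus the maximum value 1 for the not yet evaluated ?p2. The sketch below reproduces the table under that reading; the function names and the dictionary layout are illustrative assumptions, not the SPARQL-Rank implementation.

```python
def upper_bound(mapping, evaluated, all_criteria, max_score=1.0):
    """Sum of the evaluated criteria plus the best case for the others."""
    known = sum(mapping[criterion] for criterion in evaluated)
    unknown = max_score * (len(all_criteria) - len(evaluated))
    return known + unknown


def rank(mappings, criterion, already_evaluated, all_criteria):
    """Evaluate one more criterion and order mappings by their upper bound."""
    evaluated = tuple(already_evaluated) + (criterion,)
    scored = [(m, upper_bound(m, evaluated, all_criteria)) for m in mappings]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# The set of mappings from the slide; ?p1 and ?p2 are already in [0, 1].
OMEGA = [
    {"name": "mu1", "x": 1, "y": 8, "p1": 0.8, "p2": 0.8},
    {"name": "mu2", "x": 3, "y": 3, "p1": 0.3, "p2": 0.6},
    {"name": "mu3", "x": 3, "y": 4, "p1": 0.4, "p2": 0.6},
]

for mapping, bound in rank(OMEGA, "p1", (), ("p1", "p2")):
    print(mapping["name"], round(bound, 2))
```

The output order is µ1, µ3, µ2, matching ρp1(Ω) above.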
  
The SPARQL-Rank algebraic operators
Redefined standard operators
  
The Join Operator

Ωp1:
            ?x    ?y    ?p1    Fp1
      µ1    1     8     0.8    1.8
      µ3    3     4     0.4    1.4
      µ2    3     3     0.3    1.3

Ω’p2:
            ?x    ?z    ?p2    Fp2
      µ4    1     9     0.8    1.8
      µ5    3     0     0.6    1.6

Join of Ωp1 and Ω’p2:
                 ?x    ?y    ?z    ?p1    ?p2    Fp1∪p2
      µ1 ∪ µ4    1     8     9     0.8    0.8    1.6
      µ3 ∪ µ5    3     4     0     0.4    0.6    1.0
      µ2 ∪ µ5    3     3     0     0.3    0.6    0.9
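
A naive sketch of the join on ranked sets of mappings, reproducing the table above. It assumes the two inputs join on the shared variable ?x and that the combined score is the sum of the two evaluated criteria; the nested loop is chosen for clarity only and is not one of the physical operators discussed later.

```python
def ranked_join(left, right, join_var, left_criterion, right_criterion):
    """Merge compatible mappings and order them by the combined score."""
    results = []
    for l in left:
        for r in right:
            if l[join_var] == r[join_var]:  # compatible mappings
                merged = {**l, **r}
                score = l[left_criterion] + r[right_criterion]
                results.append((merged, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Inputs from the slide (mapping names omitted).
OMEGA_P1 = [
    {"x": 1, "y": 8, "p1": 0.8},
    {"x": 3, "y": 4, "p1": 0.4},
    {"x": 3, "y": 3, "p1": 0.3},
]
OMEGA_P2 = [
    {"x": 1, "z": 9, "p2": 0.8},
    {"x": 3, "z": 0, "p2": 0.6},
]

for mapping, score in ranked_join(OMEGA_P1, OMEGA_P2, "x", "p1", "p2"):
    print(mapping, round(score, 2))
```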
  
SPARQL-Rank algebraic equivalences
Split
•  Allows the splitting of a monolithic scoring function into several rank operators
Interleave
•  Allows the results to be ordered incrementally by pushing the rank operator inside the query tree.
  
	
  
                                             	
  
                                               	
  
From algebra to execution
Image from: http://de-timekeeper.com/yahoo_site_admin/assets/images/benzinger20gold20gears200291.17120724_std.jpg
  
Step 2: physical operators (top-k algorithms)
•  Rank operator
   •  If there is a sorted access index on the ranking criterion, we use it
   •  Otherwise: rank aggregation algorithms, e.g. [Hwang2007]
•  Join operator
   •  If the right operand does not influence the ranking: streaming index join
   •  Otherwise: a rank-join algorithm [see next slides]
•  Other operators are straightforward:
   •  E.g. the standard FILTER conserves the ordering of its input
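
The FILTER case can be illustrated with a streaming operator: because it only drops mappings and never reorders them, a ranked input stays ranked. The sketch assumes a ranked input represented as (mapping, score) pairs; the names are hypothetical.

```python
def order_preserving_filter(ranked_mappings, predicate):
    """Yield the mappings that satisfy the predicate, in input order."""
    for mapping, score in ranked_mappings:
        if predicate(mapping):
            yield mapping, score


# A ranked input, best score first (illustrative values).
RANKED = [({"x": 1}, 1.8), ({"x": 3}, 1.4), ({"x": 5}, 1.3), ({"x": 7}, 0.9)]

kept = list(order_preserving_filter(RANKED, lambda m: m["x"] > 2))
print(kept)
```

Since the operator is a generator, a consumer that needs only k results can stop pulling early.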
  
Rank-Join algorithms
•  Different algorithms based on available access in the inputs:
    •  Hash Rank-Join (literature)
         •  e.g. HRJN [Ilyas2004]
         •  (a) RankJoin: both inputs offer sortedAccess
    •  Random Access Rank-Join (literature)
         •  e.g. RA-HRJN [Ilyas2004]
         •  (c) RA-RankJoin: both inputs offer sortedAccess and randomAccess
    •  RankSequence (e.g. RSEQ) (new)
         •  Minimum sorted access
         •  Leverages random access
         •  (b) RankSequence: one input offers sortedAccess, the other randomAccess
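
As a rough illustration of case (a), the following is a simplified hash rank-join in the spirit of HRJN [Ilyas2004]. It is a sketch under stated assumptions, not the ARQ-Rank operator: inputs are lists of (join key, score) sorted by score descending, the combined score is a sum, and inputs are pulled in strict alternation.

```python
import heapq
import math
from itertools import count


def hash_rank_join(left, right, k):
    """Return up to k (key, combined score) results, best first."""
    inputs = [iter(left), iter(right)]
    tables = [{}, {}]            # join key -> scores seen so far, per input
    top = [None, None]           # first (highest) score read from each input
    last = [None, None]          # most recent (lowest) score read
    exhausted = [False, False]
    queue, tie, results = [], count(), []

    def bound(unseen):
        """Best score of a result that uses an unseen tuple of this input."""
        other = 1 - unseen
        if exhausted[unseen]:
            return -math.inf
        if top[other] is None:
            return -math.inf if exhausted[other] else math.inf
        if last[unseen] is None:
            return math.inf
        return last[unseen] + top[other]

    side = 0
    while len(results) < k:
        if exhausted[side]:
            side = 1 - side
        if not exhausted[side]:
            item = next(inputs[side], None)
            if item is None:
                exhausted[side] = True
            else:
                key, score = item
                if top[side] is None:
                    top[side] = score
                last[side] = score
                tables[side].setdefault(key, []).append(score)
                # Probe the other hash table for join partners.
                for other_score in tables[1 - side].get(key, ()):
                    heapq.heappush(queue, (-(score + other_score), next(tie), key))
        side = 1 - side
        threshold = max(bound(0), bound(1))
        # Emit buffered results that no future result can beat.
        while queue and len(results) < k and -queue[0][0] >= threshold:
            neg_score, _, key = heapq.heappop(queue)
            results.append((key, -neg_score))
        if all(exhausted) and not queue:
            break
    return results


# The ranked inputs of the Join Operator slide, keyed on ?x.
LEFT = [(1, 0.8), (3, 0.4), (3, 0.3)]
RIGHT = [(1, 0.8), (3, 0.6)]
print(hash_rank_join(LEFT, RIGHT, 2))
```

With k = 1 the best result is emitted after reading a single tuple from each input, which is the behaviour that makes rank-joins attractive when sorted access is available.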
Step 3: planning strategies
•  Using the algebraic equivalences we can produce several equivalent algebraic trees
•  The planner can use them to implement several planning strategies
[Figure: example query plans illustrating the three strategies]
   1. Rank of BGPs      2. Interleaved      3. Rank Join
  
1. Rank of BGPs (ROB)
     •  Split the monolithic scoring function into several incremental rank operators (rho)

[Figure: two query plans, listed root first; both project ?product, ?offer, ?score]
   Materialize-then-sort:
       SLICE [0,10]
       ORDER [?score]
       EXTEND [?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)]
       { ?product hasAvgRat1 ?avgRat1 . ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name .
         ?product hasOffer ?offer . ?offer hasPrice ?price }
   Rank of BGPs:
       SLICE [0,10]
       rank norm3(?price)
       rank norm2(?avgRat2)
       rank norm1(?avgRat1)
       { ?product hasAvgRat1 ?avgRat1 . ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name .
         ?product hasOffer ?offer . ?offer hasPrice ?price }
2. Interleaved (INTER)
       •  Separate the pattern in two groups:
              •  Triple patterns that influence the ranking
              •  Triple patterns that don’t influence the ranking

[Figure: two query plans, listed root first; both project ?product, ?offer, ?score]
   Materialize-then-sort:
       SLICE [0,10]
       ORDER [?score]
       EXTEND [?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)]
       { ?product hasAvgRat1 ?avgRat1 . ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name .
         ?product hasOffer ?offer . ?offer hasPrice ?price }
   Interleaved:
       SLICE [0,10]
       join on ?product = ?product of:
           left:  rank norm3(?price)
                  rank norm1(?avgRat1)
                  rank norm2(?avgRat2)
                  { ?product hasAvgRat1 ?avgRat1 . ?product hasAvgRat2 ?avgRat2 .
                    ?product hasOffer ?offer . ?offer hasPrice ?price }
           right: { ?product hasName ?name }
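
The slide does not spell out how "influence the ranking" is decided. One possible rule that reproduces the grouping shown in the plan is sketched below: a pattern belongs to the ranking group if it binds a scoring variable, or if it only connects variables that the ranking group already binds. Both the rule and the function names are assumptions made for illustration.

```python
def variables(pattern):
    """Variables (terms starting with '?') of a (subject, predicate, object)."""
    return {term for term in pattern if term.startswith("?")}


def split_patterns(patterns, scoring_vars):
    """Partition triple patterns into (ranking group, remaining patterns)."""
    # Seed: patterns that bind at least one variable used by the scoring function.
    ranking = [p for p in patterns if variables(p) & scoring_vars]
    bound = set().union(*(variables(p) for p in ranking))
    rest = []
    for pattern in patterns:
        if pattern in ranking:
            continue
        if variables(pattern) <= bound:
            ranking.append(pattern)   # only links already-bound variables
        else:
            rest.append(pattern)      # introduces a variable unrelated to ranking
    return ranking, rest


# The basic graph pattern of the example query.
BGP = [
    ("?product", "hasAvgRat1", "?avgRat1"),
    ("?product", "hasAvgRat2", "?avgRat2"),
    ("?product", "hasName", "?name"),
    ("?product", "hasOffer", "?offer"),
    ("?offer", "hasPrice", "?price"),
]
SCORING_VARS = {"?avgRat1", "?avgRat2", "?price"}

ranking_group, other_group = split_patterns(BGP, SCORING_VARS)
print(ranking_group)
print(other_group)
```

On the example query this leaves only ?product hasName ?name outside the ranking group, as in the Interleaved plan.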
3. Rank-Join (RJ)
•  Split into one triple pattern for each ranking criterion
•  Most appropriate join algorithm based on available access
[Figure: two query plans for the example query, both ending in SLICE [0,10] over ?product, ?offer, ?score
   Materialize-then-sort (left): ORDER [?score] over EXTEND [?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)] over the whole basic graph pattern
   Rank-Join (right): RankJoin operators on ?product = ?product over the ranked patterns
      norm3(?price):    { ?product hasOffer ?offer . ?offer hasPrice ?price }
      norm1(?avgRat1):  { ?product hasAvgRat1 ?avgRat1 }
      norm2(?avgRat2):  { ?product hasAvgRat2 ?avgRat2 }
   followed by a join on ?product = ?product with { ?product hasName ?name }]
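The split step of the Rank-Join strategy can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ARQ-Rank planner: the helper name `split_by_criterion`, the tuple encoding of triple patterns, and the rule of walking from each criterion variable back to the join variable are all hypothetical choices made for this example.

```python
def split_by_criterion(patterns, criteria, join_var):
    """Group triple patterns by the ranking criterion they bind (hypothetical helper).

    patterns: list of (subject, predicate, object) tuples
    criteria: variables used by the scoring function
    join_var: variable shared by all groups (here ?product)
    """
    groups = {}
    used = set()
    for var in criteria:
        chain = []
        target = var
        # Walk back from the criterion variable to the join variable.
        while target != join_var:
            pattern = next((t for t in patterns if t[2] == target), None)
            if pattern is None:
                raise ValueError(f"no pattern binds {target}")
            chain.insert(0, pattern)
            target = pattern[0]
        groups[var] = chain
        used.update(chain)
    # Patterns that do not influence the ranking are joined afterwards.
    unranked = [t for t in patterns if t not in used]
    return groups, unranked


# Basic graph pattern of the example query, encoded as tuples.
patterns = [
    ("?product", "hasAvgRat1", "?avgRat1"),
    ("?product", "hasAvgRat2", "?avgRat2"),
    ("?product", "hasName", "?name"),
    ("?product", "hasOffer", "?offer"),
    ("?offer", "hasPrice", "?price"),
]
groups, unranked = split_by_criterion(
    patterns, ["?avgRat1", "?avgRat2", "?price"], "?product"
)
for criterion, chain in groups.items():
    print(criterion, chain)
print("unranked:", unranked)
```

Each group would then feed one rank operator, and the groups would be combined with RankJoin operators on ?product, while the unranked pattern is joined last.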
Experimental evaluation
•  Prototype implementation of our system:
   •  ARQ-Rank (extends Jena ARQ 2.8.9)
•  Extended version of Berlin SPARQL Benchmark [Bizer2009]
   •  Added ranking attributes
   •  Added top-k queries
•  Jena TDB 0.8.11 as storage
•  Code and experiments: sparqlrank.search-computing.org
Experiment 1: compare planning strategies
•  Example query, 5M triples dataset
•  Worst-case scenario: no sorted access indexes (slow sorted access)
[Plot annotation: one to two orders of magnitude better]
Experiment 1: compare planning strategies
•  Example query, 5M triples dataset
•  Standard scenario: sorted access indexes (fast sorted access)
[Plot annotation: two orders of magnitude better]
Experiment 2: Small Benchmark (8 queries)
[Figure: benchmark result plots; axis labels and values not recoverable]
Conclusions and Future Work
•  A system that speeds up the execution of top-k queries in SPARQL by orders of magnitude:
   •  STEP 1: A rank-aware SPARQL algebra (SPARQL-Rank algebra)
   •  STEP 2: A rank-join algorithm (RSEQ)
   •  STEP 3: Three planning strategies (ROB, INTER, RJ)
•  ARQ-Rank, a rank-aware extension of Jena ARQ
•  A small benchmark for top-k queries, based on BSBM [Bizer2009]
•  All available at sparqlrank.search-computing.org
•  Future work:
   •  More advanced, cost-based, optimization techniques
   •  Extension to federated top-k query processing
   •  Top-k queries under OWL2QL entailment regime
Bibliography
•  [Bozzon2011] A. Bozzon et al. Towards an efficient SPARQL top-k query execution in virtual RDF stores. In DBRANK workshop at VLDB ’11, 2011.
•  [Wagner2012] A. Wagner et al. Top-k Linked Data Query Processing. In ESWC ’12. Springer, 2012.
•  [Bizer2009] C. Bizer and A. Schultz. The Berlin SPARQL Benchmark. Int. J. Semantic Web Inf. Syst., 5(2), 2009.
•  [Li2005] C. Li et al. RankSQL: query algebra and optimization for relational top-k queries. In SIGMOD ’05. ACM, 2005.
•  [DellaValle2012] E. Della Valle et al. Order matters! Harnessing a world of orderings for reasoning over massive data. Semantic Web Journal, 2012.
•  [Hwang2007] S.-w. Hwang and K. Chang. Probe minimization by schedule optimization: Supporting top-k queries with expensive predicates. IEEE TKDE, 19(5), 2007.
•  [Ilyas2004] I. F. Ilyas et al. Rank-aware Query Optimization. In SIGMOD ’04. ACM, 2004.
•  [Ilyas2008] I. F. Ilyas et al. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv., 40(4), 2008.
•  [Perez2009] J. Perez et al. Semantics and complexity of SPARQL. ACM Trans. Database Syst., 34(3), 2009.
•  [Schmidt2010] M. Schmidt et al. Foundations of SPARQL query optimization. In ICDT ’10. ACM, 2010.
•  [Straccia2010] U. Straccia. SoftFacts: A top-k retrieval engine for ontology mediated access to relational databases. In SMC ’10. IEEE, 2010.
BACK-UP SLIDES
Why do we need to optimize them?
An additional, less intuitive and less simplified example:
•  Top 2 pairs of most populated cities and largest countries
[Figure labels: Moscow, Shanghai]
The materialize-then-sort scheme
[Figure, read bottom-up:
   Inputs: Countries by area; Cities by population (Shanghai, Istanbul, Karachi, Mumbai, Moscow, …, Vatican)
   Counts shown: 1, 249, 14K*; scores shown: 0.567, 0.563, 0.497, 0.185, 0.05, 0.04, 2e-08
   Materialize all 14K combinations
   Sort all 14K join combinations (Shanghai, …, Vatican)
   Fetch 2 best results (Moscow, Shanghai)]
* According to DBPedia, but probably more
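The scheme above can be sketched in a few lines. This is a minimal illustration only: the function name `materialize_then_sort`, the normalized scores in the toy data, and the choice of a plain sum as scoring function are assumptions made for the example, not values from the slides.

```python
def materialize_then_sort(countries, cities, k):
    """Join every city with its country, sort all combinations, keep the best k.

    countries: dict country -> normalized area score (hypothetical values)
    cities: list of (city, country, normalized population score)
    """
    # Step 1: materialize all join combinations with their total score.
    combos = [
        (countries[country] + population, city, country)
        for city, country, population in cities
        if country in countries
    ]
    # Step 2: sort everything, even though only k results are needed.
    combos.sort(reverse=True)
    # Step 3: fetch the k best results.
    return combos[:k]


# Toy data with made-up normalized scores.
countries = {"Russia": 1.0, "China": 0.56, "Turkey": 0.05}
cities = [
    ("Shanghai", "China", 1.0),
    ("Istanbul", "Turkey", 0.9),
    ("Moscow", "Russia", 0.5),
]
top2 = materialize_then_sort(countries, cities, 2)
print([city for _, city, _ in top2])
```

The cost is dominated by steps 1 and 2, which touch every combination regardless of k.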
  
Can we make it more efficient?
Can we exploit the sorted access by area and by population?
[Figure, read bottom-up:
   Inputs: Countries by area; Cities by population (Shanghai, Istanbul, Karachi, Mumbai, Moscow, …)
   Counts shown: 9, 13
   Order incrementally the combinations using partial orders
   Fetch 2 best results (Moscow, Shanghai)]
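One way to exploit sorted access is a hash rank-join that stops as soon as no unseen combination can beat the results already found. The sketch below is a simplified variant in the spirit of HRJN [Ilyas2004], not the authors' RSEQ operator; it assumes a sum as the monotone scoring function, and the function name, input encoding, and toy data are hypothetical.

```python
import heapq
from itertools import count


def hash_rank_join(left, right, k):
    """Top-k join of two inputs sorted by descending score (simplified sketch).

    left, right: lists of (join_key, score), each sorted by score descending.
    Returns (results, reads): results is a list of (join_key, total_score),
    reads is the number of input tuples consumed.
    """
    if not left or not right:
        return [], 0
    inputs = (left, right)
    seen = ({}, {})                   # per input: join_key -> scores read so far
    pos = [0, 0]                      # next unread position per input
    top = (left[0][1], right[0][1])   # best score of each input
    last = [top[0], top[1]]           # score of the last tuple read per input
    heap, tie = [], count()           # max-heap of joined candidates
    results = []
    side = 0
    while len(results) < k:
        if pos[side] >= len(inputs[side]):
            side = 1 - side           # this input is exhausted, read the other
        key, score = inputs[side][pos[side]]
        pos[side] += 1
        last[side] = score
        seen[side].setdefault(key, []).append(score)
        for other in seen[1 - side].get(key, []):
            heapq.heappush(heap, (-(score + other), next(tie), key))
        # Upper bound on the score of any combination not yet produced.
        bounds = []
        if pos[0] < len(left):
            bounds.append(last[0] + top[1])
        if pos[1] < len(right):
            bounds.append(top[0] + last[1])
        threshold = max(bounds, default=float("-inf"))
        # Emit candidates that no unseen combination can overtake.
        while heap and len(results) < k and -heap[0][0] >= threshold:
            neg, _, joined_key = heapq.heappop(heap)
            results.append((joined_key, -neg))
        if not bounds and not heap:
            break                     # both inputs exhausted
        side = 1 - side
    return results, sum(pos)


# Toy inputs with made-up normalized scores, sorted descending.
left = [("b", 0.9), ("a", 0.8), ("c", 0.3), ("d", 0.1)]
right = [("b", 0.95), ("c", 0.5), ("a", 0.2), ("d", 0.1)]
results, reads = hash_rank_join(left, right, 1)
print(results, "after", reads, "of", len(left) + len(right), "tuples")
```

With these inputs the best combination is confirmed after reading one tuple from each side, whereas the materialize-then-sort scheme would read and join all eight.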
  
SPARQL-Rank algebra Definitions
 Mapping µ: an intermediate SPARQL solution, equivalent to a SQL tuple

 Set of mappings:
            ?x     ?y     ?p1    ?p2
     µ1     1      8      0.8    0.8
     µ2     3      3      0.3    0.6

  Maximal possible score
  Given a scoring function F (p1, …, pn) and a set of predicates P = {p1, …, pj}, the maximal possible score for a mapping µ is defined as:

  $$F_P(p_1, \dots, p_n)[\mu] = F\left( p_i = \begin{cases} p_i[\mu] & \text{if } p_i \in P \\ 1 & \text{otherwise} \end{cases} \quad \forall i \right)$$
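As a small worked example of this definition, the sketch below computes the maximal possible score for the two mappings in the table. It assumes the scoring function F is a plain sum of the normalized predicates; the helper name `max_possible_score` and the dictionary encoding of mappings are hypothetical.

```python
def max_possible_score(scoring, mapping, all_predicates, evaluated):
    """F_P[mu]: evaluated predicates use their value in mu, the others are set to 1."""
    values = [mapping[p] if p in evaluated else 1.0 for p in all_predicates]
    return scoring(values)


mu1 = {"?x": 1, "?y": 8, "?p1": 0.8, "?p2": 0.8}
mu2 = {"?x": 3, "?y": 3, "?p1": 0.3, "?p2": 0.6}
predicates = ["?p1", "?p2"]

# Only ?p1 evaluated so far: ?p2 is replaced by its maximum, 1.
bound_mu1 = max_possible_score(sum, mu1, predicates, {"?p1"})
bound_mu2 = max_possible_score(sum, mu2, predicates, {"?p1"})
# Both predicates evaluated: the bound becomes the actual score.
final_mu2 = max_possible_score(sum, mu2, predicates, {"?p1", "?p2"})
print(bound_mu1, bound_mu2, final_mu2)
```

The bound can only decrease as more predicates are evaluated, which is what makes it safe to use for early termination.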
SPARQL-Rank algebra Definitions

 Ranking principle

 Given two mappings µ1 and µ2 with FP [µ1] > FP [µ2], if we process µ2 we also need to process µ1.

 Ranked set of mappings
 Given a set of predicates P, a ranked set of mappings ΩP is a set of mappings Ω augmented with the following properties:
   •  Score: for each mapping µ, the maximal possible score FP [µ]
   •  Order: the order relation <ΩP is defined on ΩP based on the scores of the single mappings
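A ranked set of mappings can be pictured as the mappings sorted by their maximal possible score. The sketch below is illustrative only: it assumes a sum as scoring function, and the function name `ranked_set` and the toy values are hypothetical.

```python
def ranked_set(mappings, evaluated, all_predicates):
    """Return (score, name) pairs ordered by descending maximal possible score."""
    def upper_bound(mu):
        # Predicates not yet evaluated contribute their maximum value, 1.
        return sum(mu[p] if p in evaluated else 1.0 for p in all_predicates)

    scored = [(upper_bound(mu), name) for name, mu in mappings.items()]
    return sorted(scored, reverse=True)


mappings = {
    "mu1": {"?p1": 0.8, "?p2": 0.8},
    "mu2": {"?p1": 0.3, "?p2": 0.6},
    "mu3": {"?p1": 0.4, "?p2": 0.6},
}
order = ranked_set(mappings, {"?p1"}, ["?p1", "?p2"])
print([name for _, name in order])
```

Consuming the mappings in this order respects the ranking principle: by the time mu2 is processed, every mapping with a higher bound has already been processed.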
The SPARQL-Rank algebraic operators

SPARQL-Rank algebraic equivalences
   Allows the results to be ordered incrementally by pushing the rank operator inside the query execution tree.
The RSEQ algorithm
Evaluation: additional technical information
•  Experimental setting:
   •  AMD 64 bit processor 2.66 GHz
   •  4 GB RAM
   •  Debian kernel 2.6.26-2
   •  Sun Java 1.6.0
      •  Maximum heap size 2GB
•  8 queries available at sparqlrank.search-computing.org
More experimental results: the RankJoin operators
•  Example query, 5M triples dataset
•  Worst-case scenario: no sorted access indexes (left)
   •  RSEQ is the best, especially for k < 1000
•  Standard scenario: sorted access indexes (right)
   •  All three are comparable, RA-HRJN is best for k > 1000
ARQ-Rank architecture


ISWC 2012 "Efficient execution of top-k SPARQL queries"

  • 1. Ef#icient  Execution  of   top-­‐k  SPARQL  queries   Sara  Magliacane  (VU  University  Amsterdam)   Alessandro  Bozzon  (Politecnico  di  Milano)   Emanuele  Della  Valle  (Politecnico  di  Milano)  
  • 2. Outline   •  Introduc?on   •  What  are  top-­‐k  queries?   •  Why  do  we  need  to  op?mize  them?   •  Our  approach:   •  A  rank-­‐aware  SPARQL  algebra   •  A  rank-­‐aware  execu?on  model   •  Three  planning  strategies   1   •  Evalua?on  
  • 3. What  is  a  top-­‐k  query?     • A  query  that  returns     1.  a  limited  number  of  results  k     2.  ordered  by  a  scoring  func?on  that   combines  several  criteria     2  
  • 7. Why  do  we  need  to  optimize   them?     A  very  intui?ve  and  simplified  example:   •  Top  3  largest  countries  (by  both  area  and   popula?on)     6  
  • 8. The  standard  way:     materialize-­‐then-­‐sort  scheme   Fetch  3  best  results   Sort  all  the  242  join  combina?ons     …   Compute  all  the  242  join  combina?ons   242   242   Countries  by   Countries  by   …   …   area   popula?on   …   …   …   7  
  • 9. Can  we  make  it  more  ef#icient?   Can  we  exploit  the  available  sorted  access  by  area  and   by  popula?on?   Fetch  3  best  results   Order  incrementally  the  combina?ons  using  par0al  orders   7   9   Countries  by   Countries  by   area   popula?on   …   8  
  • 10. The  split-­‐and-­‐interleave  scheme     •  The  intui?on  of  the  previous  example  can  be  formalized   with  the  split-­‐and-­‐interleave  scheme  from  RDBMS  [Li2005,   Hwang2007,  Ilyas2004,  Ilyas2008]   1.  Split  the  evalua?on  of  the  scoring  func?on  into  single  criteria     2.  Interleave  them  with  other  operators   3.  Use  par?al  orders  to  construct  incrementally  the  final  order   •  Standard  assump?ons:   •  Monotone  scoring  func?on   •  Each  criterion  is  evaluated  as  a  [0,1]  number  (normaliza?on)   •  Op?mized  for  the  case  of  fast  sorted  access  for  each  criterion   9  
  • 11. No  free  lunch…   /01(+! ! Split-­‐and-­‐interleave   /01(+! 234!,567! ",)*-).-!! ! Orders  of   234!,567! magnitude   ",)*-).-!!   Orders  of   magnitude     >! *8697.!0:!-7;5.7-!.7;8<,;!+! !!+=! >! *8697.!0:!-7;5.7-!.7;8<,;!+! !!!+=! ! ?61.0@767*,! /[email protected])-! ?61.0@767*,! /[email protected])-! Users  are  interested  in  1",)*-).-!"#$%&'!(search  engines)   <=  k  <=  100   C537-!*8697.!0:!-7;5.7-!.7;8<,;!D! C537-!*8697.!0:!)<<!.7;8<,;!E! 234!,567! C537-!*8697.!0:!-7;5.7-!.7;8<,;!D! ",)*-).-!"#$%&'! "#$%&'(%)*+!C537-! C537-!*8697.!0:!)<<!.7;8<,;!E!
  • 12. 11  
  • 13. Top-­‐k  queries  in  SPARQL  1.1   Example  query  on  BSBM  [Bizer2009]:   •  The  top  10  offers  ordered  by  the  product  ra?ngs  and  offer  price:   SELECT  ?product  ?offer     (norm1(?avgRat1)  +  norm2(?avgRat2)  +  norm3(?price)   AS  ?score)   WHERE  {     ?product  hasAvgRat1  ?avgRat1  .   ?product  hasAvgRat2  ?avgRat2  .   ?product  hasName  ?name  .   ?product  hasOffers  ?offer  .   ?offer  hasPrice  ?price       }     ORDER  BY  DESC  (?score)       LIMIT  10   12       t     Tens  of  seconds  on  5M    riples  (could  be  improved  to  milliseconds)  
  • 14. Split-­‐and-­‐interleave  in  SPARQL?     Related  work     •  A  possible  solu?on  [Straccia2010,  Bozzon2011]:   •  Rewrite  SPARQL  into  SQL     •  Use  exis?ng  op?mized  RDBMS  (e.g.  RankSQL  [Li2005])   •  Disadvantages:     •  Works  if  data  are  already  in  a  RDBMS   •  What  about  na?ve  SPARQL  op?miza?ons?   •  Federated  queries  over  Linked  Data  [Wagner2012]:   13   complementary  to  our  approach  
  • 15. Challenges  for  native  SPARQL   split-­‐and-­‐interleave  solutions     Query   Algebra   Algebraic     Query  plan   Planner   generator   tree     Physical   Planning   Algebra   operators   strategies   Differences  with  SQL  and  RDBMS   Proposed  solu0on   Different  algebra     STEP  1:  New  algebra  (algebraic   operators  and  algebraic   equivalences)   Different  cost  of  data  access  in   STEP  2:  New  algorithms  for   na?ve  RDF  triplestores   physical  operators,  possibly  using   (sorted  access  is  slow)   less  sorted  access   14   Addi?onal  op?miza?on  dimensions   STEP  3:  New  planning  strategies  
  • 16. Step  1:  a  rank-­‐aware  algebra   •  SPARQL-­‐Rank  algebra  [Bozzon2011]   •  Extends  the  standard  SPARQL  algebra  [Perez2009]     •  Ranked  set  of  mappings:  set  of  mappings  augmented  with  an   order  rela?on     Extended   New   OPERATORS   EQUIVALENCES   15  
  • 17. The  SPARQL-­‐Rank  algebraic  operators   ?pr, ?of, ?score ?pr, ?of, ?score ?pr, ?of, ?score New  operator   SLICE [0,10] SLICE [0,10] SLICE [0,10] rank   g (?p1) Sequence 3 ?pr = ?pr g3(?p1)   g (?a1) 1 g3(?p1) ?pr hasN ?n seqScan g1(?a1) ?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasA1 ?a1 . ?pr hasO ?of . ?of hasP1 ?p1 ?pr hasO ?of . ?of hasP1 ?p1 ?pr hasO ?of . ?of hasP1 ?p1 seqScan orderScan_a1 seqScan (a) (b) (c) 16  
  • 18. The  Rank  Operator   ?x   ?y   ?p1   ?p2   ?x   ?y   ?p1   Fp1   µ1   1   8   0.8   0.8   ρp1   µ1   1   8   0.8   1.8   µ2   3   3   0.3   0.6   µ3   3   4   0.4   1.4   µ3   3   4   0.4   0.6   µ2   3   3   0.3   1.3   Ω   ρp1(Ω  )  
  • 19. The  SPARQL-­‐Rank  algebraic  operators   Redefined     standard     operators     18  
  • 20. The  Join  Operator   ?x   ?y   ?p1   Fp1   ?x   ?z   ?p2   Fp2   µ1   1   8   0.8   1.8   µ4   1   9   0.8   1.8   µ3   3   4   0.4   1.4   µ5   3   0   0.6   1.6   µ2   3   3   0.3   1.3   Ωp1   Ω’p2   ?x   ?y   ?z   ?p1   ?p2   Fp1Up2   µ1  U  µ4   1   8   9   0.8   0.8   1.6   µ3  U  µ5   3   4   0   0.4   0.6   1.0   µ2  U  µ5   3   3   0   0.3   0.6   0.9  
  • 22. SPARQL-­‐Rank  algebraic  equivalences   •  Allows  the  splimng  of  a  monolithic  scoring  func?on  into   several  rank  operators     21  
  • 24. SPARQL-­‐Rank  algebraic  equivalences   •  Allows  to  order  incrementally  the  results  by  pushing  the   rank  operator  inside  the  query  tree.        
  • 25. From  algebra  to  execution   24   Image  from:    hnp://de-­‐?mekeeper.com/yahoo_site_admin/assets/images/benzinger20gold20gears200291.17120724_std.jpg  
  • 26. Step  2:  physical  operators     (top-­‐k  algorithms)     •  Rank  operator     •  If  there  is  a  sorted  access  index  on  the  ranking  criterion  we  use  it   •  Otherwise:  rank  aggrega?on  algorithms,  e.g.  [Hwang2007]     •  Join  operator   •  If  the  right  operand  does  not  influence  the  ranking:  streaming   index  join   •  Otherwise:  a  rank-­‐join  algorithm  [see  next  slides]   •  Other  operators  are  straighsorward:   25   •  E.g.  the  standard  FILTER  conserves  the  ordering  of  its  input  
  • 27. Rank-­‐Join  algorithms   •  Different  algorithms  based  on  available  RankJoin in  the  inputs:   access   (a) •  Hash  Rank-­‐Join   RankJoin •  e.g.  HRJN  [Ilyas2004]         (a) sortedAccess sortedAccess   RankSequence sortedAccess sortedAccess   (b) RankSequence   (b) sortedAccess randomAccess •  Random  Access  Rank-­‐Join   RA-RankJoin sortedAccess randomAccess •  e.g.  RA-­‐HRJN  [Ilyas2004]       (c) RA-RankJoin RankJoin sortedAccess sortedAccess randomAccess randomAccess (c) (a) sortedAccess sortedAccess sortedAccess sortedAccess randomAccess randomAccess •  RankSequence  (e,g,  RSEQ)   RankSequence •  Minimum  sorted  access   (b) 26   •  Leverages  random  access   sortedAccess randomAccess RA-RankJoin (c)
  • 28. Rank-­‐Join  algorithms   •  Different  algorithms  based  on  available  RankJoin in  the  inputs:   access   •  Hash  Rank-­‐Join   (a) RankJoin Literature   •  e.g.  HRJN  [Ilyas2004]         (a) sortedAccess sortedAccess   RankSequence sortedAccess sortedAccess   (b) RankSequence   (b) sortedAccess randomAccess •  Random  Access  Rank-­‐Join   RA-RankJoin sortedAccess randomAccess •  e.g.  RA-­‐HRJN  [Ilyas2004]       (c) RA-RankJoin RankJoin sortedAccess sortedAccess randomAccess randomAccess (c) (a) sortedAccess sortedAccess sortedAccess sortedAccess randomAccess randomAccess •  RankSequence  (e,g,  RSEQ)   RankSequence •  Minimum  sorted  access   (b) 27   •  Leverages  random  access   sortedAccess randomAccess RA-RankJoin (c)
  • 29. Rank-­‐Join  algorithms   •  Different  algorithms  based  on  available  RankJoin in  the  inputs:   access   (a) •  Hash  Rank-­‐Join   RankJoin •  e.g.  HRJN  [Ilyas2004]         (a) sortedAccess sortedAccess   RankSequence sortedAccess sortedAccess   (b) RankSequence   (b) sortedAccess randomAccess •  Random  Access  Rank-­‐Join   RA-RankJoin sortedAccess randomAccess •  e.g.  RA-­‐HRJN  [Ilyas2004]       (c) RA-RankJoin RankJoin sortedAccess sortedAccess randomAccess randomAccess (c) (a) sortedAccess sortedAccess sortedAccess sortedAccess randomAccess randomAccess •  RankSequence  (e,g,  RSEQ)   RankSequence New   •  Minimum  sorted  access   (b) 28   •  Leverages  random  access   sortedAccess randomAccess RA-RankJoin (c)
  • 30. Step3:  planning  strategies   •  Using  the  algebraic  equivalences  we  can  produce  several   equivalent  algebraic  trees   •  The  planner  can  use  them  to  implement  several  planning   strategies     ?pr, ?of, ?score ?pr, ?of, ?score ?pr, ?of, ?score ?pr, ?of, ?score ?pr, ?of, ?score ?score ?pr, ?of, ?pr, ?of, ?pr, ?of, ?score ?score SLICE [0,10] SLICE [0,10] SLICE [0,10][0,10] SLICE SLICE [0,10] [0,10] SLICE SLICE [0,10] [0,10] SLICE Join ORDER Sequence Sequence [?score] ?pr = ?pr RankJoin g3(?p1)(?p1) g3 ?pr = ?pr?pr = ?pr g3(?p1) 3(?p1) g EXTEND ?pr = ?pr g3(?p1) g3(?p1) ?pr hasN [?score =g1(?a1)+g2(?a2)+g3(?p1)] ?n hasN ?n ?pr RankJoin ?pr hasN ?n . g2(?a2) g1(?a1)(?a1) g1 ?pr = ?pr g1(?a1) g1(?a1) seqScanseqScan ?pr hasA1 ?a1. ?pr hasA2 ?a2 . g3(?p1) g1(?a1) ?pr hasA1 ?a1 . ?a1hasN ?n . ?n ?pr hasA1 hasA1 ?a1 . ?pr hasN ?n . ?pr hasA1 ?pr . ?pr hasN . ?pr ?a1 . ?pr hasN ?n . ?pr hasA1 ?a1 . ?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasOhasO ?of hasP1hasP1 ?p1 hasO ?of . ?of hasP1 ?p1 ?pr ?of . . ?of ?p1 ?pr ?pr hasO ?of . ?of hasP1 ?p1 hasO ?ofhasO hasP1 ?p1 ?pr ?pr . ?of ?of . ?of hasP1 ?p1 ?pr hasO ?of . ?pr hasO ?of . seqScan seqScan orderScan_a1 orderScan_a1 seqScan seqScan ?of hasP ?p1. ?of hasP ?p1 . ?pr hasA1 ?a1 . ?pr hasA2 ?a2 . (a) (a) (b) (b) (c) (c) (a) (b) 1.  Rank  of  BGPs   2.  Interleaved   3.  Rank  Join   29  
  • 31. 1.  Rank  of  BGPs  (ROB)   •  Split  the  monolithic  scoring  func?on  into  several  incremental   rank  operators  (rho)   ?product, ?offer, ?score ?product, ?offer, ?score SLICE [0,10] SLICE [0,10] ORDER norm3(?price) [?score] EXTEND norm2(?avgRat2) [?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)] norm1(?avgRat1) ?product hasAvgRat1 ?avgRat1. ?product hasAvgRat1 ?avgRat1. ?product hasAvgRat2 ?avgRat2 . ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name . ?produc ?product hasName ?name . ?product hasOffer ?offer . ?produc ?product hasOffer ?offer . ?offer hasPrice ?price. ?produc ?offer hasPrice ?price. 30   ?offer h Materialize-­‐then-­‐sort   Rank  of  BGPs   ?product, ?offer, ?score SLICE [0,10]
  • 32. 2.  Interleaved  (INTER)   •  Separate  the  panern  in  two  groups:   •  Triple  panerns  that  influence  the  ranking     •  Triple  panerns  that  don’t  influence  the  ranking   ?product, ?offer, ?score ?product, ?offer, ?score SLICE [0,10] SLICE [0,10] ORDER ?product = ?product [?score] norm3(?price) {?product hasName ?name } EXTEND [?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)] norm1(?avgRat1) ?product hasAvgRat1 ?avgRat1. norm2(?avgRat2) ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name . ?product hasAvgRat1 ?avgRat1. ?product hasOffer ?offer . ?product hasAvgRat2 ?avgRat2 . ?offer hasPrice ?price. ?product hasOffer ?offer . ?offer hasPrice ?price. 31   Materialize-­‐then-­‐sort   Interleaved     ?product, ?offer, ?score SLICE [0,10]
  • 33. 3.  Rank-­‐Join  (RJ)   •  Split  into  one  triple  panern  for  each  ranking  criterion   Most  appropriate  join  algorithm  based  on  available  access   • ?product, ?offer, ?score SLICE [0,10] ORDER [?score] EXTEND [?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)] ?product, ?offer, ?score ?product, ?offer, ?score SLICE [0,10] ?product hasAvgRat1 ?avgRat1. SLICE [0,10] ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name . ORDER ?product = ?product [?score] ?product hasOffer ?offer . RankJoin ?offer hasPrice ?price. EXTEND ?product = ?product {?product hasName ?name} [?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)] RankJoin ?product = ?product norm2(?avgRat2) ?product hasAvgRat1 ?avgRat1. norm3(?price) norm1(?avgRat1) ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name . ?product hasOffer ?offer . ?product hasAvgRat2 ?avgRat2} ?product hasOffer ?offer . ?offer hasPrice ?price. ?product hasAvgRat1 ?avgRat1} ?offer hasPrice ?price. 32   Materialize-­‐then-­‐sort   Rank-­‐Join   ?product, ?offer, ?score SLICE [0,10]
  • 35. Experimental  evaluation   •  Prototype  implementa?on  of  our  system:   •  ARQ-­‐Rank  (extends  Jena  ARQ  2.8.9)     •  Extended  version  of  Berlin  SPARQL  Benchmark   [Bizer2009]   •  Added  ranking  anributes     •  Added  top-­‐k  queries   •  Jena  TDB  0.8.11  as  storage   34   •  Code  and  experiments:  sparqlrank.search-­‐compu?ng.org  
  • 36. Experiment 1: compare planning strategies
      • Example query, 5M triples dataset
      • Worst-case scenario: no sorted access indexes (slow sorted access)
      [Figure annotation: one to two orders of magnitude better]
  • 37. Experiment 1: compare planning strategies
      • Example query, 5M triples dataset
      • Standard scenario: sorted access indexes (fast sorted access)
      [Figure annotation: two orders of magnitude better]
  • 38. Experiment 2: Small Benchmark (8 queries)
      [Figure: query execution time [ms] against dataset size (100K to 5M); plots not recoverable]
  • 39. Conclusions and Future Work
      • A system that speeds up the execution of top-k queries in SPARQL by orders of magnitude:
          • STEP 1: A rank-aware SPARQL algebra (SPARQL-Rank algebra)
          • STEP 2: A rank-join algorithm (RSEQ)
          • STEP 3: Three planning strategies (ROB, INTER, RJ)
      • ARQ-Rank, a rank-aware extension of Jena ARQ
      • A small benchmark for top-k queries, based on BSBM [Bizer2009]
      • All available at sparqlrank.search-computing.org
      • Future work:
          • More advanced, cost-based optimization techniques
          • Extension to federated top-k query processing
          • Top-k queries under OWL2QL entailment regime
  • 40. Bibliography
      • [Bozzon2011] A. Bozzon et al. Towards an efficient SPARQL top-k query execution in virtual RDF stores. In DBRANK workshop at VLDB ’11, 2011.
      • [Wagner2012] A. Wagner et al. Top-k Linked Data Query Processing. In ESWC ’12. Springer, 2012.
      • [Bizer2009] C. Bizer and A. Schultz. The Berlin SPARQL Benchmark. Int. J. Semantic Web Inf. Syst., 5(2), 2009.
      • [Li2005] C. Li et al. RankSQL: query algebra and optimization for relational top-k queries. In SIGMOD ’05. ACM, 2005.
      • [DellaValle2012] E. Della Valle et al. Order matters! Harnessing a world of orderings for reasoning over massive data. Semantic Web Journal, 2012.
      • [Hwang2007] S.-w. Hwang and K. Chang. Probe minimization by schedule optimization: Supporting top-k queries with expensive predicates. IEEE TKDE, 19(5), 2007.
  • 41. Bibliography
      • [Ilyas2004] I. F. Ilyas et al. Rank-aware Query Optimization. In SIGMOD ’04. ACM, 2004.
      • [Ilyas2008] I. F. Ilyas et al. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv., 40(4), 2008.
      • [Perez2009] J. Perez et al. Semantics and complexity of SPARQL. ACM Trans. Database Syst., 34(3), 2009.
      • [Schmidt2010] M. Schmidt et al. Foundations of SPARQL query optimization. In ICDT ’10. ACM, 2010.
      • [Straccia2010] U. Straccia. SoftFacts: A top-k retrieval engine for ontology mediated access to relational databases. In SMC ’10. IEEE, 2010.
  • 44. Why do we need to optimize them?
      An additional, less intuitive and less simplified example:
      • Top 2 pairs of most populated cities and largest countries
      [Figure: Moscow, Shanghai]
  • 45. The materialize-then-sort scheme
      Plan, in execution order:
      • Materialize all 14K combinations of the inputs "Countries by area" (249) and "Cities by population" (14K*)
      • Sort all 14K join combinations
      • Fetch 2 best results
      [Figure: input lists with the entries Shanghai, Istanbul, Karachi, Mumbai, Moscow, …, Vatican and the scores 1, 0.567, 0.563, 0.497, 0.185, 0.05, 0.04, …, 2e-08; result: Moscow, Shanghai]
      * According to DBPedia, but probably more
  • 46. Can we make it more efficient?
      Can we exploit the sorted access by area and by population?
      • Order the combinations incrementally using partial orders
      • Fetch 2 best results
      [Figure: inputs "Countries by area" (9) and "Cities by population" (13), with the entries Shanghai, Istanbul, Karachi, Mumbai, Moscow, …; result: Moscow, Shanghai]
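      The idea of ordering combinations incrementally can be sketched as a threshold-based rank join over two inputs that offer sorted access. This is a simplified, HRJN-style illustration and not the RSEQ algorithm from the talk. It assumes the combined score is the sum of the two input scores, the function name `rank_join_top_k` is hypothetical, and the input data is a made-up toy example.

```python
import heapq
from itertools import count

def rank_join_top_k(left, right, k, joinable=lambda a, b: True):
    """Top-k pairs by summed score from two inputs sorted by descending score.

    left, right: lists of (item, score). Inputs are read alternately; a pair
    is emitted once its score reaches the threshold, i.e. the best score that
    any combination involving a still-unread tuple could achieve.
    Returns (results, number_of_tuples_read).
    """
    inputs = (left, right)
    seen = ([], [])          # tuples read so far from each input
    pos = [0, 0]             # read position in each input
    candidates = []          # max-heap of joined pairs (scores negated)
    tie = count()            # tie-breaker so the heap never compares items
    results = []

    def bound(side):
        """Upper bound on the score of pairs using an unread tuple of `side`."""
        other = 1 - side
        if pos[side] >= len(inputs[side]):
            return float("-inf")   # nothing left to read on this side
        if not seen[side] or not seen[other]:
            return float("inf")    # no information yet
        return seen[side][-1][1] + seen[other][0][1]

    while len(results) < k:
        readable = [s for s in (0, 1) if pos[s] < len(inputs[s])]
        if readable:
            side = min(readable, key=lambda s: pos[s])  # read the less-consumed input
            item, score = inputs[side][pos[side]]
            pos[side] += 1
            seen[side].append((item, score))
            for other_item, other_score in seen[1 - side]:
                pair = (item, other_item) if side == 0 else (other_item, item)
                if joinable(*pair):
                    heapq.heappush(candidates, (-(score + other_score), next(tie), pair))
        elif not candidates:
            break                  # both inputs exhausted, nothing left to emit
        threshold = max(bound(0), bound(1))
        while candidates and len(results) < k and -candidates[0][0] >= threshold:
            neg_score, _, pair = heapq.heappop(candidates)
            results.append((pair, -neg_score))
    return results, sum(pos)

# Made-up toy inputs, already sorted by descending score.
left = [("a1", 1.0), ("a2", 0.6), ("a3", 0.5), ("a4", 0.1)]
right = [("b1", 0.9), ("b2", 0.8), ("b3", 0.3), ("b4", 0.2)]
results, reads = rank_join_top_k(left, right, 2)
```

      On this toy input the two best pairs are found after reading four of the eight tuples, whereas materialize-then-sort would build all sixteen combinations first.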
  • 47. SPARQL-Rank algebra Definitions
      Mapping µ: an intermediate SPARQL solution, equivalent to a SQL tuple.
      Example set of mappings:

             ?x   ?y   ?p1   ?p2
        µ1   1    8    0.8   0.8
        µ2   3    3    0.3   0.6

      Maximal possible score: given a scoring function F(p1, …, pn) and a set of predicates P = {p1, …, pj}, the maximal possible score for a mapping µ is defined as:
      $$F_P(p_1, \dots, p_n)[\mu] = F(p'_1, \dots, p'_n), \qquad p'_i = \begin{cases} p_i[\mu] & \text{if } p_i \in P \\ 1 & \text{otherwise} \end{cases} \quad \forall i$$
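      A small worked example may help with the definition of the maximal possible score. The sketch below assumes a scoring function F(p1, p2) = p1 + p2, which the slide does not specify, and uses the two mappings from the table. The helper name `maximal_possible_score` is hypothetical.

```python
def maximal_possible_score(F, predicates, evaluated, mapping):
    """Compute F_P[mu]: predicates in `evaluated` (the set P) contribute their
    value in the mapping; all other predicates contribute the maximum value 1."""
    args = [mapping[p] if p in evaluated else 1.0 for p in predicates]
    return F(*args)

# Assumed scoring function; the slide leaves F unspecified.
F = lambda p1, p2: p1 + p2
predicates = ["?p1", "?p2"]

# The two mappings from the table on this slide.
mu1 = {"?x": 1, "?y": 8, "?p1": 0.8, "?p2": 0.8}
mu2 = {"?x": 3, "?y": 3, "?p1": 0.3, "?p2": 0.6}

# With P = {p1}, the unevaluated ?p2 counts as 1, giving an upper bound.
upper_mu1 = maximal_possible_score(F, predicates, {"?p1"}, mu1)
upper_mu2 = maximal_possible_score(F, predicates, {"?p1"}, mu2)

# With P = {p1, p2}, the bound coincides with the final score.
final_mu1 = maximal_possible_score(F, predicates, {"?p1", "?p2"}, mu1)
final_mu2 = maximal_possible_score(F, predicates, {"?p1", "?p2"}, mu2)
```

      Under this assumed F, evaluating only p1 gives bounds of about 1.8 for µ1 and 1.3 for µ2, and each bound can only decrease as further predicates are evaluated.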
  • 48. SPARQL-Rank algebra Definitions
      Ranking principle: given two mappings µ1 and µ2 with FP[µ1] > FP[µ2], if we process µ2 we also need to process µ1.
      Ranked set of mappings: given a set of predicates P, a ranked set of mappings ΩP is a set of mappings Ω augmented with the following properties:
      • Score: for each mapping µ, the maximal possible score FP[µ]
      • Order: the order relation <ΩP is defined on ΩP based on the scores of the individual mappings
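      One way to picture a ranked set of mappings is as a priority queue keyed on the maximal possible score, so that mappings are always handed out in an order that respects the ranking principle. The sketch below is an illustration only: the name `ranked_order` is hypothetical, and the bound again assumes F = p1 + p2 with P = {p1}.

```python
import heapq

def ranked_order(mappings, upper_bound):
    """Yield mapping names in descending order of upper_bound(mapping)."""
    heap = [(-upper_bound(m), name) for name, m in mappings.items()]
    heapq.heapify(heap)
    while heap:
        _, name = heapq.heappop(heap)
        yield name

mappings = {
    "mu2": {"?p1": 0.3, "?p2": 0.6},
    "mu1": {"?p1": 0.8, "?p2": 0.8},
}

# Assumed F = p1 + p2 with P = {p1}: the unevaluated ?p2 counts as 1.
bound_p1 = lambda m: m["?p1"] + 1.0

order = list(ranked_order(mappings, bound_p1))
```

      Here µ1 is produced before µ2 because its bound is higher, so a consumer that stops after µ2 has necessarily already processed µ1.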
  • 49. The SPARQL-Rank algebraic operators
  • 51. SPARQL-Rank algebraic equivalences
      The equivalences allow the results to be ordered incrementally by pushing the rank operator down into the query execution tree.
  • 53. Evaluation: additional technical information
      • Experimental setting:
          • AMD 64 bit processor 2.66 GHz
          • 4 GB RAM
          • Debian kernel 2.6.26-2
          • Sun Java 1.6.0
          • Maximum heap size 2 GB
      • 8 queries available at sparqlrank.search-computing.org
  • 54. More experimental results: the RankJoin operators
      • Example query, 5M triples dataset
      • Worst-case scenario: no sorted access indexes (left)
          • RSEQ is the best, especially for k < 1000
      • Standard scenario: sorted access indexes (right)
          • All three are comparable, RA-HRJN is best for k > 1000