This document describes RESIN, a query optimizer that eliminates redundant I/O in big data queries. RESIN introduces two new operators, ResinMap and ResinReduce, and two optimization rules, sub-query fusion and binary-operator elimination. The optimizer works by fusing operators that are applied to the same table, eliminating redundant joins or unions, and combining grouped aggregations, so that a table referenced by several sub-queries need not be read repeatedly. These optimizations were found to benefit 40% of the queries in the TPC-DS benchmark, improving performance by an average of 1.4x. An evaluation on a 10GB TPC-DS dataset found that RESIN's optimizations significantly reduced redundant I/O for many real-world analytical queries.
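The idea behind fusing operators applied to the same table can be illustrated with a small sketch. This is not RESIN's implementation and does not model ResinMap or ResinReduce; the `Table` class, the `unfused` and `fused` functions, and the sample rows are all hypothetical names and data chosen for illustration. It only shows how two filtered aggregations over one table can share a single scan.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical in-memory table that counts how often it is scanned."""
    rows: list
    scans: int = 0

    def scan(self):
        # Each full read of the table stands in for one unit of I/O.
        self.scans += 1
        yield from self.rows


def unfused(table):
    # Two independent sub-queries, each reading the whole table.
    total_2020 = sum(r["price"] for r in table.scan() if r["year"] == 2020)
    total_2021 = sum(r["price"] for r in table.scan() if r["year"] == 2021)
    return total_2020, total_2021


def fused(table):
    # One pass: every row is tested against both predicates.
    total_2020 = total_2021 = 0
    for r in table.scan():
        if r["year"] == 2020:
            total_2020 += r["price"]
        if r["year"] == 2021:
            total_2021 += r["price"]
    return total_2020, total_2021


# Illustrative data only; not taken from TPC-DS.
SALES = [
    {"year": 2020, "price": 10},
    {"year": 2021, "price": 7},
    {"year": 2020, "price": 5},
    {"year": 2022, "price": 3},
]

t1 = Table(SALES)
t2 = Table(SALES)
unfused_result = unfused(t1)
fused_result = fused(t2)
```

Both versions return the same totals, but the unfused version scans the table twice while the fused version scans it once. On a real engine the saving would depend on table size, caching, and storage layout, none of which this sketch models.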