Inside Hive (for beginners)
Takeshi NAKANO / Recruit Co. Ltd.
Why?
Hive is a good tool for non-specialists!
The number of M/R jobs determines Hive's processing time.
↓
How can we reduce that number? What can we do about it when writing HiveQL?
↓
How does Hive convert HiveQL into M/R jobs? Which optimizations are applied along the way?
7/6/2011 | HIVE - A warehouse solution over Map Reduce Framework
Don't you have...
This Facebook paper has a lot of information! But it is a little old.
Component Level Analysis
Hive Architecture / Exec Flow
[Diagram: Client, Driver, Compiler, Metastore, Hadoop]
Hive Workflow
Hive has operators, which are its minimum processing units.
Each operator's work is carried out as an HDFS operation or as M/R jobs.
The compiler converts HiveQL into a set of operators.
Hive Workflow: Operators
Hive Workflow
For M/R processing, Hive uses ExecMapper and ExecReducer.
There are two processing modes:
  1. Local processing mode
  2. Distributed processing mode
Hive Workflow
In mode 1 (local mode):
  Hive forks a process with the hadoop command.
  The plan.xml is created locally and a single node processes it.
In mode 2 (distributed mode):
  Hive sends the job to the existing JobTracker.
  The plan information is stored in the DistributedCache and processed on multiple nodes.
Compiler: How to Process HiveQL
“Plumbing” of HIVE compiler
Compiler Overview
  HiveQL
    → Parser                   → AST
    → Semantic Analyzer        → QB
    → Logical Plan Generator   → Operator Tree
    → Logical Optimizer        → Operator Tree
    → Physical Plan Generator  → Task Tree
    → Physical Optimizer       → Task Tree
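The stage order in the overview can be written down as a list of (stage, output) pairs. The following is only a toy model in Python to make the data flow explicit; `STAGES` and `describe_pipeline` are hypothetical names and do not correspond to Hive's Java classes.

```python
# Toy model of the compiler pipeline from the overview slide.
# STAGES and describe_pipeline are illustrative names, not Hive APIs.
STAGES = [
    ("Parser", "AST"),
    ("Semantic Analyzer", "QB"),
    ("Logical Plan Generator", "Operator Tree"),
    ("Logical Optimizer", "Operator Tree"),
    ("Physical Plan Generator", "Task Tree"),
    ("Physical Optimizer", "Task Tree"),
]


def describe_pipeline(source="HiveQL"):
    """Return one line per stage: input representation -> stage -> output."""
    lines = []
    current = source
    for stage, output in STAGES:
        lines.append(f"{current} -> {stage} -> {output}")
        current = output  # the output of one stage feeds the next
    return lines


if __name__ == "__main__":
    for line in describe_pipeline():
        print(line)
```

Note that the two optimizer stages keep the representation unchanged: they rewrite an Operator Tree into another Operator Tree, and a Task Tree into another Task Tree.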
Parser: HiveQL → AST

HiveQL:
  INSERT OVERWRITE TABLE access_log_temp2
  SELECT a.user, a.prono, p.maker, p.price
  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);

AST (the subtrees marked (1), (2), and (3) are picked up one by one by the Semantic Analyzer):
  TOK_QUERY
    + TOK_FROM                                  (1)
        + TOK_JOIN
            + TOK_TABREF
                + TOK_TABNAME
                    + "access_log_hbase"
                + "a"
            + TOK_TABREF
                + TOK_TABNAME
                    + "product_hbase"
                + "p"
            + "="
                + "."
                    + TOK_TABLE_OR_COL
                        + "a"
                    + "prono"
                + "."
                    + TOK_TABLE_OR_COL
                        + "p"
                    + "prono"
    + TOK_INSERT
        + TOK_DESTINATION                       (2)
            + TOK_TAB
                + TOK_TABNAME
                    + "access_log_temp2"
        + TOK_SELECT                            (3)
            + TOK_SELEXPR
                + "."
                    + TOK_TABLE_OR_COL
                        + "a"
                    + "user"
            + TOK_SELEXPR
                + "."
                    + TOK_TABLE_OR_COL
                        + "a"
                    + "prono"
            + TOK_SELEXPR
                + "."
                    + TOK_TABLE_OR_COL
                        + "p"
                    + "maker"
            + TOK_SELEXPR
                + "."
                    + TOK_TABLE_OR_COL
                        + "p"
                    + "price"
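EXPLAIN prints this same AST in a parenthesised form (see the appendix). As a rough illustration of how that text maps onto the tree drawn on the slide, the sketch below parses a parenthesised fragment into nested lists and renders it with the slide's "+" notation. It is a minimal Python sketch, not Hive's parser; `parse_sexpr` and `render` are hypothetical helper names, and the input string is the join condition copied from the EXPLAIN output.

```python
def parse_sexpr(text):
    """Parse a parenthesised AST string into nested lists of tokens."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # a leaf such as "prono" or "a"
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume the closing ")"
        return node

    return read()


def render(node, depth=0):
    """Render nested lists as an indented '+' tree, one token per line."""
    pad = "    " * depth
    if isinstance(node, str):
        return [pad + "+ " + node]
    lines = [pad + "+ " + node[0]]
    for child in node[1:]:
        lines.extend(render(child, depth + 1))
    return lines


# Join condition as printed by EXPLAIN for the example query.
JOIN_COND = "(= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono))"

if __name__ == "__main__":
    print("\n".join(render(parse_sexpr(JOIN_COND))))
```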
Semantic Analyzer (1/2): AST → QB

AST, part (1):
  + TOK_FROM
      + TOK_JOIN
          + TOK_TABREF
              + TOK_TABNAME
                  + "access_log_hbase"
              + "a"
          + TOK_TABREF
              + TOK_TABNAME
                  + "product_hbase"
              + "p"
          + "="
              + "."
                  + TOK_TABLE_OR_COL
                      + "a"
                  + "prono"
              + "."
                  + TOK_TABLE_OR_COL
                      + "p"
                  + "prono"

QB:
  MetaData
    AliasToTableInfo
      "a" = TableInfo("access_log_hbase")
      "p" = TableInfo("product_hbase")
  ParseInfo
    Join Node
      + TOK_JOIN
          + TOK_TABREF …
          + TOK_TABREF …
          + "=" …
Semantic Analyzer (2/2): AST → QB

AST, part (2):
  + TOK_DESTINATION
      + TOK_TAB
          + TOK_TABNAME
              + "access_log_temp2"

QB:
  ParseInfo
    NameToDestinationNode
      + TOK_TAB
          + TOK_TABNAME
              + "access_log_temp2"
Semantic Analyzer (2/2, continued): AST → QB

AST, part (3):
  + TOK_SELECT
      + TOK_SELEXPR
          + "."
              + TOK_TABLE_OR_COL
                  + "a"
              + "user"
      + TOK_SELEXPR
          + "."
              + TOK_TABLE_OR_COL
                  + "a"
              + "prono"
      + TOK_SELEXPR
          + "."
              + TOK_TABLE_OR_COL
                  + "p"
              + "maker"
      + TOK_SELEXPR
          + "."
              + TOK_TABLE_OR_COL
                  + "p"
              + "price"

QB:
  ParseInfo
    NameToSelectNode
      + TOK_SELECT
          + TOK_SELEXPR …
          + TOK_SELEXPR …
          + TOK_SELEXPR …
          + TOK_SELEXPR …
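The first Semantic Analyzer step, collecting alias-to-table information from the TOK_FROM subtree, can be sketched as a small tree walk. This is a toy Python model under stated assumptions: the AST is written by hand as nested lists, and `alias_to_table`, `build_qb`, and the dictionary keys are hypothetical names chosen to mirror the slide labels, not Hive's actual QB classes.

```python
# Nested-list form of the TOK_FROM subtree from the slides (written by hand).
FROM_AST = [
    "TOK_FROM",
    ["TOK_JOIN",
     ["TOK_TABREF", ["TOK_TABNAME", "access_log_hbase"], "a"],
     ["TOK_TABREF", ["TOK_TABNAME", "product_hbase"], "p"],
     ["=",
      [".", ["TOK_TABLE_OR_COL", "a"], "prono"],
      [".", ["TOK_TABLE_OR_COL", "p"], "prono"]]],
]


def alias_to_table(node, found=None):
    """Walk the tree and record alias -> table name for every TOK_TABREF."""
    if found is None:
        found = {}
    if isinstance(node, list):
        if node[0] == "TOK_TABREF":
            table = node[1][1]
            # Assumption for this sketch: without an alias, use the table name.
            alias = node[2] if len(node) > 2 else table
            found[alias] = table
        for child in node[1:]:
            alias_to_table(child, found)
    return found


def build_qb(from_ast):
    """Toy query block: table metadata plus the join node kept for later."""
    return {
        "meta_data": {"alias_to_table": alias_to_table(from_ast)},
        "parse_info": {"join_node": from_ast[1]},
    }


if __name__ == "__main__":
    print(build_qb(FROM_AST)["meta_data"])
```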
Logical Plan Generator (1/4): QB → OPTree

QB:
  MetaData
    AliasToTableInfo
      "a" = TableInfo("access_log_hbase")
      "p" = TableInfo("product_hbase")
OPTree:
  TableScanOperator("access_log_hbase")
  TableScanOperator("product_hbase")

Logical Plan Generator (2/4): QB → OPTree

QB:
  ParseInfo
    + TOK_JOIN
        + TOK_TABREF
            + TOK_TABNAME
                + "access_log_hbase"
            + "a"
        + TOK_TABREF
            + TOK_TABNAME
                + "product_hbase"
            + "p"
        + "="
            + "."
                + TOK_TABLE_OR_COL
                    + "a"
                + "prono"
            + "."
                + TOK_TABLE_OR_COL
                    + "p"
                + "prono"
OPTree:
  ReduceSinkOperator("access_log_hbase")
  ReduceSinkOperator("product_hbase")
  JoinOperator

Logical Plan Generator (3/4): QB → OPTree

QB:
  ParseInfo
    NameToSelectNode
      + TOK_SELECT
          + TOK_SELEXPR
              + "."
                  + TOK_TABLE_OR_COL
                      + "a"
                  + "user"
          + TOK_SELEXPR
              + "."
                  + TOK_TABLE_OR_COL
                      + "a"
                  + "prono"
          + TOK_SELEXPR
              + "."
                  + TOK_TABLE_OR_COL
                      + "p"
                  + "maker"
          + TOK_SELEXPR
              + "."
                  + TOK_TABLE_OR_COL
                      + "p"
                  + "price"
OPTree:
  SelectOperator

Logical Plan Generator (4/4): QB → OPTree

QB:
  MetaData
    NameToDestinationTableInfo
      "insclause-0" = TableInfo("access_log_temp2")
OPTree:
  FileSinkOperator
Logical Plan Generator (result)

OPTree, top to bottom:
  TableScanOperator (TS_0), TableScanOperator (TS_1)
  ReduceSinkOperator (RS_2), ReduceSinkOperator (RS_3)
  JoinOperator (JOIN_4)
  SelectOperator (SEL_5)
  FileSinkOperator (FS_6)
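The four generation steps above can be sketched as one function that emits operators in the same order and with the same numbering as the result slide. This is a toy Python model, not Hive's plan generator: `generate_logical_plan` is a hypothetical name, and the pairing of each table scan with one particular reduce sink (TS_0 with RS_2, TS_1 with RS_3) is an assumption of the sketch, since the slide text does not show the edges.

```python
from itertools import count


def generate_logical_plan(aliases):
    """Build a toy operator DAG for a join followed by select and insert.

    Returns a dict mapping each operator id to the ids of its children.
    """
    ids = count()

    def new(prefix):
        return f"{prefix}_{next(ids)}"

    scans = [new("TS") for _ in aliases]   # step 1/4: one table scan per alias
    sinks = [new("RS") for _ in aliases]   # step 2/4: one reduce sink per scan
    join = new("JOIN")                     # step 2/4: the join itself
    select = new("SEL")                    # step 3/4: the SELECT list
    file_sink = new("FS")                  # step 4/4: the INSERT destination

    plan = {}
    for scan, sink in zip(scans, sinks):   # assumed pairing, see lead-in
        plan[scan] = [sink]
        plan[sink] = [join]
    plan[join] = [select]
    plan[select] = [file_sink]
    plan[file_sink] = []
    return plan


if __name__ == "__main__":
    for op, children in generate_logical_plan(["a", "p"]).items():
        print(op, "->", children)
```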
Logical Optimizer
Logical Optimizer (Predicate Push Down)

Query without a filter:
  INSERT OVERWRITE TABLE access_log_temp2
  SELECT a.user, a.prono, p.maker, p.price
  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);

Query with a filter on p.maker:
  INSERT OVERWRITE TABLE access_log_temp2
  SELECT a.user, a.prono, p.maker, p.price
  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)
  WHERE p.maker = 'honda';

Operator tree of the filtered query before push down (the filter runs after the join):
  TableScanOperator (TS_0), TableScanOperator (TS_1)
  ReduceSinkOperator (RS_2), ReduceSinkOperator (RS_3)
  JoinOperator (JOIN_4)
  FilterOperator (FIL_5): _col8 = 'honda'
  SelectOperator (SEL_6)
  FileSinkOperator (FS_7)

Operator tree after push down (FIL_8 is added between a table scan and its reduce sink, so non-matching rows are dropped before they are sent to the reducer):
  TableScanOperator (TS_0), TableScanOperator (TS_1)
  FilterOperator (FIL_8): maker = 'honda'
  ReduceSinkOperator (RS_2), ReduceSinkOperator (RS_3)
  JoinOperator (JOIN_4)
  FilterOperator (FIL_5): _col8 = 'honda'
  SelectOperator (SEL_6)
  FileSinkOperator (FS_7)
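The rewrite performed by predicate push down can be sketched on a small operator graph: a new filter is spliced in directly below the table scan that produces the filtered column. This is a minimal Python sketch, not Hive's optimizer. `push_down_filter` is a hypothetical name, and the mapping of alias `p` to TS_1 (and TS_1 to RS_3) is assumed purely for illustration, because the slide text does not show which scan reads which table.

```python
# Operator graph of the filtered query before push down: id -> child ids.
# The TS/RS pairing is an assumption of this sketch.
PLAN = {
    "TS_0": ["RS_2"], "TS_1": ["RS_3"],
    "RS_2": ["JOIN_4"], "RS_3": ["JOIN_4"],
    "JOIN_4": ["FIL_5"], "FIL_5": ["SEL_6"],
    "SEL_6": ["FS_7"], "FS_7": [],
}


def push_down_filter(plan, scan_of_alias, alias, predicate, new_id):
    """Return a copy of `plan` with a filter spliced in below the scan of `alias`.

    Also returns a dict describing the predicate attached to the new filter.
    """
    plan = {op: list(children) for op, children in plan.items()}  # no mutation
    scan = scan_of_alias[alias]
    plan[new_id] = plan[scan]   # the filter takes over the scan's children
    plan[scan] = [new_id]       # the scan now feeds the filter
    return plan, {new_id: predicate}


if __name__ == "__main__":
    new_plan, predicates = push_down_filter(
        PLAN, {"a": "TS_0", "p": "TS_1"}, "p", "maker = 'honda'", "FIL_8")
    print(new_plan["TS_1"], new_plan["FIL_8"], predicates)
```

As on the slide, the original filter after the join (FIL_5) is left in place by this sketch; only the additional early filter is introduced.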
Physical Plan Generator: OPTree → TaskTree

TaskTree:
  MapRedTask (Stage-1/root)
    OpTree
      TableScanOperator (TS_0)
      TableScanOperator (TS_1)
      ReduceSinkOperator (RS_2)
      ReduceSinkOperator (RS_3)
      JoinOperator (JOIN_4)
      SelectOperator (SEL_5)
      FileSinkOperator (FS_6)
  MoveTask (Stage-0)
    LoadTableDesc
  StatsTask (Stage-2)
Physical Plan Generator (result): OPTree → TaskTree

MapRedTask (Stage-1/root)
  Mapper
    TableScanOperator (TS_0), TableScanOperator (TS_1)
    ReduceSinkOperator (RS_2), ReduceSinkOperator (RS_3)
  Reducer
    JoinOperator (JOIN_4)
    SelectOperator (SEL_5)
    FileSinkOperator (FS_6)
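The mapper/reducer split shown in the result follows the reduce sinks: operators up to and including a ReduceSinkOperator run in the mapper, and everything after it runs in the reducer. The sketch below illustrates that rule in Python on a toy operator graph. It is not Hive's task compiler; `split_map_reduce` is a hypothetical name, and the edges between scans and reduce sinks are assumed for the example.

```python
# Toy operator graph for the example query: id -> child ids.
# The TS/RS pairing is an assumption of this sketch.
PLAN = {
    "TS_0": ["RS_2"], "TS_1": ["RS_3"],
    "RS_2": ["JOIN_4"], "RS_3": ["JOIN_4"],
    "JOIN_4": ["SEL_5"], "SEL_5": ["FS_6"], "FS_6": [],
}


def split_map_reduce(plan, roots):
    """Split an operator DAG into map-side and reduce-side operator lists."""
    mapper, reducer = [], []
    seen = set()
    stack = [(root, False) for root in reversed(roots)]
    while stack:
        op, after_sink = stack.pop()
        if op in seen:
            continue
        seen.add(op)
        (reducer if after_sink else mapper).append(op)
        # Once a reduce sink has been passed, all descendants are reduce-side.
        crossed = after_sink or op.startswith("RS_")
        for child in reversed(plan[op]):
            stack.append((child, crossed))
    return mapper, reducer


if __name__ == "__main__":
    mapper, reducer = split_map_reduce(PLAN, ["TS_0", "TS_1"])
    print("Mapper: ", mapper)
    print("Reducer:", reducer)
```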
Physical Optimizer: TaskTree → TaskTree
Source location: under java/org/apache/hadoop/hive/ql/optimizer/physical/
Physical Optimizer (MapJoinResolver): TaskTree → TaskTree

Before:
  MapRedTask (Stage-1)
    Mapper
      TableScanOperator (TS_0), TableScanOperator (TS_1)
      MapJoinOperator (MAPJOIN_7)
      SelectOperator (SEL_8)
      SelectOperator (SEL_5)
      FileSinkOperator (FS_6)

After:
  MapredLocalTask (Stage-7)
    TableScanOperator (TS_0)
    HashTableSinkOperator (HASHTABLESINK_11)
  MapRedTask (Stage-1)
    Mapper
      TableScanOperator (TS_1)
      MapJoinOperator (MAPJOIN_7)
      SelectOperator (SEL_8)
      SelectOperator (SEL_5)
      FileSinkOperator (FS_6)
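The idea behind the split into a local task and a map-only task can be sketched as a hash join: one input is loaded into an in-memory hash table first, and the other input is then streamed past it in the mapper, so no reducer is needed. The Python below is only a toy model of that idea, not Hive's implementation. `build_hash_table` and `map_join` are hypothetical names, the sample rows are invented for illustration, and treating the product table as the small side is an assumption.

```python
def build_hash_table(small_rows, key):
    """Local task: load the small input into an in-memory hash table."""
    table = {}
    for row in small_rows:
        table.setdefault(row[key], []).append(row)
    return table


def map_join(big_rows, hash_table, key):
    """Map task: stream the big input and probe the hash table per row."""
    for row in big_rows:
        for match in hash_table.get(row[key], []):
            yield {**match, **row}


# Invented sample rows; column names follow the example query.
PRODUCTS = [
    {"prono": 1, "maker": "honda", "price": 100},
    {"prono": 2, "maker": "toyota", "price": 200},
]
ACCESS_LOG = [
    {"user": "u1", "prono": 1},
    {"user": "u2", "prono": 3},  # no matching product, dropped by inner join
]

if __name__ == "__main__":
    table = build_hash_table(PRODUCTS, "prono")
    for joined in map_join(ACCESS_LOG, table, "prono"):
        print(joined)
```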
In the end
[Diagram: Client, Driver, Compiler, Metastore, Hadoop]
  HiveQL
    → Parser                   → AST
    → Semantic Analyzer        → QB
    → Logical Plan Generator   → Operator Tree
    → Logical Optimizer        → Operator Tree
    → Physical Plan Generator  → Task Tree
    → Physical Optimizer       → Task Tree

End
Appendix: What does Explain show?

hive> explain INSERT OVERWRITE TABLE access_log_temp2
    >  SELECT a.user, a.prono, p.maker, p.price
    >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a
          TableScan
            alias: a
            Reduce Output Operator
              key expressions:
                    expr: prono
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: prono
                    type: int
              tag: 0
              value expressions:
                    expr: user
                    type: string
                    expr: prono
                    type: int
        p
          TableScan
            alias: p
            Reduce Output Operator
              key expressions:
                    expr: prono
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: prono
                    type: int
              tag: 1
              value expressions:
                    expr: maker
                    type: string
                    expr: price
                    type: int
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col0} {VALUE._col2}
            1 {VALUE._col1} {VALUE._col2}
          handleSkewJoin: false
          outputColumnNames: _col0, _col2, _col6, _col7
          Select Operator
            expressions:
                  expr: _col0
                  type: string
                  expr: _col2
                  type: int
                  expr: _col6
                  type: string
                  expr: _col7
                  type: int
            outputColumnNames: _col0, _col1, _col2, _col3
            File Output Operator
              compressed: false
              GlobalTableId: 1
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: default.access_log_temp2

  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.access_log_temp2

  Stage: Stage-2
    Stats-Aggr Operator

Time taken: 0.1 seconds
hive>
Appendix: What does Explain show? (outline vs. task tree)

Outline of the EXPLAIN output:
  ABSTRACT SYNTAX TREE:
  STAGE DEPENDENCIES:
    Stage-1 is a root stage
    Stage-0 depends on stages: Stage-1
    Stage-2 depends on stages: Stage-0
  STAGE PLANS:
    Stage: Stage-1
      Map Reduce
        Map Operator Tree:
          TableScan
            Reduce Output Operator
          TableScan
            Reduce Output Operator
        Reduce Operator Tree:
          Join Operator
            Select Operator
              File Output Operator
    Stage: Stage-0
      Move Operator
    Stage: Stage-2
      Stats-Aggr Operator

This is roughly equivalent (≒) to the task tree:
  MapRedTask (Stage-1/root)
    Mapper
      TableScanOperator (TS_0), TableScanOperator (TS_1)
      ReduceSinkOperator (RS_2), ReduceSinkOperator (RS_3)
    Reducer
      JoinOperator (JOIN_4)
      SelectOperator (SEL_5)
      FileSinkOperator (FS_6)
  MoveTask (Stage-0)
  StatsTask (Stage-2)
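The STAGE DEPENDENCIES section is what tells us in which order the tasks run. As a small illustration, the Python sketch below parses those three lines and derives an execution order. It only handles the two line shapes that appear in this output and does not check for cycles; `parse_dependencies` and `execution_order` are hypothetical helper names, not part of Hive.

```python
import re

# The STAGE DEPENDENCIES lines from the EXPLAIN output above.
EXPLAIN_DEPS = """\
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
Stage-2 depends on stages: Stage-0
"""


def parse_dependencies(text):
    """Return a dict mapping each stage to the stages it depends on."""
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        root = re.match(r"(Stage-\d+) is a root stage", line)
        if root:
            deps[root.group(1)] = []
            continue
        dep = re.match(r"(Stage-\d+) depends on stages: (.+)", line)
        if dep:
            deps[dep.group(1)] = [s.strip() for s in dep.group(2).split(",")]
    return deps


def execution_order(deps):
    """Order stages so that each one comes after the stages it depends on."""
    order, done = [], set()

    def visit(stage):
        if stage in done:
            return
        for parent in deps.get(stage, []):
            visit(parent)
        done.add(stage)
        order.append(stage)

    for stage in deps:
        visit(stage)
    return order


if __name__ == "__main__":
    print(execution_order(parse_dependencies(EXPLAIN_DEPS)))
```

For this query the order is the M/R job (Stage-1), then the move into the destination table (Stage-0), then the statistics task (Stage-2).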

Internal Hive

  • 1. Inside Hive (for beginners)1Takeshi NAKANO / Recruit Co. Ltd.
  • 2. Why?Hive is good tool for non-specialist!The number of M/R controls the Hive processing time.↓How can we reduce the number?What can we do for this on writing HiveQL?↓How does Hive convert HiveQLto M/R jobs?On this, what optimizing processes are adopted?7/6/2011HIVE - A warehouse solution over Map Reduce Framework2
  • 3. Don’t you have..This fb’s paper has a lot of information!But this is a little old..7/6/2011HIVE - A warehouse solution over Map Reduce Framework3
  • 4. Component Level Analysis7/6/2011HIVE - A warehouse solution over Map Reduce Framework4
  • 5. Hive Architecture / Exec Flow7/6/2011HIVE - A warehouse solution over Map Reduce Framework5ClientHadoopMetastoreDriverCompiler
  • 6. ClientHadoopDriverCompilerHive WorkflowHive has the operators which are minimum processing units.The process of each operator is done with HDFS operation or M/R jobs.The compiler converts HiveQL to the sets of operators.7/6/2011HIVE - A warehouse solution over Map Reduce Framework6Metastore
  • 7. Hive WorkflowOperators7/6/2011HIVE - A warehouse solution over Map Reduce Framework7
  • 8. ClientHadoopMetastoreDriverCompilerHive WorkflowFor M/R processing, Hiveuses ExecMaper and ExecReducer.On processing, we have 2 modes.Local processing modeDistributed processing mode7/6/2011HIVE - A warehouse solution over Map Reduce Framework8
  • 9. ClientHadoopMetastoreDriverCompilerHive WorkflowOn 1(Local mode)Hive fork the process with hadoop command.The plan.xml is made just on 1 and the single node processes this.On 2(Distributed mode).Hive send the process to exsistingJobTracker.The information is housed on DistributedCacheand processed on multi nodes.7/6/2011HIVE - A warehouse solution over Map Reduce Framework9
  • 10. Compiler : How to Process HiveQL7/6/2011HIVE - A warehouse solution over Map Reduce Framework10ClientHadoopMetastoreDriverCompiler
  • 11. “Plumbing” of HIVE compiler7/6/201111HIVE - A warehouse solution over Map Reduce Framework
  • 12. “Plumbing” of HIVE compiler7/6/201112HIVE - A warehouse solution over Map Reduce Framework
  • 14. Compiler Overview14HiveQLParserASTSemanticAnalyzerQBLogicalPlan Gen.Operator TreeLogicalOptimizerOperator TreePhysicalPlan Gen.Task TreePhysicalOptimizerTask Tree
  • 15. ParserHiveQLASTINSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);HiveQLTOK_QUERY + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“AST + TOK_INSERT + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2" + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price"SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
  • 16. ParserSQLASTINSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);SQLTOK_QUERY + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ + TOK_INSERT + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2" + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price"AST123SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
  • 17. Semantic Analyzer (1/2): AST → QB
      AST (1):
        + TOK_FROM
          + TOK_JOIN
            + TOK_TABREF
              + TOK_TABNAME
                + "access_log_hbase"
              + "a"
            + TOK_TABREF
              + TOK_TABNAME
                + "product_hbase"
              + "p"
            + "="
              + "."
                + TOK_TABLE_OR_COL
                  + "a"
                + "prono"
              + "."
                + TOK_TABLE_OR_COL
                  + "p"
                + "prono"
      QB:
        MetaData
          Alias To Table Info
            "a" = Table Info("access_log_hbase")
            "p" = Table Info("product_hbase")
        ParseInfo
          Join Node
            + TOK_JOIN
              + TOK_TABREF …
              + TOK_TABREF …
              + "=" …
  • 18. Semantic Analyzer (2/2): AST → QB
      AST (2):
        + TOK_DESTINATION
          + TOK_TAB
            + TOK_TABNAME
              + "access_log_temp2"
      QB:
        ParseInfo
          Name To Destination Node
            + TOK_TAB
              + TOK_TABNAME
                + "access_log_temp2"
  • 19. Semantic Analyzer (2/2): AST → QB
      AST (3):
        + TOK_SELECT
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "a"
              + "user"
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "a"
              + "prono"
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "p"
              + "maker"
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "p"
              + "price"
      QB:
        ParseInfo
          Name To Select Node
            + TOK_SELECT
              + TOK_SELEXPR …
              + TOK_SELEXPR …
              + TOK_SELEXPR …
              + TOK_SELEXPR …
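The three semantic-analyzer slides each lift one AST subtree into the QB. The sketch below imitates that with a plain dictionary standing in for the QB. It is an assumption-laden toy: `find`, `analyze` and the dictionary keys are hypothetical names, and the real QB, QBMetaData and QBParseInfo classes hold far more than this.

```python
def col(alias, name):
    """Subtree for a qualified column reference such as a.prono."""
    return (".", ("TOK_TABLE_OR_COL", alias), name)


# The AST of the example query as nested tuples (label first, then children).
AST = (
    "TOK_QUERY",
    ("TOK_FROM",
     ("TOK_JOIN",
      ("TOK_TABREF", ("TOK_TABNAME", "access_log_hbase"), "a"),
      ("TOK_TABREF", ("TOK_TABNAME", "product_hbase"), "p"),
      ("=", col("a", "prono"), col("p", "prono")))),
    ("TOK_INSERT",
     ("TOK_DESTINATION", ("TOK_TAB", ("TOK_TABNAME", "access_log_temp2"))),
     ("TOK_SELECT",
      ("TOK_SELEXPR", col("a", "user")),
      ("TOK_SELEXPR", col("a", "prono")),
      ("TOK_SELEXPR", col("p", "maker")),
      ("TOK_SELEXPR", col("p", "price")))),
)


def find(node, label):
    """Yield every subtree whose label matches, in depth-first order."""
    if isinstance(node, tuple):
        if node[0] == label:
            yield node
        for child in node[1:]:
            yield from find(child, label)


def analyze(ast):
    """Collect the three numbered AST pieces into a QB-like dictionary."""
    qb = {"alias_to_table": {}, "join": None, "dest": None, "select": None}
    # (1) FROM: alias -> table name, plus the join node itself.
    for _, (_, table), alias in find(ast, "TOK_TABREF"):
        qb["alias_to_table"][alias] = table
    qb["join"] = next(find(ast, "TOK_JOIN"))
    # (2) DESTINATION: keep the TOK_TAB subtree.
    (qb["dest"],) = next(find(ast, "TOK_DESTINATION"))[1:]
    # (3) SELECT: keep the whole select node.
    qb["select"] = next(find(ast, "TOK_SELECT"))
    return qb


if __name__ == "__main__":
    print(analyze(AST)["alias_to_table"])
```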
  • 20. Logical Plan Generator (1/4): QB → OP Tree
      QB:
        MetaData
          Alias To Table Info
            "a" = Table Info("access_log_hbase")
            "p" = Table Info("product_hbase")
      OP Tree:
        TableScanOperator("access_log_hbase")
        TableScanOperator("product_hbase")
  • 21. Logical Plan Generator (2/4): QB → OP Tree
      QB:
        ParseInfo
          + TOK_JOIN
            + TOK_TABREF
              + TOK_TABNAME
                + "access_log_hbase"
              + "a"
            + TOK_TABREF
              + TOK_TABNAME
                + "product_hbase"
              + "p"
            + "="
              + "."
                + TOK_TABLE_OR_COL
                  + "a"
                + "prono"
              + "."
                + TOK_TABLE_OR_COL
                  + "p"
                + "prono"
      OP Tree:
        ReduceSinkOperator("access_log_hbase")
        ReduceSinkOperator("product_hbase")
        JoinOperator
  • 22. Logical Plan Generator (3/4): QB → OP Tree
      QB:
        ParseInfo
          Name To Select Node
            + TOK_SELECT
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "user"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "prono"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "maker"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "price"
      OP Tree:
        SelectOperator
  • 23. Logical Plan Generator (4/4): QB → OP Tree
      QB:
        MetaData
          Name To Destination Table Info
            "insclause-0" = Table Info("access_log_temp2")
      OP Tree:
        FileSinkOperator
  • 24. Logical Plan Generator (result): OP Tree
      TableScanOperator TS_0, TableScanOperator TS_1
      ↓
      ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
      ↓
      JoinOperator JOIN_4
      ↓
      SelectOperator SEL_5
      ↓
      FileSinkOperator FS_6
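The resulting operator tree is a small directed graph in which rows flow from the table scans down to the file sink. A minimal sketch of that graph as an edge list follows. The pairing of TS_0 with RS_2 and TS_1 with RS_3 is an assumption, since the slide text does not say which scan feeds which reduce sink, and `children_of`, `roots` and `path_to_sink` are hypothetical helper names.

```python
from collections import defaultdict

# Edges point in the direction rows flow.
# ASSUMPTION: TS_0 -> RS_2 and TS_1 -> RS_3; the slide does not state the pairing.
EDGES = [
    ("TS_0", "RS_2"), ("TS_1", "RS_3"),
    ("RS_2", "JOIN_4"), ("RS_3", "JOIN_4"),
    ("JOIN_4", "SEL_5"), ("SEL_5", "FS_6"),
]


def children_of(edges):
    """Map each operator to the operators it feeds."""
    graph = defaultdict(list)
    for parent, child in edges:
        graph[parent].append(child)
    return graph


def roots(edges):
    """Operators with no incoming edge (here, the table scans)."""
    targets = {child for _, child in edges}
    return sorted({parent for parent, _ in edges} - targets)


def path_to_sink(edges, start):
    """Follow the first child of each operator until an operator with no child."""
    graph, path = children_of(edges), [start]
    while graph.get(path[-1]):
        path.append(graph[path[-1]][0])
    return path


if __name__ == "__main__":
    for root in roots(EDGES):
        print(" -> ".join(path_to_sink(EDGES, root)))
```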
  • 26. Logical Optimizer (Predicate Push Down)
      Query without a WHERE clause:
        INSERT OVERWRITE TABLE access_log_temp2
        SELECT a.user, a.prono, p.maker, p.price
        FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      Query with a WHERE clause:
        INSERT OVERWRITE TABLE access_log_temp2
        SELECT a.user, a.prono, p.maker, p.price
        FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)
        WHERE p.maker = 'honda';
  • 27. Logical Optimizer (Predicate Push Down): operator tree, shown beside the two queries from slide 26
      TableScanOperator TS_0, TableScanOperator TS_1
      ↓
      ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
      ↓
      JoinOperator JOIN_4
      ↓
      SelectOperator SEL_6
      ↓
      FileSinkOperator FS_7
  • 28. Logical Optimizer (Predicate Push Down): operator tree with the filter after the join, shown beside the two queries from slide 26
      TableScanOperator TS_0, TableScanOperator TS_1
      ↓
      ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
      ↓
      JoinOperator JOIN_4
      ↓
      FilterOperator FIL_5 (_col8 = 'honda')
      ↓
      SelectOperator SEL_6
      ↓
      FileSinkOperator FS_7
  • 29. Logical Optimizer (Predicate Push Down): operator tree after push down, shown beside the two queries from slide 26
      TableScanOperator TS_0, TableScanOperator TS_1
      ↓
      FilterOperator FIL_8 (maker = 'honda'), on the product_hbase branch only
      ↓
      ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
      ↓
      JoinOperator JOIN_4
      ↓
      FilterOperator FIL_5 (_col8 = 'honda')
      ↓
      SelectOperator SEL_6
      ↓
      FileSinkOperator FS_7
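The point of predicate push down is that a filter on one table can run before the rows reach the ReduceSink, so fewer rows are shuffled to the join while the result stays the same. The toy model below illustrates this with made-up rows. `TABLES`, `PLAN`, `push_down` and `run` are hypothetical names, the data is invented for the example, and the original post-join filter is kept in place as on slide 29.

```python
# Made-up sample rows, for illustration only.
TABLES = {
    "a": [{"user": "u1", "prono": 1}, {"user": "u2", "prono": 2}],
    "p": [{"prono": 1, "maker": "honda", "price": 100},
          {"prono": 2, "maker": "toyota", "price": 200}],
}

# Plan before optimisation: no filter at the scans, one filter after the join.
PLAN = {
    "scan_filters": {"a": [], "p": []},
    "post_join_filters": [("p", "maker", "honda")],
}


def push_down(plan):
    """Copy each single-table predicate to the scan of the table it refers to."""
    pushed = {alias: list(filters) for alias, filters in plan["scan_filters"].items()}
    for alias, column, value in plan["post_join_filters"]:
        if alias in pushed:
            pushed[alias].append((column, value))
    return {"scan_filters": pushed,
            "post_join_filters": list(plan["post_join_filters"])}


def run(plan, tables, key):
    """Filter at the scans, join two tables on `key`, then filter after the join.

    Returns (result rows, number of rows that reached the join).
    """
    rows = {
        alias: [r for r in tables[alias] if all(r[c] == v for c, v in filters)]
        for alias, filters in plan["scan_filters"].items()
    }
    left, right = sorted(rows)
    joined = [{left: l, right: r}
              for l in rows[left] for r in rows[right] if l[key] == r[key]]
    result = [j for j in joined
              if all(j[a][c] == v for a, c, v in plan["post_join_filters"])]
    return result, sum(len(v) for v in rows.values())


if __name__ == "__main__":
    print(run(PLAN, TABLES, "prono")[1])
    print(run(push_down(PLAN), TABLES, "prono")[1])
```

The second number printed is the row count reaching the join once the filter also runs on the `p` branch; the join result itself is unchanged.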
  • 31. Physical Plan Generator (result): OP Tree → Task Tree
      OP Tree:
        TableScanOperator TS_0, TableScanOperator TS_1
        ↓
        ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
        ↓
        JoinOperator JOIN_4
        ↓
        SelectOperator SEL_5
        ↓
        FileSinkOperator FS_6
      Task Tree:
        MapRedTask (Stage-1/root)
          Mapper: TableScanOperator (TS_0), TableScanOperator (TS_1), ReduceSinkOperator (RS_2), ReduceSinkOperator (RS_3)
          Reducer: JoinOperator (JOIN_4), SelectOperator (SEL_5), FileSinkOperator (FS_6)
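On this slide the operator tree is cut at the ReduceSink operators: everything up to and including a ReduceSink lands in the mapper, and everything downstream lands in the reducer. The sketch below reproduces that cut for a plan with a single shuffle. It is a simplification (a plan with several shuffle boundaries would need several tasks), the TS/RS pairing in `EDGES` is an assumption, and `downstream` and `split_map_reduce` are hypothetical names.

```python
# Operator edges in the direction rows flow.
# ASSUMPTION: TS_0 -> RS_2 and TS_1 -> RS_3; the slide does not state the pairing.
EDGES = [
    ("TS_0", "RS_2"), ("TS_1", "RS_3"),
    ("RS_2", "JOIN_4"), ("RS_3", "JOIN_4"),
    ("JOIN_4", "SEL_5"), ("SEL_5", "FS_6"),
]


def downstream(edges, start):
    """All operators reachable from `start`, excluding `start` itself."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for parent, child in edges:
            if parent == node and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def split_map_reduce(edges):
    """Return (mapper operators, reducer operators) for a single-shuffle plan."""
    ops = {op for edge in edges for op in edge}
    reducer = set()
    for op in ops:
        if op.startswith("RS_"):  # a ReduceSink marks the shuffle boundary
            reducer |= downstream(edges, op)
    return sorted(ops - reducer), sorted(reducer)


if __name__ == "__main__":
    mapper, reducer = split_map_reduce(EDGES)
    print("Mapper: ", mapper)
    print("Reducer:", reducer)
```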
  • 33. Physical Optimizer (MapJoinResolver): Task Tree → Task Tree
      MapRedTask (Stage-1)
        Mapper:
          TableScanOperator TS_0, TableScanOperator TS_1
          ↓
          MapJoinOperator MAPJOIN_7
          ↓
          SelectOperator SEL_8
          ↓
          SelectOperator SEL_5
          ↓
          FileSinkOperator FS_6
  • 34. Physical Optimizer (MapJoinResolver): Task Tree → Task Tree
      Before:
        MapRedTask (Stage-1)
          Mapper:
            TableScanOperator TS_0, TableScanOperator TS_1
            ↓
            MapJoinOperator MAPJOIN_7
            ↓
            SelectOperator SEL_8
            ↓
            SelectOperator SEL_5
            ↓
            FileSinkOperator FS_6
      After:
        MapredLocalTask (Stage-7)
          TableScanOperator TS_0
          ↓
          HashTableSinkOperator HASHTABLESINK_11
        MapRedTask (Stage-1)
          Mapper:
            TableScanOperator TS_1
            ↓
            MapJoinOperator MAPJOIN_7
            ↓
            SelectOperator SEL_8
            ↓
            SelectOperator SEL_5
            ↓
            FileSinkOperator FS_6
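After MapJoinResolver, one table is loaded into a hash table by a local task and the other table is streamed through the mapper, which probes that hash table, so the join needs no reducer. The sketch below shows the general hash-join idea only, not Hive's implementation. The row data is made up, and which of the two tables plays the small side is an assumption because the slide does not say which table TS_0 scans.

```python
def build_hash_table(small_rows, key):
    """Local-task side: load the small table into memory, keyed by the join key."""
    table = {}
    for row in small_rows:
        table.setdefault(row[key], []).append(row)
    return table


def map_join(big_rows, hash_table, key):
    """Mapper side: stream the big table and probe the in-memory hash table."""
    for row in big_rows:
        for match in hash_table.get(row[key], []):
            yield {**match, **row}


if __name__ == "__main__":
    # Made-up rows; treating product rows as the small side is an assumption.
    small = [{"prono": 1, "maker": "honda", "price": 100}]
    big = [{"user": "u1", "prono": 1}, {"user": "u2", "prono": 2}]
    for joined in map_join(big, build_hash_table(small, "prono"), "prono"):
        print(joined)
```

The trade-off is memory: the approach only works when the hashed table fits in memory on every mapper.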
  • 35. In the end (architecture diagram: Client, Hadoop, Metastore, Driver, Compiler)
  • 36. In the end: HiveQL → Parser → AST → SemanticAnalyzer → QB → Logical Plan Gen. → Operator Tree → Logical Optimizer → Operator Tree → Physical Plan Gen. → Task Tree → Physical Optimizer → Task Tree
  • 38. Appendix: What does Explain show?
  • 39. Appendix: What does Explain show?
      hive> explain INSERT OVERWRITE TABLE access_log_temp2
          > SELECT a.user, a.prono, p.maker, p.price
          > FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      OK
      ABSTRACT SYNTAX TREE:
        (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))

      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
        Stage-2 depends on stages: Stage-0

      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              a
                TableScan
                  alias: a
                  Reduce Output Operator
                    key expressions:
                      expr: prono
                      type: int
                    sort order: +
                    Map-reduce partition columns:
                      expr: prono
                      type: int
                    tag: 0
                    value expressions:
                      expr: user
                      type: string
                      expr: prono
                      type: int
              p
                TableScan
                  alias: p
                  Reduce Output Operator
                    key expressions:
                      expr: prono
                      type: int
                    sort order: +
                    Map-reduce partition columns:
                      expr: prono
                      type: int
                    tag: 1
                    value expressions:
                      expr: maker
                      type: string
                      expr: price
                      type: int
            Reduce Operator Tree:
              Join Operator
                condition map:
                  Inner Join 0 to 1
                condition expressions:
                  0 {VALUE._col0} {VALUE._col2}
                  1 {VALUE._col1} {VALUE._col2}
                handleSkewJoin: false
                outputColumnNames: _col0, _col2, _col6, _col7
                Select Operator
                  expressions:
                    expr: _col0
                    type: string
                    expr: _col2
                    type: int
                    expr: _col6
                    type: string
                    expr: _col7
                    type: int
                  outputColumnNames: _col0, _col1, _col2, _col3
                  File Output Operator
                    compressed: false
                    GlobalTableId: 1
                    table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: default.access_log_temp2

        Stage: Stage-0
          Move Operator
            tables:
              replace: true
              table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                name: default.access_log_temp2

        Stage: Stage-2
          Stats-Aggr Operator

      Time taken: 0.1 seconds
      hive>
  • 40. Appendix: What does Explain show? The EXPLAIN output from slide 39 is shown again, overlaid with its skeleton:
      ABSTRACT SYNTAX TREE:
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
        Stage-2 depends on stages: Stage-0
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Map Operator Tree:
              TableScan
                Reduce Output Operator
              TableScan
                Reduce Output Operator
            Reduce Operator Tree:
              Join Operator
                Select Operator
                  File Output Operator
        Stage: Stage-0
          Move Operator
        Stage: Stage-2
          Stats-Aggr Operator
  • 41. Appendix: What does Explain show? The skeleton of the EXPLAIN output corresponds (≒) to the task tree.
      EXPLAIN skeleton:
        ABSTRACT SYNTAX TREE:
        STAGE DEPENDENCIES:
          Stage-1 is a root stage
          Stage-0 depends on stages: Stage-1
          Stage-2 depends on stages: Stage-0
        STAGE PLANS:
          Stage: Stage-1
            Map Reduce
              Map Operator Tree:
                TableScan
                  Reduce Output Operator
                TableScan
                  Reduce Output Operator
              Reduce Operator Tree:
                Join Operator
                  Select Operator
                    File Output Operator
          Stage: Stage-0
            Move Operator
          Stage: Stage-2
            Stats-Aggr Operator
      Task tree:
        MapRedTask (Stage-1/root)
          Mapper: TableScanOperator TS_0, TableScanOperator TS_1, ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
          Reducer: JoinOperator JOIN_4, SelectOperator SEL_5, FileSinkOperator FS_6
        MoveTask (Stage-0)
        Stats Task (Stage-2)
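The STAGE DEPENDENCIES section is the quickest way to see how many stages a query needs and in what order they run. The sketch below parses the three dependency lines quoted on these slides and derives an execution order. `parse_dependencies` and `execution_order` are hypothetical helpers written for this example, and the line formats they match are taken only from the output shown here, so other Hive versions may print something different.

```python
import re

# The dependency lines exactly as they appear in the EXPLAIN output above.
EXPLAIN_DEPENDENCIES = """\
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
Stage-2 depends on stages: Stage-0
"""


def parse_dependencies(text):
    """Map each stage to the list of stages it depends on."""
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        dep = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if root:
            deps[root.group(1)] = []
        elif dep:
            deps[dep.group(1)] = [s.strip() for s in dep.group(2).split(",")]
    return deps


def execution_order(deps):
    """Order stages so that every stage comes after the stages it depends on."""
    order, done = [], set()
    while len(order) < len(deps):
        ready = sorted(s for s in deps
                       if s not in done and all(d in done for d in deps[s]))
        if not ready:
            raise ValueError("cyclic or missing stage dependency")
        order += ready
        done.update(ready)
    return order


if __name__ == "__main__":
    print(execution_order(parse_dependencies(EXPLAIN_DEPENDENCIES)))
```

For the example query the order is the MapRedTask (Stage-1), then the MoveTask (Stage-0), then the Stats Task (Stage-2), matching the task tree on slide 41.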