Showing posts with label yarn. Show all posts

Friday, July 6, 2018

Troubleshooting a log4j StackOverflowError in the Tez ApplicationMaster caused by UDF jar pollution

Both log4j-over-slf4j and slf4j-log4j12 turned out to be on the classpath; the two bridges delegate to each other in a cycle, which leads to the log4j-related StackOverflowError.

The error looks like this:
Exception in thread "main" java.lang.StackOverflowError
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:39)
at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
at org.apache.log4j.Category.<init>(Category.java:57)
at org.apache.log4j.Logger.<init>(Logger.java:37)
at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
at org.apache.log4j.Category.<init>(Category.java:57)
at org.apache.log4j.Logger.<init>(Logger.java:37)
at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
at org.apache.log4j.Category.<init>(Category.java:57)
at org.apache.log4j.Logger.<init>(Logger.java:37)
at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
at org.apache.log4j.Category.<init>(Category.java:57)
at org.apache.log4j.Logger.<init>(Logger.java:37)
at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
at org.apache.log4j.Category.<init>(Category.java:57)
at org.apache.log4j.Logger.<init>(Logger.java:37)
at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)

Excluding log4j-over-slf4j (and, if present, log4j-to-slf4j) resolves the problem.
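A minimal pom.xml sketch of such an exclusion, assuming the logging bridge is dragged in transitively by one of the UDF's dependencies (the groupId/artifactId of that dependency are placeholders):

<dependency>
    <groupId>com.example</groupId>              <!-- placeholder: the dependency that pulls in the bridge -->
    <artifactId>some-library</artifactId>       <!-- placeholder -->
    <version>1.0</version>
    <exclusions>
        <!-- keep only one bridging direction: dropping log4j-over-slf4j ensures
             slf4j-log4j12 -> log4j no longer loops back into slf4j -->
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>log4j-over-slf4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>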

Takeaway: after pulling any dependency into a UDF, inspect what it bundles with `mvn dependency:tree` and `jar -tf ...` to see which packages and namespaces it brings along, so that this kind of classpath pollution does not spread later.
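For example, a quick pair of checks along those lines (the jar name is illustrative):

# Show which artifacts pull in slf4j bridges such as log4j-over-slf4j
mvn dependency:tree -Dincludes=org.slf4j
# Confirm whether the packaged UDF jar bundles org.apache.log4j classes from the bridge
jar -tf platform_udf-1.0-SNAPSHOT.jar | grep -i 'org/apache/log4j'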


Tuesday, June 12, 2018

Hive on Tez: How to Troubleshoot Jar Conflicts


A Tez job failed. In the YARN logs, the Driver-side error was: File does not exist: hdfs://lt-nameservice3.sy/tmp/hive/app/_tez_session_dir/09ff9062-cc3e-4cb3-bc8d-77c275266d94/.tez/application_1528552108294_273009/tez.session.local-resources.pb java.io.FileNotFoundException
Moving on to the ApplicationMaster log, the root cause is shown below: clearly a guava conflict (guava 21 and later remove some methods and are no longer compatible with code built against older versions).
Caused by: java.lang.NoSuchMethodError: com.google.common.base.Objects.toStringHelper(Ljava/lang/Object;)Lcom/google/common/base/Objects$ToStringHelper;
at org.apache.hadoop.metrics2.lib.MetricsRegistry.toString(MetricsRegistry.java:406)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at org.apache.hadoop.ipc.metrics.RpcMetrics.<init>(RpcMetrics.java:74)
at org.apache.hadoop.ipc.metrics.RpcMetrics.create(RpcMetrics.java:80)
at org.apache.hadoop.ipc.Server.<init>(Server.java:2213)
at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1029)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:537)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:512)
at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:874)
at org.apache.tez.dag.api.client.DAGClientServer.createServer(DAGClientServer.java:127)
at org.apache.tez.dag.api.client.DAGClientServer.serviceStart(DAGClientServer.java:79)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.tez.dag.app.DAGAppMaster$ServiceWithDependency.start(DAGAppMaster.java:1838)
at org.apache.tez.dag.app.DAGAppMaster$ServiceThread.run(DAGAppMaster.java:1859)

A quick note on where to find the Driver log versus the ApplicationMaster log:
As shown in the figure, clicking the application id on the YARN page opens the Driver page, and the red box marks the Driver-side error; clicking the blue box opens the ApplicationMaster log.


At this point, search for java.class.path in the ApplicationMaster log to obtain the full classpath of the AM environment, read the AM's node from the URL, pull all the jars from that node, and locate the jar that ships the conflicting version. If the conflict comes from a dependency inside your own jar, shading or excluding it solves the problem (a shading sketch is given after the commands below).

Useful commands:

jar -tf platform_udf-1.0-SNAPSHOT.jar | grep -i com.google.common
javap -classpath platform_udf-1.0-SNAPSHOT.jar com.google.common.base.Objects
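If exclusion is not an option because the UDF genuinely needs its own guava, relocating it with the maven-shade-plugin is the usual alternative. A minimal sketch, with the relocated package prefix chosen arbitrarily:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
        <execution>
            <phase>package</phase>
            <goals><goal>shade</goal></goals>
            <configuration>
                <relocations>
                    <!-- move the UDF's guava out of com.google.common so it cannot
                         clash with the guava shipped by Hadoop/Tez on the AM classpath -->
                    <relocation>
                        <pattern>com.google.common</pattern>
                        <shadedPattern>shaded.com.google.common</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>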


Monday, May 7, 2018

How to troubleshoot stuck or slow Spark jobs from the YARN monitoring pages


  • On the Stages tab of the Spark UI reached from YARN, check how many executors the stuck or slow stage is using. If there are very few executors while the stage's shuffle read size or record count is large (figure 1), the job is most likely running without spark.dynamicAllocation.enabled. Enable it with the following configuration:

spark.dynamicAllocation.enabled true
spark.dynamicAllocation.initialExecutors 1
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 300
spark.shuffle.service.enabled true
  • If a slow or stuck stage has exactly 200 tasks (figure 2), suspect spark.sql.shuffle.partitions: it defaults to 200 and can simply be raised to a larger value such as 2011. Likewise, if the task count is some other small fixed number (12 in our case), it comes from spark.default.parallelism.

  • On the Executors page, if the Task Time (GC Time) cell is highlighted in red, GC time is too long. You can print GC logs by adding set spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGC at launch time and then reading the stdout link behind each executor in the executor list. For tuning, Spark recommends G1GC. If the problem persists even with G1GC, there may be too many tasks running concurrently inside one executor (a task is essentially an operator (lambda), so its <input -> output> transformation can inflate the data in memory). For example, with executor.memory at 12G and executor.cores at 4, four tasks run in parallel and each gets roughly 3G on average; reducing the number of cores effectively gives each task more memory. In our case the GC log showed the heap had already expanded to the full 12G, meaning the tasks really did need that much memory, so the only fix was to lower the core count and thereby bring GC time down.

  • On the Jobs tab, the stage list shows Shuffle Read and Shuffle Write for each stage: the former is the amount of shuffle data read from the previous stage, the latter the amount written out for the next stage. These figures give a rough estimate of how many tasks the stage needs. The settings above are pulled together in the spark-submit sketch below.



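If the job is launched with spark-submit, the settings discussed above can be passed together on the command line. A sketch using the values from this post (the application class and jar are placeholders, and every value should be tuned per job):

spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=1 \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=300 \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.sql.shuffle.partitions=2011 \
  --conf spark.executor.memory=12g \
  --conf spark.executor.cores=2 \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGC" \
  --class com.example.YourApp your-app.jar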

    Tuesday, January 27, 2015

    Deploy Tez Based On Hadoop-2.2.0

    Tez is a computing engine that sits alongside MapReduce; its goal is to build an application framework that allows a complex directed-acyclic-graph (DAG) of tasks for processing data. It is currently built atop Apache Hadoop YARN.

    The most significant advantage of Tez over MapReduce is the disk IO saved when multiple MR tasks would otherwise run in series within one Hive query. This in-memory computing mechanism is somewhat like Spark's.

    Now, the procedure for deploying Tez on Hadoop-2.2.0 is shown below.

    --CAVEAT--
    1. The Official Deploy Instruction For Tez suits all release versions of Tez except the incubating ones. Thus, the following instructions are not exactly the same as the official ones (think of them as supplementary).
    2. The official document says we have to change hadoop.version to the version we are currently using, which verification shows is not true: for instance, `mvn clean package ...` fails with ERRORs if we forcibly change hadoop.version from 2.6.0 to 2.2.0 in Tez-0.6.0. Consequently, we have to use tez-0.4.1-incubating, whose default hadoop.version is 2.2.0.

    Ok, now let's get back on track!

    Firstly, install JDK 6 or later, Maven 3 or later and Protocol Buffers (protoc compiler) 2.5 or later as prerequisites; that procedure is omitted here.

    Retrieve tez-0.4.1-incubating from official website and decompress it:
    wget http://archive.apache.org/dist/incubator/tez/tez-0.4.1-incubating/tez-0.4.1-incubating-src.tar.gz
    tar xzf tez-0.4.1-incubating-src.tar.gz
    

    Check hadoop.version and protobuf.version, and hard-code protoc.path, as shown below:
    <properties>
        <maven.test.redirectTestOutputToFile>true</maven.test.redirectTestOutputToFile>
        <clover.license>${user.home}/clover.license</clover.license>
        <hadoop.version>2.2.0</hadoop.version>
        <jetty.version>7.6.10.v20130312</jetty.version>
        <distMgmtSnapshotsId>apache.snapshots.https</distMgmtSnapshotsId>
        <distMgmtSnapshotsName>Apache Development Snapshot Repository</distMgmtSnapshotsName>
        <distMgmtSnapshotsUrl>https://repository.apache.org/content/repositories/snapshots</distMgmtSnapshotsUrl>
        <distMgmtStagingId>apache.staging.https</distMgmtStagingId>
        <distMgmtStagingName>Apache Release Distribution Repository</distMgmtStagingName>
        <distMgmtStagingUrl>https://repository.apache.org/service/local/staging/deploy/maven2</distMgmtStagingUrl>
        <failIfNoTests>false</failIfNoTests>
        <protobuf.version>2.5.0</protobuf.version>
        <protoc.path>/usr/local/bin/protoc</protoc.path>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <scm.url>scm:git:https://git-wip-us.apache.org/repos/asf/incubator-tez.git</scm.url>
      </properties>
    

    Execute maven package command.
    mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
    

    After building, all the compiled jar files can be found in '$TEZ_HOME/tez-dist/target/tez-0.4.1-incubating-full/tez-0.4.1-incubating-full/'; we will refer to this path as the environment variable $TEZ_JARS.

    Pick an HDFS path to which $TEZ_JARS will be uploaded. In my case, '/user/supertool/zhudi/tez-dist' is used.
    hadoop fs -copyFromLocal $TEZ_JARS /user/supertool/zhudi/tez-dist
    

    Create a tez-site.xml in '$HADOOP_HOME/etc/hadoop' and add the following content, which points to the HDFS path. Be sure the HDFS path is fully qualified, that is, carries the 'hdfs://ns1' prefix.
     <configuration>
         <property>
             <name>tez.lib.uris</name>
            <value>hdfs://ns1/user/supertool/zhudi/tez-dist/tez-0.4.1-incubating-full,hdfs://ns1/user/supertool/zhudi/tez-dist/tez-0.4.1-incubating-full/lib</value>
         </property>
     </configuration>
    

    Eventually, add the following content to ~/.bashrc and `source ~/.bashrc`.
     export TEZ_CONF_DIR=/home/workspace/tez-0.4.1-incubating-src
     export TEZ_JARS=/home/workspace/tez-0.4.1-incubating-src/tez-dist/target/tez-0.4.1-incubating-full/tez-0.4.1-incubating-full
     export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
    

    We can run tez-mapreduce-examples.jar, which is a MapReduce-style job, for testing:
    hadoop jar /home/workspace/tez-0.4.1-incubating-src/tez-mapreduce-examples/target/tez-mapreduce-examples-0.4.1-incubating.jar orderedwordcount /user/supertool/zhudi/mrTest/input /user/supertool/zhudi/mrTest/output
    

    For hive, simply add the following command before executing HQL.
    set hive.execution.engine=tez;
    

    If 'hive.input.format' needs to be specified when using the MapReduce engine (the default), remember to also set the following when switching to Tez:
    set hive.input.format=com.XXX.RuntimeCombineHiveInputFormat;
    set hive.tez.input.format=com.XXX.RuntimeCombineHiveInputFormat;
    

    Likewise, if 'mapred.job.queue.name' needs to be specified, replace it with 'tez.queue.name'.


    One last thing: only the gateway node in the Hadoop cluster, i.e. the node that submits tasks with Tez, needs this deployment.


    Possible ERROR #1:
    When using a custom UDF in Hive/Tez, there are times when exactly the same task fails while at other times it succeeds. After looking through the detailed log retrieved by `yarn logs -applicationId <app_id>`, the following ERROR can be found:
    java.lang.NoSuchMethodError: org.apache.commons.collections.CollectionUtils.isEmpty(Ljava/util/Collection;)Z
    at com.XXX.inputformat.hive.SplitInfo.mergeSplitFiles(SplitInfo.java:86)
    at com.XXX.inputformat.hive.RuntimeCombineHiveInputFormat.getSplits(RuntimeCombineHiveInputFormat.java:105)
    at org.apache.tez.mapreduce.hadoop.MRHelpers.generateOldSplits(MRHelpers.java:263)
    at org.apache.tez.mapreduce.hadoop.MRHelpers.generateInputSplitsToMem(MRHelpers.java:379)
    at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:161)
    at org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable$1.run(RootInputInitializerRunner.java:154)
    at org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable$1.run(RootInputInitializerRunner.java:146)
    

    Then I looked into $HADOOP_HOME/share/hadoop/common/lib/ and $HIVE_HOME/lib and found that the versions of commons-collections.jar there are 3.2.1 and 3.1, respectively, and that version 3.1 has no 'org.apache.commons.collections.CollectionUtils.isEmpty' method. The culprit is obviously a maven dependency conflict. I replaced 3.1 with 3.2.1 and everything worked fine.
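    A quick way to reproduce that diagnosis from the shell (paths and versions are the ones on my nodes and will differ elsewhere):
    # Which commons-collections jars are visible to Hadoop and Hive
    find $HADOOP_HOME/share/hadoop/common/lib $HIVE_HOME/lib -name 'commons-collections*.jar'
    # Does a given jar actually provide CollectionUtils.isEmpty?
    javap -classpath $HIVE_HOME/lib/commons-collections-3.1.jar org.apache.commons.collections.CollectionUtils | grep -i isEmpty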


    References:
    1. Official Deploy Instruction For Tez
    2. Deploy Tez on Hadoop 2.2.0 - CSDN


    © 2014-2017 jason4zhu.blogspot.com All Rights Reserved 
    If reposting, please credit the source: Jason4Zhu

    Wednesday, January 21, 2015

    Commonly-Used Commands For Hadoop


    YARN Service (ResourceManager + NodeManager):
    $HADOOP_HOME/sbin/stop-yarn.sh
    $HADOOP_HOME/sbin/start-yarn.sh

    HDFS Service (NameNode + DataNode):
    $HADOOP_HOME/sbin/stop-dfs.sh
    $HADOOP_HOME/sbin/start-dfs.sh

    Balancer Service (Do Balance):
    $HADOOP_HOME/sbin/stop-balancer.sh
    $HADOOP_HOME/sbin/start-balancer.sh

    Start DataNode, NameNode, NodeManager, ResourceManager Service respectively:
    $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
    $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
    $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
    $HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager

    Start zkfc(DFSZKFailoverController) Service:
    ./sbin/hadoop-daemon.sh start zkfc
    ./bin/hdfs zkfc   (which will show detailed launching information for debugging)

    ZooKeeper(QuorumPeerMain) Service:
    $ZK_HOME/bin/zkServer.sh stop
    $ZK_HOME/bin/zkServer.sh start
    $ZK_HOME/bin/zkServer.sh restart
    $ZK_HOME/bin/zkServer.sh status

    Start JournalNode Service:
    $HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode

    NameNode-HA-Related Operation:
    Check NameNode Status(active/standby):    hdfs haadmin -getServiceState <serviceId>
    Transfer from standby to active manually:  hdfs haadmin -transitionToActive <serviceId>

    List Currently Unfinished MapReduce Tasks:
    mapred job -list

    Kill MapReduce Task:
    mapred job -kill <jobid>

    List Currently Unfinished YARN Tasks:
    yarn application -list

    Kill YARN Task:
    yarn application -kill <YARN_Application_id>



    © 2014-2017 jason4zhu.blogspot.com All Rights Reserved 
    If reposting, please credit the source: Jason4Zhu

    Thursday, December 4, 2014

    A Record On The Process Of Adding DataNode To Hadoop Cluster

    Procedure for checking and configuring a Linux node


    1. Check on hostname

    Check whether the current hostname is exactly what it should be.
    hostname
    

    2. Check on the type of file system

    Make sure it is the same as on all the other nodes in the Hadoop cluster.
    df -T


    3. Change the owner of all data disks to the hadoop user

    chown -R hadoop:hadoop /home/data*
    

    4. Check and backup disk mount info

    We can simply execute the output of mount.sh in case some mount points are lost when the node restarts.

    su root
    
    --mount.sh--
    n=1 ; for i in a b c d e f g h i j k l ; do a=`/sbin/blkid -s UUID | grep ^/dev/sd$i | awk '{print $2}'` ; echo mount $a /home/data$n ; n=`echo $n+1|bc` ; done
    
    > bash mount.sh
    mount UUID="09c42017-9308-45c3-9509-e77a2e99c732" /home/data1
    mount UUID="72461da2-b0c0-432a-9b65-0ac5bc5bc69a" /home/data2
    mount UUID="6d447f43-b2db-4f69-a3b2-a4f69f2544ea" /home/data3
    mount UUID="37ca4fb8-377c-493d-9a4c-825f1500ae52" /home/data4
    mount UUID="53334c93-13ff-41f5-8688-07023bd6f11a" /home/data5
    mount UUID="10fa31f7-9c29-4190-8ecd-ec893d59634c" /home/data6
    mount UUID="fe28b8dd-ff3b-49d9-87c6-6eee9f389966" /home/data7
    mount UUID="5201d24b-9310-4cff-b3ad-5b09e47780a5" /home/data8
    mount UUID="d3b85455-8b94-4817-b43e-69481f9c13c4" /home/data9
    mount UUID="6f2630f1-7cfe-4cac-b52d-557f46779539" /home/data10
    mount UUID="bafc742d-1477-439a-ade4-29711c5db840" /home/data11
    mount UUID="bf6e36d8-1410-4547-853c-f541c9a07e52" /home/data12
    

    We can append the output of mount.sh to /etc/rc.local, so that the mount commands are invoked automatically every time the node starts up.
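    For instance (run as root, and assuming the distribution still honors /etc/rc.local at boot):
    # Run the generated mount commands once now, and persist them for future boots
    bash mount.sh | bash
    bash mount.sh >> /etc/rc.local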

    5. Check on the version of operating system

    lsb_release -a
    

    6. Optimize TCP parameters in sysctl

    Append the following content to /etc/sysctl.conf.
    fs.file-max = 800000
    net.core.rmem_default = 12697600
    net.core.wmem_default = 12697600
    net.core.rmem_max = 873800000
    net.core.wmem_max = 655360000
    net.ipv4.tcp_rmem = 8192 262144 4096000
    net.ipv4.tcp_wmem = 4096 262144 4096000
    net.ipv4.tcp_mem = 196608 262144 3145728
    net.ipv4.tcp_max_orphans = 300000
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.ip_local_port_range = 1025 65535
    net.ipv4.tcp_max_syn_backlog = 100000
    net.ipv4.tcp_fin_timeout = 30
    net.ipv4.tcp_keepalive_time = 1200
    net.ipv4.tcp_max_tw_buckets = 5000
    net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 1500
    net.core.somaxconn=32768
    vm.swappiness=0
    

    Issue the following command to make the settings above take effect.
    /sbin/sysctl -p
    
    # List all current parameters to double-check
    /sbin/sysctl -a
    

    7. Maximum number of open files

    Check the current limit on open file descriptors:
    ulimit -n
    

    Append the following content to /etc/security/limits.conf to raise it to 100000.
    *      soft    nofile  100000
    *      hard    nofile  100000

    8. Check and sync /etc/hosts

    Copy the current /etc/hosts file from one of the existing nodes in the Hadoop cluster (call it the canonical node), append the newly-added node's host entry to it, and synchronize the file to all nodes.

    9. Revise locale to en_US.UTF-8

    Appending the following to /etc/profile will do the work.
    export LANG=en_US.UTF-8
    export LC_CTYPE=en_US.UTF-8
    export LC_NUMERIC=en_US.UTF-8
    export LC_TIME=en_US.UTF-8
    export LC_COLLATE=en_US.UTF-8
    export LC_MONETARY=en_US.UTF-8
    export LC_MESSAGES=en_US.UTF-8
    export LC_PAPER=en_US.UTF-8
    export LC_NAME=en_US.UTF-8
    export LC_ADDRESS=en_US.UTF-8
    export LC_TELEPHONE=en_US.UTF-8
    export LC_MEASUREMENT=en_US.UTF-8
    export LC_IDENTIFICATION=en_US.UTF-8
    export LC_ALL=en_US.UTF-8
    

    10. Transfer .ssh directory to newly-added node

    Since all nodes share a single .ssh directory, passwordless ssh among all nodes is simple to achieve: append id_rsa.pub to the current authorized_keys and spread the .ssh directory to all nodes.
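    In other words, something along these lines from the canonical node (the target hostname is a placeholder):
    # Make sure the shared public key is authorized, then ship the whole .ssh directory
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    scp -r ~/.ssh hadoop@new-datanode:~/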

    11. Deploy Java & Hadoop environment

    Copy the java directory from the canonical node to the newly-added node and append the following environment variables to /etc/profile:
    export JAVA_HOME=/usr/java/jdk1.7.0_11
    export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
    export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
    

    Likewise, do the same for the hadoop directory:
    export HADOOP_HOME=/home/workspace/hadoop
    export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native/
    export PATH=$PATH:$HADOOP_HOME/bin
    

    At the same time, create and set permissions on the paths configured by the dfs.datanode.data.dir parameter in hdfs-site.xml:
    su hadoop
    for i in {1..12}
    do
      mkdir -p /home/data$i/hdfsdir/data
      chmod -R 755 /home/data$i/hdfsdir
    done
    

    Lastly, append the newly-added host to etc/hadoop/slaves and synchronize the file to all nodes.
    su hadoop
    for i in $(cat $HADOOP_HOME/etc/hadoop/slaves  | grep -v "#")
    do
     echo '';
     echo $i;
     scp $HADOOP_HOME/etc/hadoop/slaves hadoop@$i:/home/workspace/hadoop/etc/hadoop/;
    done
    

    12. Open iptables

    This is well-explained in another post: Configure Firewall In Iptables For Hadoop Cluster.

    13. Launch Hadoop services

    Eventually, we simply invoke the following two commands to start the DataNode and NodeManager services.
    su hadoop
    $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
    $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
    

    After that, check whether the above two processes exist and look through the corresponding logs in the $HADOOP_HOME/logs directory to double-check.
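    For example (the jps path depends on the local JDK; log file names are matched with a wildcard since they embed the user and hostname):
    /usr/java/jdk1.7.0_11/bin/jps | grep -E 'DataNode|NodeManager'
    tail -n 100 $HADOOP_HOME/logs/*datanode*.log
    tail -n 100 $HADOOP_HOME/logs/*nodemanager*.log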


    The Python script that implements all the procedures listed above can be found in my project on GitHub.


    © 2014-2017 jason4zhu.blogspot.com All Rights Reserved 
    If reposting, please credit the source: Jason4Zhu

    Tuesday, November 18, 2014

    The Script Template For Checking On And Synchronizing Configuration Files To All Nodes In Hadoop

    Here's the skeleton of shell script template:
    for  i  in  $(cat  $HADOOP_HOME/etc/hadoop/slaves  |  grep  -v  "#")
    do
      #......
    done
    

    It takes advantage of the slaves file, in which all DataNodes are listed. In addition, we should append all the other nodes in Hadoop, such as the NameNode, exhaustively.

    There are two scenarios in which I commonly use the above shell skeleton:

    #1. Synchronizing Configuration Files

    for i in $(cat $HADOOP_HOME/etc/hadoop/slaves | grep -v "#")
    do
     echo '';
     echo $i;
     rsync -r --delete $HADOOP_HOME/etc/hadoop/ hadoop@$i:/home/supertool/hadoop-2.2.0/etc/hadoop/;
    done
    

    #2. Checking On Specific Processes

    Every so often, we have to check whether some specific processes have started or been killed on all related nodes after executing commands like `start-yarn.sh`, `hadoop-daemon.sh start datanode`, `yarn-daemon.sh stop nodemanager`, etc. The script above is a real time-saver here.
    for i in $(cat etc/hadoop/slaves | grep -v "#")
    do
      echo ''
      echo $i
      ssh supertool@$i "/usr/java/jdk1.7.0_11/bin/jps | grep -i NodeManager"
    done
    


    © 2014-2017 jason4zhu.blogspot.com All Rights Reserved 
    If reposting, please credit the source: Jason4Zhu

    The Way To Set Global Java Opts In Hadoop Without Being Overridden At Runtime

    I intend to set the default java GC collector to '-XX:+UseSerialGC' for all YARN applications, without it being overridden at runtime.

    After setting it in the parameter 'mapreduce.map.java.opts' in mapred-site.xml as below and synchronizing the file to all nodes, it gets overridden whenever the same parameter is configured at runtime, for instance: `hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi -Dmapreduce.map.java.opts="-Xmx256M" 4 1000`.
    //mapred-site.xml
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M  -XX:+UseSerialGC</value>
    </property>
    

    There are two commands we can issue on a DataNode to check whether the parameter '-XX:+UseSerialGC' has taken effect. Whichever command we use, we should focus only on the 'YarnChild' processes.
    //--#1--
    /usr/java/jdk1.7.0_11/bin/jps  -mlv  |  grep  -i  gc  
    //--#2--
    /usr/java/jdk1.7.0_11/bin/jps  |  awk  '{if($2=="YarnChild")  print  $1}'  |  xargs  /usr/java/jdk1.7.0_11/bin/jinfo  |  grep  -i  gc
    

    Then I tried to configure it in hadoop-env.sh; neither 'HADOOP_OPTS' nor 'HADOOP_CLIENT_OPTS' worked:
    //hadoop-env.sh
    export  HADOOP_OPTS="$HADOOP_OPTS  -Dmapreduce.map.java.opts='-XX:+UseSerialGC'"
    export  HADOOP_CLIENT_OPTS="-XX:+UseSerialGC  $HADOOP_CLIENT_OPTS"
    

    Finally, I found a parameter that is not described in the official mapred-default.xml document: 'mapreduce.admin.map.child.java.opts' (the reduce-side counterpart is 'mapreduce.admin.reduce.child.java.opts'). After setting this parameter in mapred-site.xml and synchronizing the file to all nodes, it works fine and will not be overridden, since only 'mapreduce.map.java.opts' is overridden at runtime.
    //mapred-site.xml
    <property>
        <name>mapreduce.admin.map.child.java.opts</name>
        <value>-XX:+UseSerialGC</value>
    </property>
    

    Consequently, when we want to set global java opts, we can set them in 'mapreduce.admin.map.child.java.opts' and 'mapreduce.admin.reduce.child.java.opts' in mapred-site.xml.
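    For completeness, the reduce-side counterpart is configured the same way (same GC flag as above):
    //mapred-site.xml
    <property>
        <name>mapreduce.admin.reduce.child.java.opts</name>
        <value>-XX:+UseSerialGC</value>
    </property>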



    References:
    1. Chapter 3. Setting Up the Hadoop Configuration - Hortonworks Data Platform


    © 2014-2017 jason4zhu.blogspot.com All Rights Reserved 
    If reposting, please credit the source: Jason4Zhu

    Wednesday, November 12, 2014

    Yarn Log Aggregation Configuration In Hadoop

    Log aggregation is YARN's centralized management of the logs on all NodeManager nodes: it aggregates the logs of finished containers/tasks and uploads them to HDFS. The related configurations are as follows:
    name | default value | description
    yarn.log-aggregation-enable | false | Whether to enable log aggregation
    yarn.log-aggregation.retain-seconds | -1 | How long to keep aggregation logs before deleting them. -1 disables. Be careful set this too small and you will spam the name node.
    yarn.log-aggregation.retain-check-interval-seconds | -1 | How long to wait between aggregated log retention checks. If set to 0 or a negative value then the value is computed as one-tenth of the aggregated log retention time. Be careful set this too small and you will spam the name node.
    yarn.nodemanager.remote-app-log-dir | /tmp/logs | Where to aggregate logs to.
    yarn.nodemanager.remote-app-log-dir-suffix | logs | The remote log dir will be created at ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}
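    A minimal yarn-site.xml sketch that turns it on (the retention value is just an example, here 7 days):
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>

    Once enabled, the aggregated logs of a finished application can be fetched with `yarn logs -applicationId <app_id>`.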

    Logs from NodeManager can be seen from the YARN monitor webpage:






    References:
    1. Hadoop-Yarn-Configurations: Log-Aggregation - Dong


    © 2014-2017 jason4zhu.blogspot.com All Rights Reserved 
    If reposting, please credit the source: Jason4Zhu

    Dig Into JobHistory Server Of MapReduce In Hadoop2

    JobHistory Server is a standalone module in Hadoop 2 and is started or stopped separately from start-all.sh and stop-all.sh. It serves as the job history logger, recording in the configured filesystem all the information about a MapReduce job, from its birth to its death.

    JobHistory logs can be found from the page shown below:

    Configuration & Command

    There are two arguments related to the service address and the monitoring web page of the JobHistory Server:
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>host:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>host:19888</value>
    </property>
    

    And another three arguments are related to the storage path of the job history files:
    name | default value | description
    yarn.app.mapreduce.am.staging-dir | /tmp/hadoop-yarn/staging | The staging dir used while submitting jobs.
    mapreduce.jobhistory.intermediate-done-dir | ${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate |
    mapreduce.jobhistory.done-dir | ${yarn.app.mapreduce.am.staging-dir}/history/done |
    We'd better `mkdir` and `chmod` the above directories ourselves.
    hadoop  fs  -mkdir  -p  /tmp/hadoop-yarn/staging/history/done_intermediate
    hadoop  fs  -chmod  -R  777  /tmp/hadoop-yarn/staging/history/done_intermediate
    hadoop  fs  -mkdir  -p  /tmp/hadoop-yarn/staging/history/done
    hadoop  fs  -chmod  -R  777  /tmp/hadoop-yarn/staging/history/done
    

    The command to start and stop JobHistory Server is quite easy:
    ${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh  start historyserver
    ${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh  stop  historyserver
    

    Procedure of Logging in History Server

    When a MapReduce application starts, its history logs are written to ${yarn.app.mapreduce.am.staging-dir}/${current_user}/.staging/job_XXXXX_XXX, which contains three kinds of files: .jhist, .summary and .xml, representing the job history, the job summary and the configuration file, respectively.

    When the application finishes, is killed, or fails, the log info is copied to ${mapreduce.jobhistory.intermediate-done-dir}/${current_user}. This step is implemented in "org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler".

    After being copied to ${mapreduce.jobhistory.intermediate-done-dir}/${current_user}, the job history file is eventually moved to ${mapreduce.jobhistory.done-dir} by "org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager".

    All logs for this procedure are recorded under ${HADOOP_HOME}/logs/userlogs on each NodeManager node (the location is configured by the 'yarn.nodemanager.log-dirs' argument), provided yarn log aggregation is not enabled.
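    With the default directories from the table above, those files can be inspected directly on HDFS (the exact layout under the done directory may differ slightly across versions):
    hadoop fs -ls /tmp/hadoop-yarn/staging/history/done_intermediate/$USER
    hadoop fs -ls -R /tmp/hadoop-yarn/staging/history/done | head -n 20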

    NullPointerException With History Server

    We faced a problem where some of our MapReduce jobs, especially long-running ones, threw a NullPointerException when the job completed; the stacktrace is as follows:
    14/07/22 06:37:11 INFO mapreduce.Job:  map 100% reduce 98%
    14/07/22 06:37:44 INFO mapreduce.Job:  map 100% reduce 99%
    14/07/22 06:38:30 INFO mapreduce.Job:  map 100% reduce 100%
    14/07/22 06:39:02 INFO mapred.ClientServiceDelegate: Application state is
    completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history
    server
    14/07/22 06:39:02 INFO mapred.ClientServiceDelegate: Application state is
    completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history
    server
    14/07/22 06:39:02 INFO mapred.ClientServiceDelegate: Application state is
    completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history
    server
    14/07/22 06:39:02 ERROR security.UserGroupInformation:
    PriviledgedActionException as: rohitsarewar (auth:SIMPLE)
    cause:java.io.IOException:
    org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException):
    java.lang.NullPointerException
            at
    org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269)
            at
    org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
            at
    org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
            at
    org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
            at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
            at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:396)
            at
    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
            at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
    Exception in thread "main" java.io.IOException:
    org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException):
    java.lang.NullPointerException
            at
    org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269)
            at
    org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
            at
    org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
            at
    org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
            at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
            at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:396)
            at
    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
            at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
            at
    org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:330)
            at
    org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:382)
            at
    org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:529)
            at org.apache.hadoop.mapreduce.Job$5.run(Job.java:668)
            at org.apache.hadoop.mapreduce.Job$5.run(Job.java:665)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:415)
            at
    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
            at
    org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:665)
            at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1349)
            at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
            at com.bigdata.mapreduce.esc.escDriver.main(escDriver.java:23)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    Caused by:
    org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): j
            at
    org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolH
            at
    org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBSer
            at
    org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.
            at
    org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(P
            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
            at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
            at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:396)
            at
    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformatio
            at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
            at org.apache.hadoop.ipc.Client.call(Client.java:1347)
            at org.apache.hadoop.ipc.Client.call(Client.java:1300)
            at
    org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine
            at com.sun.proxy.$Proxy12.getTaskAttemptCompletionEvents(Unknown
    Source)
            at
    org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClie
            at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
            at
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
            at java.lang.reflect.Method.invoke(Method.java:606)
            at
    org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDeleg
            ... 16 more
    

    It seems that the remote history server object cannot be found when, after the MapReduce job is done, the client tries to invoke a method on it via IPC. We finally solved this problem by changing the argument 'dfs.client.socket-timeout' to '3600000' (one hour) for the JobHistory service. Because of the high pressure on our HDFS cluster, requests to HDFS can be delayed or hang, so we had to set this argument separately for the JobHistory service.

    Notice that the 'dfs.client.socket-timeout' in hdfs-site.xml used by start/stop-dfs.sh should stay much lower than '3600000', say '60000' or '180000', since a map task has to wait exactly that long before retrying if it fails within that window.
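    In other words, the temporary override is just (value in milliseconds):
    //hdfs-site.xml
    <property>
        <name>dfs.client.socket-timeout</name>
        <value>3600000</value>
    </property>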

    Thus the procedure should be:
    1. Set 'dfs.client.socket-timeout' in hdfs-site.xml to 3600000.
    2. start JobHistory server.
    3. Set 'dfs.client.socket-timeout' in hdfs-site.xml back to 180000.
    4. start-dfs.sh
    If juggling this shared configuration file that way is too obscure and error-prone, we can point the JobHistory Server at a dedicated config directory when starting it up:
    ./mr-jobhistory-daemon.sh --config /home/supertool/hadoop-2.2.0/etc/job_history_server_config/  start historyserver
    



    References:

    1. Hadoop2 Jobhistory Log - Dong
    2. Start MapReduce JobHistory Server



    © 2014-2017 jason4zhu.blogspot.com All Rights Reserved 
    If reposting, please credit the source: Jason4Zhu