Hunk has problems running MapReduce jobs against EMR v2

Ledion_Bitincka
Splunk Employee

I've set up Hunk (6.1) against EMR v2 (see the contents of indexes.conf below). Streaming searches work perfectly and I can see data come back to Hunk. However, as soon as I try to run a reporting search, e.g. "index=emr2 | stats count", I keep getting the following error:

 [emr2] IOException - Error while waiting for MapReduce job to complete, job_id=[!http://ip-10-34-139-234.ec2.internal:9026/cluster/app/application_1399415419726_0005 job_1399415419726_0005], state=FAILED, reason=Application application_1399415419726_0005 failed 2 times due to AM Container for appattempt_1399415419726_0005_000002 exited with exitCode: 1 due to: Exception from container-launch: 

and the container logs look like this:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/service/CompositeService
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.service.CompositeService
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 13 more

indexes.conf
[provider:emr2]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /opt/hadoop/apache/hadoop-2.2.0/
vix.env.JAVA_HOME = /usr/
vix.family = hadoop
vix.fs.default.name = hdfs://10.34.139.234:9000/
vix.mapreduce.framework.name = yarn
vix.splunk.home.hdfs = /hunk/workdir
vix.yarn.resourcemanager.address = 10.34.139.234:9022
vix.yarn.resourcemanager.scheduler.address = 10.34.139.234:9024

[emr2]
vix.input.1.path = /hunk/data/...
vix.provider = emr2

Ledion_Bitincka
Splunk Employee

The root cause of this problem is the default setting for vix.yarn.application.classpath in /opt/hunk/etc/system/default/indexes.conf:

[provider-family:hadoop]
....
vix.yarn.application.classpath     = $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$YARN_HOME/*,$YARN_HOME/lib/*

This setting is correct for other Hadoop distros, but the EMR distro seems to use a different value, so the MapReduce container presumably cannot find the Hadoop jars (hence the NoClassDefFoundError above). You can determine the correct value by looking up yarn.application.classpath in the running config of your cluster, at http://<resource-manager-host>:9026/conf. In my case, setting vix.yarn.application.classpath as follows got things working:

[provider:emr2]
...
vix.yarn.application.classpath = $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,/usr/share/aws/emr/emr-fs/lib/*,/usr/share/aws/emr/lib/*

Another possible solution that might work in your environment is to unset yarn.application.classpath for your provider (by leaving vix.yarn.application.classpath empty) and let the default be used:

[provider:emr2]
...
vix.yarn.application.classpath = 
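
The lookup of yarn.application.classpath in the running config can also be scripted. The following is only a rough sketch, assuming the /conf page returns the usual Hadoop configuration XML (a list of property elements with name and value children). The helper names fetch_running_conf, extract_property and provider_setting are made up for illustration, and SAMPLE_CONF is a shortened stand-in for a real response.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_running_conf(rm_host, port=9026):
    """Download the running config from the ResourceManager (port 9026 as in the post)."""
    url = "http://{}:{}/conf".format(rm_host, port)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()  # raw bytes; ElementTree can parse these directly


def extract_property(conf_xml, name):
    """Return the value of one property from Hadoop configuration XML, or None."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            value = prop.findtext("value") or ""
            # Values may be wrapped across lines; rebuild a clean comma-separated list.
            return ",".join(p.strip() for p in value.split(",") if p.strip())
    return None


def provider_setting(classpath):
    """Format the line to place under the [provider:...] stanza in indexes.conf."""
    return "vix.yarn.application.classpath = " + classpath


# Shortened, hand-written stand-in for what /conf might return (assumption).
SAMPLE_CONF = """<configuration>
  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/share/hadoop/common/*</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    classpath = extract_property(SAMPLE_CONF, "yarn.application.classpath")
    print(provider_setting(classpath))
```

Against a live cluster you would pass fetch_running_conf("<resource-manager-host>") to extract_property instead of SAMPLE_CONF, then paste the printed line into your provider stanza.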

hortonew
Builder

Thank you so much for this. It's 2020 and this helped solve my issue. If you're using EMR, make sure to SSH to your master node, cd to /usr/lib/hadoop-yarn/, and look in yarn-site.xml for yarn.application.classpath. Then use that value in your Hadoop client's yarn-site.xml. Mine turned out to be:

$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,/usr/lib/hadoop-lzo/lib/*,/usr/share/aws/emr/emrfs/conf,/usr/share/aws/emr/emrfs/lib/*,/usr/share/aws/emr/emrfs/auxlib/*,/usr/share/aws/emr/lib/*,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar,/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar,/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar,/usr/share/aws/emr/cloudwatch-sink/lib/*,/usr/share/aws/aws-java-sdk/*
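
To see at a glance how a cluster's value differs from the Hunk default, the two comma-separated lists can be compared entry by entry. This is just a sketch: the helper names split_classpath and only_in are made up for illustration, HUNK_DEFAULT is the default quoted in the accepted answer, and CLUSTER_VALUE is the value quoted above.

```python
def split_classpath(value):
    """Split a comma-separated classpath into a list of trimmed, non-empty entries."""
    return [entry.strip() for entry in value.split(",") if entry.strip()]


def only_in(first, second):
    """Entries of `first` that do not appear in `second`, in their original order."""
    known = set(split_classpath(second))
    return [entry for entry in split_classpath(first) if entry not in known]


# Default from the [provider-family:hadoop] stanza quoted in the accepted answer.
HUNK_DEFAULT = (
    "$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,"
    "$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,"
    "$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,"
    "$YARN_HOME/*,$YARN_HOME/lib/*"
)

# Value found in yarn-site.xml on the EMR master node, as quoted above.
CLUSTER_VALUE = (
    "$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,"
    "$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,"
    "$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,"
    "$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,"
    "/usr/lib/hadoop-lzo/lib/*,/usr/share/aws/emr/emrfs/conf,"
    "/usr/share/aws/emr/emrfs/lib/*,/usr/share/aws/emr/emrfs/auxlib/*,"
    "/usr/share/aws/emr/lib/*,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar,"
    "/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar,"
    "/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar,"
    "/usr/share/aws/emr/cloudwatch-sink/lib/*,/usr/share/aws/aws-java-sdk/*"
)

if __name__ == "__main__":
    print("Only in the Hunk default:")
    for entry in only_in(HUNK_DEFAULT, CLUSTER_VALUE):
        print("  " + entry)
    print("Only in the cluster value:")
    for entry in only_in(CLUSTER_VALUE, HUNK_DEFAULT):
        print("  " + entry)
```

For these two values, the entries unique to the default are the $YARN_HOME ones, while the cluster uses $HADOOP_YARN_HOME plus a number of EMR-specific paths.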