Hunk has problems running MapReduce jobs against EMR v2

Ledion_Bitincka
Splunk Employee

I've set up Hunk (6.1) against EMR v2; the contents of my indexes.conf are below. Streaming searches work perfectly and I can see data come back to Hunk. However, as soon as I run a reporting search, e.g. "index=emr2 | stats count", I keep getting the following error:

 [emr2] IOException - Error while waiting for MapReduce job to complete, job_id=[!http://ip-10-34-139-234.ec2.internal:9026/cluster/app/application_1399415419726_0005 job_1399415419726_0005], state=FAILED, reason=Application application_1399415419726_0005 failed 2 times due to AM Container for appattempt_1399415419726_0005_000002 exited with exitCode: 1 due to: Exception from container-launch: 

The container logs look like this:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/service/CompositeService
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.service.CompositeService
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 13 more

indexes.conf
[provider:emr2]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /opt/hadoop/apache/hadoop-2.2.0/
vix.env.JAVA_HOME = /usr/
vix.family = hadoop
vix.fs.default.name = hdfs://10.34.139.234:9000/
vix.mapreduce.framework.name = yarn
vix.splunk.home.hdfs = /hunk/workdir
vix.yarn.resourcemanager.address = 10.34.139.234:9022
vix.yarn.resourcemanager.scheduler.address = 10.34.139.234:9024

[emr2]
vix.input.1.path = /hunk/data/...
vix.provider = emr2

Ledion_Bitincka
Splunk Employee

The root cause of this problem is the default setting for vix.yarn.application.classpath in /opt/hunk/etc/system/default/indexes.conf

[provider-family:hadoop]
....
vix.yarn.application.classpath     = $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$YARN_HOME/*,$YARN_HOME/lib/*

This setting is correct for other Hadoop distros, but the EMR distro appears to use a different value. You can find the correct value by looking up yarn.application.classpath in your cluster's running configuration at http://<resource-manager-host>:9026/conf. In my case, setting vix.yarn.application.classpath as follows got things working:

[provider:emr2]
...
vix.yarn.application.classpath = $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,/usr/share/aws/emr/emr-fs/lib/*,/usr/share/aws/emr/lib/*
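If you'd rather not copy the value out of the /conf page by hand, that page returns the running configuration as Hadoop <configuration> XML, so a small script can pull out yarn.application.classpath and print a ready-to-paste vix line. This is a minimal sketch under assumptions, not a Hunk tool: the helper names are hypothetical, it assumes the ResourceManager web UI is on port 9026 as in this EMR setup (stock Hadoop defaults to 8088), and it assumes the page is reachable without authentication.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_conf(rm_host, port=9026):
    """Download the running configuration XML from the ResourceManager web UI."""
    url = f"http://{rm_host}:{port}/conf"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()  # raw bytes; ElementTree handles the XML declaration


def classpath_from_conf_xml(xml_text, key="yarn.application.classpath"):
    """Return the comma-joined value of `key` from <configuration> XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == key:
            raw = prop.findtext("value") or ""
            # Site files often wrap the list across lines; drop that whitespace
            entries = [e.strip() for e in raw.split(",") if e.strip()]
            return ",".join(entries)
    return None


def vix_line(classpath):
    """Format the value as a line for a [provider:...] stanza in indexes.conf."""
    return f"vix.yarn.application.classpath = {classpath}"


# Example (needs network access to the ResourceManager):
# print(vix_line(classpath_from_conf_xml(fetch_conf("10.34.139.234"))))
```

Since the parser only relies on the <property>/<name>/<value> layout, the same function should also work on a yarn-site.xml read from disk.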

Another possible solution that might work in your environment is to unset vix.yarn.application.classpath for your provider and let the default be used:

[provider:emr2]
...
vix.yarn.application.classpath = 
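To check whether a given classpath value can actually see the class from the stack trace, you can expand it roughly the way the container would and look inside each jar. This is a minimal Python sketch, not a Hunk or Hadoop utility: `find_class` and `expand_entry` are hypothetical helper names, it only handles `$VAR`/`${VAR}`, plain directories, `.jar` files and `dir/*` wildcards, and it assumes you run it on a cluster node with the relevant `HADOOP_*` variables set (or pass them in explicitly).

```python
import glob
import os
import re
import zipfile


def expand_entry(entry, env):
    """Substitute $VAR and ${VAR} from `env`; unknown variables are left untouched."""
    def sub(match):
        name = match.group(1) or match.group(2)
        return env.get(name, match.group(0))
    return re.sub(r"\$\{(\w+)\}|\$(\w+)", sub, entry)


def find_class(classpath, class_name, env):
    """Return the classpath locations (jars or directories) that contain class_name."""
    target = class_name.replace(".", "/") + ".class"
    hits = []
    for raw in classpath.split(","):
        entry = expand_entry(raw.strip(), env)
        if entry.endswith("/*"):
            # Java-style wildcard: every .jar file directly inside the directory
            jars = sorted(glob.glob(os.path.join(entry[:-2], "*.jar")))
        elif entry.endswith(".jar"):
            jars = [entry] if os.path.isfile(entry) else []
        else:
            # Plain directory entry: look for the .class file on disk
            if os.path.isfile(os.path.join(entry, target)):
                hits.append(entry)
            continue
        for jar in jars:
            try:
                with zipfile.ZipFile(jar) as zf:
                    if target in zf.namelist():
                        hits.append(jar)
            except zipfile.BadZipFile:
                pass  # skip files that are not valid jars
    return hits


if __name__ == "__main__":
    hunk_default = ("$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,"
                    "$YARN_HOME/*,$YARN_HOME/lib/*")
    print(find_class(hunk_default, "org.apache.hadoop.service.CompositeService",
                     dict(os.environ)))
```

An empty list for the Hunk default and a hit for the EMR-specific value would be consistent with the NoClassDefFoundError above. This assumes CompositeService ships in the hadoop-common jar, which is where stock Apache Hadoop 2.x puts it as far as I know.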

hortonew
Builder

Thank you so much for this. It's 2020 and this helped solve my issue. If you're using EMR, SSH to your master node, cd /usr/lib/hadoop-yarn/, and look at yarn-site.xml for yarn.application.classpath. Then use that value in your Hadoop client's yarn-site.xml. Mine turned out to be:

$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,/usr/lib/hadoop-lzo/lib/*,/usr/share/aws/emr/emrfs/conf,/usr/share/aws/emr/emrfs/lib/*,/usr/share/aws/emr/emrfs/auxlib/*,/usr/share/aws/emr/lib/*,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar,/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar,/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar,/usr/share/aws/emr/cloudwatch-sink/lib/*,/usr/share/aws/aws-java-sdk/*
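Copying that value into the client-side yarn-site.xml can be done by hand, or scripted. Here is a minimal sketch using Python's ElementTree, assuming the standard Hadoop <configuration>/<property> layout; `set_property` is a hypothetical helper, not part of any Hadoop tooling. Note that ElementTree drops XML comments and the XML declaration when it re-serializes, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text, name, value):
    """Return yarn-site.xml text with property `name` set to `value` (added if absent)."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            val = prop.find("value")
            if val is None:
                val = ET.SubElement(prop, "value")
            val.text = value
            break
    else:
        # No existing <property> with that name: append a new one
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Example (paths are placeholders for your client config):
# with open("yarn-site.xml") as f:
#     updated = set_property(f.read(), "yarn.application.classpath", cluster_value)
```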