
Why am I getting a "Failed to start MapReduce Job" error running a MR2 Job in Hunk 6.2?

techdiverdown
Path Finder

Using MapR version 4.0.1 (MR2) and Hunk 6.2. I configured a MapR provider and a virtual index. A simple search works (index=my_virtual_index), but when I add a condition and an MR2 job tries to kick off in MapR, I get the following error:

[psb_mapr] Error while running external process, return_code=255. See search.log for more info
[psb_mapr] JobStartException - Failed to start MapReduce job. Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_litf-mom.ip.qwest.net_1414779195.4344_0 ] and [ Does not contain a valid host:port authority: HS_IP:10020 ]
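For context (an observation, not something confirmed in the thread): 10020 is the default port of the MapReduce JobHistory Server, and "HS_IP" reads like an unsubstituted template placeholder rather than a real hostname, e.g. in mapred-site.xml. A hypothetical entry of the kind that would produce this error:

```xml
<!-- mapred-site.xml: a placeholder like HS_IP left unreplaced here would
     yield "Does not contain a valid host:port authority: HS_IP:10020" -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <!-- should be a real hostname, e.g. historyserver.example.net:10020 -->
  <value>HS_IP:10020</value>
</property>
```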

1 Solution

techdiverdown
Path Finder

OK, it seems to work now. I changed the HADOOP_HOME setting to this:
/opt/mapr/hadoop/hadoop-0.20.2
Also, the correct file system URL is maprfs:///


techdiverdown
Path Finder

Here is a working configuration for MapR 4.0.1 using MR2 (YARN). In addition, the MapR client needed to be set up correctly on the Splunk client box using configure.sh.

[provider:psb_mapr]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /opt/mapr/hadoop/hadoop-2.4.1
vix.env.JAVA_HOME = /usr/lib/jvm/java-7-oracle
vix.family = hadoop
vix.fs.default.name = maprfs:///
vix.mapreduce.framework.name = yarn
vix.splunk.home.hdfs = /user/splunk
vix.yarn.resourcemanager.address = hadoop-dev-00.ngid.xxx.net:8032
vix.yarn.resourcemanager.scheduler.address = hadoop-dev-00.ngid.xxx.net:8030
vix.yarn.application.classpath = /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar
vix.splunk.impersonation = 0
vix.env.MAPREDUCE_USER =
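For reference, a MapR client is typically pointed at a cluster with configure.sh; the cluster name and CLDB host below are placeholders, not values from this thread:

```
# Hypothetical example: register the Splunk host as a MapR client
# (-N cluster name, -c client-only setup, -C CLDB host:port)
/opt/mapr/server/configure.sh -N my.cluster.com -c -C cldb.example.net:7222
```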



rdagan_splunk
Splunk Employee

I see that you are using hadoop-2.4.1, but in the classpath you are pointing to hadoop-2.3.0. As for the error "Does not contain a valid host:port authority": I've seen similar behavior when MR1 jars were on the classpath instead of the YARN libraries.


techdiverdown
Path Finder

Good catch. I updated the classpath to this:

/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar

Same error....


Ledion_Bitincka
Splunk Employee

Can you please include some more details about the error message (i.e., from search.log) and potentially share the configurations?

Also, are you able to submit MR jobs/YARN apps to the cluster from the command line?
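One way to answer that question is a quick smoke test from the Splunk/Hunk host. The jar path below is an assumption based on the MapR 4.0.1 layout shown elsewhere in this thread, not a value confirmed by the posters:

```
# Hypothetical smoke test: submit a trivial MR job through YARN
yarn jar /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10

# List YARN applications to confirm the ResourceManager is reachable
yarn application -list
```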


Ledion_Bitincka
Splunk Employee

The error could be related to the following setting:

vix.fs.default.name = maprfs:///

Can you try setting it to something like this:

vix.fs.default.name = maprfs://cldb.example.net:7222

Ledion_Bitincka
Splunk Employee

Can you please share the entire search.log so we can see the entire exception?


techdiverdown
Path Finder

I tried this. I have it set to maprfs://hadoop-dev-00.ngid.centurylink.net:7222 and I still receive the same error.


techdiverdown
Path Finder

1) On the Hadoop nodes, I can run MapReduce jobs fine.
2) On the Splunk node I can see HDFS, and I can stream files from Hunk fine as well.
Here is the config from indexes.conf:

[provider:psb_mapr]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /opt/mapr/hadoop/hadoop-2.4.1
vix.env.JAVA_HOME = /usr/lib/jvm/java-7-oracle
vix.family = hadoop
vix.fs.default.name = maprfs:///
vix.mapreduce.framework.name = yarn
vix.splunk.home.hdfs = /user/splunk
vix.yarn.resourcemanager.address = hadoop-dev-00.ngid.centurylink.net:8032
vix.yarn.resourcemanager.scheduler.address = hadoop-dev-00.ngid.centurylink.net:8030
vix.yarn.application.classpath = /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/lib/*
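One quick way to spot stale entries in a long classpath like the one above is to walk it entry by entry and check each path on disk. This is a generic sketch (the function name and the example paths are mine, not from the thread):

```shell
# check_classpath: print classpath entries that match nothing on disk.
# Usage: check_classpath "/path/a:/path/lib/*:..."
check_classpath() {
  cp="$1"
  while [ -n "$cp" ]; do
    # Peel off the next colon-separated entry
    entry="${cp%%:*}"
    case "$cp" in *:*) cp="${cp#*:}" ;; *) cp="" ;; esac
    # Let the shell expand globs such as .../lib/*; if nothing matches,
    # the literal pattern survives and the -e test below fails
    set -- $entry
    [ -e "$1" ] || echo "missing: $entry"
  done
}

check_classpath "/tmp:/opt/definitely/not/here/*"
# prints: missing: /opt/definitely/not/here/*
```

Running it against the value above would flag any hadoop-2.3.0 directory that no longer exists after the 2.4.1 upgrade.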
