Archive

Why am I getting a "Failed to start MapReduce Job" error running a MR2 Job in Hunk 6.2?

techdiverdown
Path Finder

Using MapR version 4.0.1 (MR2) and Hunk 6.2. I configured a MapR provider and a virtual index. A simple search works (index=my_virtual_index), but when I add a condition and an MR2 job tries to kick off in MapR, I get the following error:

[psb_mapr] Error while running external process, return_code=255. See search.log for more info
[psb_mapr] JobStartException - Failed to start MapReduce job. Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_litf-mom.ip.qwest.net_1414779195.4344_0 ] and [ Does not contain a valid host:port authority: HS_IP:10020 ]
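The `HS_IP:10020` token in that message looks like an unreplaced placeholder rather than a real host (10020 is the default `mapreduce.jobhistory.address` port). A hedged way to check for such a leftover placeholder, using a sample file standing in for the real mapred-site.xml under the Hadoop conf directory:

```shell
# Sketch: detect an unreplaced HS_IP placeholder in mapred-site.xml.
# /tmp/mapred-site-sample.xml stands in for the real file under
# $HADOOP_HOME/etc/hadoop/ (the path and placeholder name are assumptions).
cat > /tmp/mapred-site-sample.xml <<'EOF'
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>HS_IP:10020</value>
</property>
EOF
if grep -q 'HS_IP' /tmp/mapred-site-sample.xml; then
  echo "placeholder not replaced"
fi
```

If the real file shows the same token, the client configuration step never filled in the JobHistory server host.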

1 Solution

techdiverdown
Path Finder

OK, it seems to work now. I changed the HADOOP_HOME setting to this:
/opt/mapr/hadoop/hadoop-0.20.2
Also, the correct file system URL is maprfs:///


techdiverdown
Path Finder

Here is a working configuration for MapR 4.0.1 using MR2 (YARN). In addition, the MapR client needed to be set up correctly on the Splunk client box using configure.sh.
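For reference, the client-side setup step might look like the sketch below. The hostnames are placeholders, and the flags are as I understand MapR's configure.sh (verify against the docs for your MapR version); notably, -HS points the client at the JobHistory server, which is what fills in the HS_IP token seen in the original error.

```shell
# Sketch of MapR client setup on the Splunk search head.
# Hostnames are examples; verify flags against your MapR version.
#   -c   client-only configuration
#   -C   CLDB node and port
#   -N   cluster name
#   -HS  JobHistory server host (replaces the HS_IP placeholder)
sudo /opt/mapr/server/configure.sh -c \
  -N my.cluster.com \
  -C cldb.example.net:7222 \
  -HS historyserver.example.net
```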

[provider:psb_mapr]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /opt/mapr/hadoop/hadoop-2.4.1
vix.env.JAVA_HOME = /usr/lib/jvm/java-7-oracle
vix.family = hadoop
vix.fs.default.name = maprfs:///
vix.mapreduce.framework.name = yarn
vix.splunk.home.hdfs = /user/splunk
vix.yarn.resourcemanager.address = hadoop-dev-00.ngid.xxx.net:8032
vix.yarn.resourcemanager.scheduler.address = hadoop-dev-00.ngid.xxx.net:8030
vix.yarn.application.classpath = /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar
vix.splunk.impersonation = 0
vix.env.MAPREDUCE_USER =



rdagan_splunk
Splunk Employee

I see that you are using hadoop-2.4.1, but in the classpath you are pointing to hadoop-2.3.0. As for the error "Does not contain a valid host:port authority", I've seen similar behavior when MR1 jars were in the classpath instead of the YARN libraries.


techdiverdown
Path Finder

Good catch. I updated the classpath to this:

/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar

Same error....


Ledion_Bitincka
Splunk Employee

Can you please include some more details about the error message (i.e., from search.log) and potentially share the configurations?

Also, are you able to submit MR jobs/YARN apps to the cluster from the command line?
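One way to answer the command-line question is to submit one of the stock example jobs from the Splunk box. This is only a sketch: the examples jar name and path vary by Hadoop distribution and version, and the job needs a reachable cluster.

```shell
# Sketch: submit a trivial MR2/YARN job to confirm the client can reach
# the ResourceManager. The examples jar path is distribution-dependent.
export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.4.1
$HADOOP_HOME/bin/yarn jar \
  $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  pi 2 10
```

If this fails with the same host:port error, the problem is in the client's Hadoop configuration, not in Hunk.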


Ledion_Bitincka
Splunk Employee

The error could be related to the following setting:

vix.fs.default.name = maprfs:///

Can you try setting it to something like this:

vix.fs.default.name = maprfs://cldb.example.net:7222
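Either form of the URI can be sanity-checked from the Hunk box before rerunning the search. This is a sketch: cldb.example.net is a placeholder and the command requires a working MapR client install.

```shell
# Sketch: confirm the filesystem URI resolves before using it as
# vix.fs.default.name (hostname is a placeholder for your CLDB node).
/opt/mapr/hadoop/hadoop-2.4.1/bin/hadoop fs -ls maprfs://cldb.example.net:7222/
```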

Ledion_Bitincka
Splunk Employee

Can you please share the entire search.log so we can see the entire exception?


techdiverdown
Path Finder

I tried this. I have it set to maprfs://hadoop-dev-00.ngid.centurylink.net:7222 and I still receive the same error.


techdiverdown
Path Finder

1) On the Hadoop nodes, I can run MapR jobs fine.
2) On the Splunk node, I can see HDFS, and I can also stream files from Hunk fine.
Here is the config from indexes.conf:

[provider:psb_mapr]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /opt/mapr/hadoop/hadoop-2.4.1
vix.env.JAVA_HOME = /usr/lib/jvm/java-7-oracle
vix.family = hadoop
vix.fs.default.name = maprfs:///
vix.mapreduce.framework.name = yarn
vix.splunk.home.hdfs = /user/splunk
vix.yarn.resourcemanager.address = hadoop-dev-00.ngid.centurylink.net:8032
vix.yarn.resourcemanager.scheduler.address = hadoop-dev-00.ngid.centurylink.net:8030
vix.yarn.application.classpath = /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/lib/*
