Dashboards & Visualizations

Why am I getting a "Failed to start MapReduce Job" error running a MR2 Job in Hunk 6.2?

techdiverdown
Path Finder

Using MapR 4.0.1 (MR2) and Hunk 6.2. I configured a MapR provider and a virtual index. A simple search works (index=my_virtual_index), but when I add a condition and an MR2 job tries to kick off in MapR, I get the following error:

[psb_mapr] Error while running external process, return_code=255. See search.log for more info
[psb_mapr] JobStartException - Failed to start MapReduce job. Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_litf-mom.ip.qwest.net_1414779195.4344_0 ] and [ Does not contain a valid host:port authority: HS_IP:10020 ]
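The "HS_IP:10020" in the message reads like an unresolved placeholder for the JobHistory server host (port 10020 is the default for mapreduce.jobhistory.address). As a hedged sketch only, the relevant mapred-site.xml entry with a real hostname would look roughly like this (historyserver.example.net is a stand-in, not from this thread):

```xml
<!-- Hypothetical mapred-site.xml fix for the HS_IP placeholder.
     historyserver.example.net stands in for the real JobHistory server host. -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>historyserver.example.net:10020</value>
</property>
```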

1 Solution

techdiverdown
Path Finder

OK, it seems to work now. I changed the HADOOP_HOME setting to this:
/opt/mapr/hadoop/hadoop-0.20.2
Also, the correct file system URL is maprfs:///
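Sketched as an indexes.conf fragment, the two changed settings would look roughly like this (the provider stanza name psb_mapr is taken from the error message above; all other settings stay as previously configured):

```ini
# Hedged sketch of the two settings changed in this fix
[provider:psb_mapr]
vix.env.HADOOP_HOME = /opt/mapr/hadoop/hadoop-0.20.2
vix.fs.default.name = maprfs:///
```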

View solution in original post

techdiverdown
Path Finder

Here is a working configuration for MapR 4.0.1 using MR2 (YARN). In addition, the MapR client needed to be set up correctly on the Splunk client box using configure.sh.

[provider:psb_mapr]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /opt/mapr/hadoop/hadoop-2.4.1
vix.env.JAVA_HOME = /usr/lib/jvm/java-7-oracle
vix.family = hadoop
vix.fs.default.name = maprfs:///
vix.mapreduce.framework.name = yarn
vix.splunk.home.hdfs = /user/splunk
vix.yarn.resourcemanager.address = hadoop-dev-00.ngid.xxx.net:8032
vix.yarn.resourcemanager.scheduler.address = hadoop-dev-00.ngid.xxx.net:8030
vix.yarn.application.classpath = /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar
vix.splunk.impersonation = 0
vix.env.MAPREDUCE_USER =
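For the MapR client setup mentioned above, a minimal sketch of the configure.sh call on the Splunk box might look like the following. The cluster name and CLDB host are placeholders, not from this thread; check the MapR documentation for the exact flags in your version.

```
# Hypothetical client-mode setup; my.cluster.com and cldb-host are placeholders.
# -c = client configuration, -C = CLDB host list, -N = cluster name
/opt/mapr/server/configure.sh -N my.cluster.com -c -C cldb-host:7222
```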



rdagan_splunk
Splunk Employee

I see that you are using hadoop-2.4.1, but in the classpath you are pointing to hadoop-2.3.0. As for the "Does not contain a valid host:port authority" error, I've seen similar behavior when MR1 jars were on the classpath instead of the YARN libraries.
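The version mismatch spotted here is easy to miss by eye in a long classpath. A small, hypothetical helper (not part of Hunk or MapR) can extract the Hadoop versions referenced in a vix.yarn.application.classpath value and flag entries that disagree with HADOOP_HOME:

```python
import re

def classpath_version_mismatches(hadoop_home: str, classpath: str) -> set:
    """Return Hadoop versions referenced in the classpath that differ
    from the version implied by HADOOP_HOME (e.g. 'hadoop-2.4.1')."""
    version = re.compile(r"hadoop-(\d+\.\d+\.\d+)")
    home_match = version.search(hadoop_home)
    if home_match is None:
        raise ValueError("HADOOP_HOME does not name a versioned directory")
    home_version = home_match.group(1)
    # Collect every hadoop-X.Y.Z version mentioned by any classpath entry
    found = {m.group(1)
             for entry in classpath.split(":")
             for m in [version.search(entry)] if m}
    return found - {home_version}

# Example using the versions from this thread:
mismatches = classpath_version_mismatches(
    "/opt/mapr/hadoop/hadoop-2.4.1",
    "/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:"
    "/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/lib/*",
)
print(mismatches)  # {'2.3.0'}
```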


techdiverdown
Path Finder

Good catch. I updated the classpath to this:

/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar

Same error....


Ledion_Bitincka
Splunk Employee

Can you please include some more details about the error message (i.e., from search.log) and potentially share your configuration?

Also, are you able to submit MR jobs/YARN apps to the cluster from the command line?
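One hedged way to check the command-line question is to run the stock Hadoop example job from the Splunk box. This assumes the standard examples jar ships with the MapR client under the HADOOP_HOME shown earlier; the path and pi arguments are illustrative only:

```
# Environment-dependent smoke test of YARN job submission from the Splunk box
yarn jar /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10
```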


Ledion_Bitincka
Splunk Employee

The error could be related to the following setting:

vix.fs.default.name = maprfs:///

Can you try setting it to something like this:

vix.fs.default.name = maprfs://cldb.example.net:7222

Ledion_Bitincka
Splunk Employee

Can you please share the entire search.log so we can see the entire exception?


techdiverdown
Path Finder

I tried this and still receive the same error. I have it set to maprfs://hadoop-dev-00.ngid.centurylink.net:7222.


techdiverdown
Path Finder

1) On the Hadoop nodes, I can run MapReduce jobs fine.
2) On the Splunk node, I can see HDFS, and I can stream files from Hunk fine as well.
Here is the config from indexes.conf:

[provider:psb_mapr]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /opt/mapr/hadoop/hadoop-2.4.1
vix.env.JAVA_HOME = /usr/lib/jvm/java-7-oracle
vix.family = hadoop
vix.fs.default.name = maprfs:///
vix.mapreduce.framework.name = yarn
vix.splunk.home.hdfs = /user/splunk
vix.yarn.resourcemanager.address = hadoop-dev-00.ngid.centurylink.net:8032
vix.yarn.resourcemanager.scheduler.address = hadoop-dev-00.ngid.centurylink.net:8030
vix.yarn.application.classpath = /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/lib/*
