Dashboards & Visualizations

Why am I getting a "Failed to start MapReduce Job" error running a MR2 Job in Hunk 6.2?

techdiverdown
Path Finder

Using MapR version 4.0.1 (MR2) and Hunk 6.2. I configured a MapR provider and a virtual index. A simple search works (index=my_virtual_index), but when I add a search condition and an MR2 job tries to kick off on MapR, I get the following error:

[psb_mapr] Error while running external process, return_code=255. See search.log for more info
[psb_mapr] JobStartException - Failed to start MapReduce job. Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_litf-mom.ip.qwest.net_1414779195.4344_0 ] and [ Does not contain a valid host:port authority: HS_IP:10020 ]
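
The second message suggests Hadoop received the literal placeholder HS_IP instead of a resolvable JobHistory server host (port 10020 is the default for mapreduce.jobhistory.address, the MR2 JobHistory server). Since Hunk forwards vix.-prefixed Hadoop properties into the job configuration, one way to rule this out is to set the address explicitly in the provider stanza. This is a sketch only; the hostname below is a placeholder for your actual JobHistory server:

```
[provider:psb_mapr]
# Hypothetical host - replace with the node actually running the MR2 JobHistory server.
vix.mapreduce.jobhistory.address = jobhistory.example.net:10020
```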

1 Solution

techdiverdown
Path Finder

OK, it seems to work now. I changed the HADOOP_HOME setting to this:
/opt/mapr/hadoop/hadoop-0.20.2
Also, the correct file system URL is maprfs:///


techdiverdown
Path Finder

Here is a working configuration for MapR 4.0.1 using MR2 (YARN). In addition, the MapR client needed to be set up correctly on the Splunk client box using configure.sh.

[provider:psb_mapr]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /opt/mapr/hadoop/hadoop-2.4.1
vix.env.JAVA_HOME = /usr/lib/jvm/java-7-oracle
vix.family = hadoop
vix.fs.default.name = maprfs:///
vix.mapreduce.framework.name = yarn
vix.splunk.home.hdfs = /user/splunk
vix.yarn.resourcemanager.address = hadoop-dev-00.ngid.xxx.net:8032
vix.yarn.resourcemanager.scheduler.address = hadoop-dev-00.ngid.xxx.net:8030
vix.yarn.application.classpath = /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar
vix.splunk.impersonation = 0
vix.env.MAPREDUCE_USER =
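
The configure.sh step mentioned above can look like the following on the Splunk host. This is illustrative only: the cluster name, hostnames, and the -HS value are placeholders, and the exact flags depend on your MapR version (-N names the cluster, -C lists the CLDB hosts, -c does a client-only setup, and -HS points at the MR2 JobHistory server host, which is relevant to the host:port error above):

```shell
# Illustrative sketch - run as root on the Splunk host against a live MapR cluster.
/opt/mapr/server/configure.sh -N my.cluster.com \
    -C cldb.example.net:7222 \
    -c \
    -HS jobhistory.example.net
```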



rdagan_splunk
Splunk Employee

I see that you are using hadoop-2.4.1, but in the classpath you are pointing to hadoop-2.3.0. As for the error - Does not contain a valid host:port authority - I've seen similar behavior when MR1 jars were on the classpath instead of the YARN libraries.
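
A quick way to confirm whether a classpath mixes Hadoop releases is to split it on ':' and list every distinct version string it references; more than one line of output means mixed jars. This is a generic sketch, with a shortened example value standing in for the real vix.yarn.application.classpath:

```shell
# Hypothetical, shortened classpath; paste your full vix.yarn.application.classpath value here.
classpath="/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/common/lib/*"

# Split on ':', pull out each hadoop-x.y.z token, and de-duplicate.
# For this example it prints both hadoop-2.3.0 and hadoop-2.4.1, i.e. a mixed classpath.
echo "$classpath" | tr ':' '\n' | grep -o 'hadoop-[0-9][0-9.]*' | sort -u
```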


techdiverdown
Path Finder

Good catch. I updated the classpath to this:

/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar

Same error....


Ledion_Bitincka
Splunk Employee

Can you please include some more details about the error message (i.e., from search.log) and potentially share the configurations?

Also, are you able to submit MR jobs/YARN apps to the cluster from the command line?


Ledion_Bitincka
Splunk Employee

The error could be related to the following setting:

vix.fs.default.name = maprfs:///

Can you try setting it to something like this:

vix.fs.default.name = maprfs://cldb.example.net:7222

Ledion_Bitincka
Splunk Employee

Can you please share the entire search.log so we can see the entire exception?


techdiverdown
Path Finder

I tried this. I have it set to maprfs://hadoop-dev-00.ngid.centurylink.net:7222 and I still receive the same error.


techdiverdown
Path Finder

1) On the Hadoop nodes, I can run MapReduce jobs fine.
2) On the Splunk node I can see HDFS, and I can also stream files from Hunk fine.
Here is the config from indexes.conf:

[provider:psb_mapr]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /opt/mapr/hadoop/hadoop-2.4.1
vix.env.JAVA_HOME = /usr/lib/jvm/java-7-oracle
vix.family = hadoop
vix.fs.default.name = maprfs:///
vix.mapreduce.framework.name = yarn
vix.splunk.home.hdfs = /user/splunk
vix.yarn.resourcemanager.address = hadoop-dev-00.ngid.centurylink.net:8032
vix.yarn.resourcemanager.scheduler.address = hadoop-dev-00.ngid.centurylink.net:8030
vix.yarn.application.classpath = /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/lib/*
