Hi,
When setting up a provider for Hunk under Virtual Indexes, I am a bit confused about the configuration options.
Hadoop version: Hadoop 2.x (YARN)
Job Tracker (in Hadoop 2 there is no JobTracker, so I used the ResourceManager address):
master1:8032
File system:
hdfs://master1:8020
Then I added the following settings:
vix.mapreduce.framework.name=yarn
vix.yarn.resourcemanager.address=master1:8032
vix.yarn.resourcemanager.scheduler.address=master1:8030
Those values match the yarn-site.xml:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master1:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master1:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master1:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master1:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master1:8088</value>
</property>
This generates the following error:
[Stor] JobStartException - Failed to start MapReduce job. Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_master1_1389788887.59_0 ] and [ Unknown rpc kind RPC_WRITABLE ]
If I enable MapReduce v1 and set the JobTracker address in the "Job Tracker" field:
master1:8021
Then it works, but it uses MRv1 rather than YARN.
What am I missing?
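For reference, I believe this UI configuration ends up as a provider stanza in indexes.conf, roughly like this (a sketch; the stanza name and the vix.env.* paths are placeholders, not my actual values):

[provider:hadoop-yarn]
vix.family = hadoop
# placeholder paths:
vix.env.JAVA_HOME = /usr/java/latest
vix.env.HADOOP_HOME = /usr/lib/hadoop
# the "File system" field:
vix.fs.default.name = hdfs://master1:8020
# the extra settings from above:
vix.mapreduce.framework.name = yarn
vix.yarn.resourcemanager.address = master1:8032
vix.yarn.resourcemanager.scheduler.address = master1:8030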

seyvet,
It would be unusual to see a Hadoop cluster running both YARN and MRv1 simultaneously. What version of Hadoop are you running? Are you running a commercial distribution and, if so, which one and which version?
If you're running YARN, leave the Job Tracker field blank (this is suboptimal; we're fixing it in the next version). The docs here should help: http://docs.splunk.com/Documentation/Hunk/6.0/Hunk/SetupanHDFSprovider. Other than that, your settings look like they should work. Can you send us a pastebin of your search.log (Job -> Inspect Job -> scroll to the bottom and click search.log)?
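In indexes.conf terms (a sketch, assuming the UI's Job Tracker field maps to vix.mapred.job.tracker the way the other fields map to their vix.* counterparts), the difference is just:

# MRv1 only -- leave this out entirely when targeting YARN:
# vix.mapred.job.tracker = master1:8021

# YARN:
vix.mapreduce.framework.name = yarn
vix.yarn.resourcemanager.address = master1:8032
vix.yarn.resourcemanager.scheduler.address = master1:8030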

Yes. Thanks for pointing out that the field is being removed.

You can delete /tmp/splunk and it will be recreated. Glad everything is working for you!

OK, I got it: the first run with MRv1 created a /tmp/splunk directory on each datanode with access rights set to mapred:mapred. As a result, user "yarn" could not write there, so I chmod 777'd this folder on all nodes.
Seems to work now.
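As an aside: instead of opening /tmp/splunk up with 777, it should also be possible to point the provider at a directory the yarn user can write to via vix.splunk.home.datanode (a sketch; the path is an example and I have not tested this myself):

# in the provider stanza in indexes.conf:
# per-datanode directory where Hunk unpacks its Splunk package
vix.splunk.home.datanode = /tmp/splunk-yarn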
If I delete /tmp/splunk, will it be re-created automatically now, even though I have already run a few searches?

search.log: http://pastebin.com/PGCAPGTW (the previous one was removed)

Additional info: Splunk creates temp files under splunkMR/dispatch:
Cannot create username mapping file: /tmp/splunk/master1/splunk/etc/users/users.ini: Permission denied
Cannot open file=/tmp/splunk/master1/splunk/etc/users/users.ini for parsing: Permission denied
Error opening username mapping file: /tmp/splunk/master1/splunk/etc/users/users.ini
Cannot initialize: /tmp/splunk/master1/splunk/etc/system/metadata/local.meta: Permission denied
/tmp/splunk/master1/splunk/var/run/splunk/dispatch/SplunkMR_attempt_1389874731460_0008_m_000000_0 was not created before dispatch process was created

OK, so my bad: I checked the setup, and adding the MRv1 service had overridden the YARN setup. I fixed that.
Removing the Job Tracker field value then did trigger YARN MR (MRv2), which is good.
Now I got:
[Stor] JobStartException - Failed to start MapReduce job. Please consult search.log for more information. Message: Error while waiting for MapReduce job to complete, job_id=job_1389874731460_0002 (http://master1:8088/proxy/application_1389874731460_0002/), state=FAILED, reason=
search.log: http://pastebin.com/vStgmcsH
or: ERROR ChunkedOutputStreamReader - Invalid header line="3681388,1360210..."

Thanks for the answer.
I removed the content of the Job Tracker field. Then the error became:
[Stor] JobStartException - Failed to start MapReduce job. Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_master1_1389860452.143_0 ] and [ Does not contain a valid host:port authority: ]
search.log: http://pastebin.com/PLxYCYNw
As for YARN and MRv1 coexisting: I use Cloudera Standard 4.7.3 to manage a 32-node Hadoop lab cluster, and CDH allows deploying MRv1 in parallel with YARN (MRv2).
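For what it's worth, the "Does not contain a valid host:port authority" error looks like the Hadoop client was still in classic (MRv1) mode with an empty JobTracker address. Making sure the client configuration that Hunk picks up declares YARN as the framework should avoid that (a sketch of the relevant mapred-site.xml property; not my actual file):

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>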
