<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Hunk and Cloudera CDH4.4 (hdfs and mapred V1) 2 Problems? in All Apps and Add-ons</title>
    <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Hunk-and-Cloudera-CDH4-4-hdfs-and-mapred-V1-2-Problems/m-p/147951#M72216</link>
    <description>&lt;P&gt;Ledion, thanks for your fast answer:&lt;/P&gt;

&lt;P&gt;Problem 1:&lt;/P&gt;

&lt;P&gt;The hdfs superuser is "hdfs"&lt;BR /&gt;
The mapred user is "mapred", with full access (rwx) to the hdfs directories!&lt;/P&gt;

&lt;P&gt;I want to run splunkd as user "splunk", but I need to run Mapred jobs as user "mapred". But after setting &lt;STRONG&gt;vix.env.MAPREDUCE_USER = mapred&lt;/STRONG&gt; I lose access to my hdfs data, even if splunkd is actually running as "mapred"!&lt;/P&gt;

&lt;P&gt;Problem 2:&lt;/P&gt;

&lt;P&gt;I have trouble deciding whether a path in the configuration refers to hdfs or to the unix-fs!&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;vix.splunk.home.hdfs = /user/splunk/&lt;HUNK-SERVER&gt;  &lt;/HUNK-SERVER&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;is hdfs:&lt;/P&gt;

&lt;P&gt;In my case it is (hdfs dfs -ls /user/splunk):&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;drwxrwxrwt   - mapred supergroup          0 2013-11-14 13:46 /user/splunk&lt;BR /&gt;&lt;BR /&gt;
drwxrwxrwt   - mapred supergroup          0 2013-11-15 10:56 /user/splunk/&lt;HUNK-SERVER&gt;&lt;BR /&gt;&lt;BR /&gt;
drwxr-xr-x   - mapred supergroup          0 2013-11-19 11:32 /user/splunk/&lt;HUNK-SERVER&gt;/bundles&lt;BR /&gt;&lt;BR /&gt;
...&lt;BR /&gt;&lt;BR /&gt;
drwxr-xr-x   - mapred supergroup          0 2013-11-19 13:32 /user/splunk/&lt;HUNK-SERVER&gt;/dispatch&lt;BR /&gt;&lt;BR /&gt;
...&lt;BR /&gt;&lt;BR /&gt;
drwxr-xr-x   - mapred supergroup          0 2013-11-15 10:56 /user/splunk/&lt;HUNK-SERVER&gt;/packages&lt;BR /&gt;&lt;BR /&gt;
-rw-r--r--   3 mapred supergroup   77318285 2013-11-15 10:56 /user/splunk/&lt;HUNK-SERVER&gt;/packages/splunk-6.0-184175-Linux-x86_64.tgz&lt;/HUNK-SERVER&gt;&lt;/HUNK-SERVER&gt;&lt;/HUNK-SERVER&gt;&lt;/HUNK-SERVER&gt;&lt;/HUNK-SERVER&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;But what about &lt;STRONG&gt;vix.splunk.home.datanode&lt;/STRONG&gt;? I set this to "/opt/splunk" and created that dir in the unix-fs on all nodes of the cluster and on the hunk node!&lt;/P&gt;

&lt;P&gt;root@&lt;HUNK-NODE&gt;:~# ls -ld /opt/splunk/&lt;BR /&gt;&lt;BR /&gt;
drwxrwxr-x 10 mapred hadoop 4096 Nov 18 14:46 /opt/splunk/&lt;/HUNK-NODE&gt;&lt;/P&gt;

&lt;P&gt;Do I need to create an hdfs dir "/opt/splunk"? And a unix-fs dir "/user/splunk"?&lt;/P&gt;
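
&lt;P&gt;For reference, this is how I currently read the two settings; the comments are my own assumption about which filesystem each path lives on, not something I have verified:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# hdfs path: Hunk working area (bundles, dispatch, packages)
vix.splunk.home.hdfs     = /user/splunk/$SPLUNK_SERVER_NAME
# local unix-fs path on every TaskTracker node: where the Splunk
# package gets unpacked for the map tasks
vix.splunk.home.datanode = /opt/splunk
&lt;/CODE&gt;&lt;/PRE&gt;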

&lt;P&gt;Another observation:&lt;/P&gt;

&lt;P&gt;The task is to read more than 30,000,000 lines from 1913 data files, which means that 1913 map tasks will be generated. Our cluster runs 50 concurrent map tasks. The jobtracker shows that once the first 100 tasks have failed, all remaining tasks are killed and the job fails.&lt;/P&gt;

&lt;P&gt;I tried to increase "vix.mapred.job.reuse.jvm.num.tasks" to 2000, but that didn't change anything (maybe this is a parameter I have to change in the Cloudera environment).&lt;/P&gt;
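
&lt;P&gt;If the vix.* passthrough works the way "vix.mapred.job.reuse.jvm.num.tasks" suggests (vix.mapred.* forwarded into the job config), the standard MRv1 knobs for the kill-after-failures behaviour would be something like the following. The parameter names are stock Hadoop MRv1 settings, and the values are only examples, not anything confirmed for this cluster:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# retries per task before the task counts as failed
vix.mapred.map.max.attempts = 4
# percentage of map tasks that may fail before the whole job is failed
vix.mapred.max.map.failures.percent = 10
&lt;/CODE&gt;&lt;/PRE&gt;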

&lt;P&gt;And here are the error-msgs from the search.log:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;11-20-2013 12:03:54.718 WARN  ERP.tdHunk -  SplunkMR$SplunkBaseMapper - Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.ValueAvroRecordReader, path=hdfs://td-db02.intern.trusteddialog.de:9000/&lt;EM&gt;Path to data&lt;/EM&gt;/&lt;EM&gt;data file.gz&lt;/EM&gt;, regex=.avro$.&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.718 DEBUG ERP.tdHunk -  VirtualIndex - File meets the search criteria,. Will consider it, path=hdfs://td-db02.intern.trusteddialog.de:9000/&lt;EM&gt;Path to data&lt;/EM&gt;/&lt;EM&gt;data file.gz&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.718 DEBUG ERP.tdHunk -  Client$Connection$3 - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred sending #76&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.719 DEBUG ERP.tdHunk -  Client$Connection$3 - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred sending #77&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.719 DEBUG ERP.tdHunk -  Client$Connection - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred got value #76&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.719 DEBUG ERP.tdHunk -  ProtobufRpcEngine$Invoker - Call: getBlockLocations took 1ms&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk -  Client$Connection - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred got value #77&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk -  ProtobufRpcEngine$Invoker - Call: getBlockLocations took 1ms&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk -  OutputProcessor - received: hdfs://td-db02.intern.trusteddialog.de:9000/&lt;PATH to="" data="https://community.splunk.com/"&gt;/mtdscan0.freenet.de-tdchecks.2013-11-19T03+05Z0000.gz:0+23669&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk -  DFSInputStream - newInfo = LocatedBlocks{&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 ERROR ERP.tdHunk -    fileLength=1255240&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 ERROR ERP.tdHunk -    underConstruction=false&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 ERROR ERP.tdHunk -    blocks=[LocatedBlock{BP-1883466371-XXX.YYY.ZZZ.102-1362479174669:blk_5425927572410223942_4015583; getBlockSize()=1255240; corrupt=false; offset=0; locs=[XXX.YYY.ZZZ.100:50010, XXX.YYY.ZZZ.90:50010, XXX.YYY.ZZZ.102:50010]}]&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 ERROR ERP.tdHunk -    lastLocatedBlock=LocatedBlock{BP-1883466371-XXX.YYY.ZZZ.102-1362479174669:blk_5425927572410223942_4015583; getBlockSize()=1255240; corrupt=false; offset=0; locs=[XXX.YYY.ZZZ.90:50010, XXX.YYY.ZZZ.100:50010, XXX.YYY.ZZZ.102:50010]}&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 ERROR ERP.tdHunk -    isLastBlockComplete=true}&lt;BR /&gt;&lt;BR /&gt;
...&lt;/PATH&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;After 4 occurrences of that sequence, the following logging appears:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;11-20-2013 12:03:57.381 DEBUG ERP.tdHunk -  DFSInputStream - Error making BlockReader. Closing stale NioInetPeer(Socket[addr=/XXX.YYY.ZZZ.90,port=50010,localport=43147])&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -  java.io.EOFException: Premature EOF: no length prefix available&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:137)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1084)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at java.io.DataInputStream.read(DataInputStream.java:149)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:157)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:141)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at java.io.InputStream.read(InputStream.java:101)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:147)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at com.splunk.mr.input.SplunkLineRecordReader.nextKeyValue(SplunkLineRecordReader.java:40)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at com.splunk.mr.SplunkMR$SplunkBaseMapper.stream(SplunkMR.java:562)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at com.splunk.mr.SplunkMR$SplunkBaseMapper.stream(SplunkMR.java:520)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at com.splunk.mr.OutputProcessor.outputStreaming(OutputProcessor.java:216)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at com.splunk.mr.OutputProcessor.run(OutputProcessor.java:167)&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Here are the errors from the dispatch log (hdfs://user/splunk/td-ha01/dispatch/1384949253.250/0/_logs/history/job_201311201109_0009_1384949219885_mapred_SPLK_td-&lt;STRONG&gt;Hunknode&lt;/STRONG&gt;_1384949253.25:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;Task TASKID="task_201311201109_0009_m_000041" TASK_TYPE="MAP" START_TIME="1384949222512" SPLITS="/default-rack/&lt;EM&gt;clusternode01&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;,/default-&lt;BR /&gt;
rack/&lt;EM&gt;clusternode03&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;,/default-rack/&lt;EM&gt;clusternode02&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;" .&lt;BR /&gt;&lt;BR /&gt;
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000001" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000001_0" START_TIME="1384949277094" TRACK&lt;BR /&gt;
ER_NAME="tracker_&lt;EM&gt;clusternode03&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;:localhost/127.0.0.1:44134" HTTP_PORT="50060" .&lt;BR /&gt;&lt;BR /&gt;
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000001" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000001_0" TASK_STATUS="FAILED" FINISH_TIME="1384949280971" HOSTNAME="&lt;EM&gt;clusternode03&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;" ERROR="java.io.IOException: Permission denied&lt;BR /&gt;&lt;BR /&gt;
        at java.io.UnixFileSystem.createFileExclusively(Native Method)&lt;BR /&gt;&lt;BR /&gt;
        at java.io.File.createNewFile(File.java:900)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkSearchMapper.setup(SplunkMR.java:624)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkBaseMapper.run(SplunkMR.java:394)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)&lt;BR /&gt;&lt;BR /&gt;
        at java.security.AccessController.doPrivileged(Native Method)&lt;BR /&gt;&lt;BR /&gt;
        at javax.security.auth.Subject.doAs(Subject.java:416)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)&lt;BR /&gt;
        at org.apache.hadoop.mapred.Child.main(Child.java:262)&lt;BR /&gt;
" .&lt;BR /&gt;&lt;BR /&gt;
Task TASKID="task_201311201109_0009_m_000042" TASK_TYPE="MAP" START_TIME="1384949226178" SPLITS="/default-rack/&lt;EM&gt;clusternode02&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;,/default-rack/&lt;EM&gt;clusternode00&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;,/default-rack/&lt;EM&gt;clusternode03&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;" .&lt;BR /&gt;&lt;BR /&gt;
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000000" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000000_0" START_TIME="1384949277093" TRACKER_NAME="tracker_&lt;EM&gt;clusternode03&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;:localhost/127.0.0.1:44134" HTTP_PORT="50060" .&lt;BR /&gt;&lt;BR /&gt;
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000000"   TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000000_0" TASK_STATUS="FAILED" FINISH_TIME="1384949282477" HOSTNAME="&lt;EM&gt;clusternode03&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;"   ERROR="java.io.IOException: Permission denied&lt;BR /&gt;&lt;BR /&gt;
        at java.io.UnixFileSystem.createFileExclusively(Native Method)&lt;BR /&gt;&lt;BR /&gt;
        at java.io.File.createNewFile(File.java:900)&lt;BR /&gt;&lt;BR /&gt;
...&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;And from the jobtracker logs I get:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;2013-11-20 11:42:40,074 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:41,242 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:41,243 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:41,885 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:41,956 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6328edf2&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,533 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://&lt;STRONG&gt;Namenode&lt;/STRONG&gt;:9000/&lt;EM&gt;path to data&lt;/EM&gt;/&lt;EM&gt;data file.gz&lt;/EM&gt;:0+8438254&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,590 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &amp;amp; initialized native-zlib library&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,591 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,630 WARN com.splunk.mr.SplunkMR$SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.ValueAvroRecordReader, path=hdfs://&lt;STRONG&gt;Namenode&lt;/STRONG&gt;:9000/&lt;EM&gt;path to data&lt;/EM&gt;/&lt;EM&gt;data file.gz&lt;/EM&gt;, regex=.avro$.&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,639 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,641 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred (auth:SIMPLE) cause:java.io.IOException: Permission denied&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,642 WARN org.apache.hadoop.mapred.Child: Error running child&lt;BR /&gt;
java.io.IOException: Permission denied&lt;BR /&gt;&lt;BR /&gt;
        at java.io.UnixFileSystem.createFileExclusively(Native Method)&lt;BR /&gt;&lt;BR /&gt;
        at java.io.File.createNewFile(File.java:900)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkSearchMapper.setup(SplunkMR.java:624)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkBaseMapper.run(SplunkMR.java:394)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)&lt;BR /&gt;&lt;BR /&gt;
        at java.security.AccessController.doPrivileged(Native Method)&lt;BR /&gt;&lt;BR /&gt;
        at javax.security.auth.Subject.doAs(Subject.java:416)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.Child.main(Child.java:262)&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,649 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;How can I get the path and filename of the failing write?&lt;/P&gt;</description>
    <pubDate>Mon, 28 Sep 2020 15:19:43 GMT</pubDate>
    <dc:creator>thomas_herzig</dc:creator>
    <dc:date>2020-09-28T15:19:43Z</dc:date>
    <item>
      <title>Hunk and Cloudera CDH4.4 (hdfs and mapred V1) 2 Problems?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Hunk-and-Cloudera-CDH4-4-hdfs-and-mapred-V1-2-Problems/m-p/147949#M72214</link>
      <description>&lt;P&gt;We are running a CDH4.4 Hadoop production cluster with only Mapred V1 and Stream functionality. The OS is Ubuntu 12.04.3 LTS with the LTS-Raring kernel (3.8.0.33.33).&lt;/P&gt;

&lt;P&gt;hadoop-hdfs is owned by user "hdfs"&lt;BR /&gt;&lt;BR /&gt;
hadoop-mapred.0.20 is owned by user "mapred"&lt;BR /&gt;&lt;BR /&gt;
hive is owned by user "hive"&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Problem 1:&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;
I am trying to set up Hunk (splunk 6.0-184175) on a separate server with connectivity to the Hadoop cluster (HDFS, Hive and Mapred V1):&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;[provider:tdHunk]&lt;BR /&gt;&lt;BR /&gt;
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-h2.0.jar&lt;BR /&gt;&lt;BR /&gt;
vix.env.HADOOP_HOME = /usr/lib/hadoop&lt;BR /&gt;&lt;BR /&gt;
vix.env.JAVA_HOME = /usr/lib/jvm/default-java&lt;BR /&gt;&lt;BR /&gt;
vix.family = hadoop&lt;BR /&gt;&lt;BR /&gt;
vix.fs.default.name = hdfs://&lt;OUR datanode=""&gt;:9000/&lt;BR /&gt;&lt;BR /&gt;
vix.mapred.job.tracker = &lt;OUR jobtracker=""&gt;:9001&lt;BR /&gt;&lt;BR /&gt;
vix.splunk.home.hdfs = /user/splunk/&lt;HUNK-SERVER&gt;&lt;BR /&gt;&lt;BR /&gt;
vix.env.MAPREDUCE_USER = mapred&lt;/HUNK-SERVER&gt;&lt;/OUR&gt;&lt;/OUR&gt;&lt;/P&gt;

&lt;P&gt;[tdindex]&lt;BR /&gt;&lt;BR /&gt;
vix.input.1.accept = .gz$&lt;BR /&gt;&lt;BR /&gt;
vix.input.1.path = &lt;PATH to="" our="" data="https://community.splunk.com/" in="" hdfs=""&gt;/...&lt;BR /&gt;&lt;BR /&gt;
vix.provider = tdHunk&lt;/PATH&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;When I try to use "index=tdindex" I get "&lt;STRONG&gt;No results found.&lt;/STRONG&gt;"&lt;/P&gt;

&lt;P&gt;After deleting the config option "&lt;EM&gt;vix.env.MAPREDUCE_USER&lt;/EM&gt;" I am able to browse my hdfs data!&lt;BR /&gt;
I configured splunkd to run as "root", as user "splunk", or as user "mapred", with the same results.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Problem 2:&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;
With splunkd running as user "mapred" and "vix.env.MAPREDUCE_USER" not set, I am able to get further.&lt;/P&gt;

&lt;P&gt;I configured field extraction for that data (gzipped CSV data with delimiter ';' and no headers).&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;[preprocess-gzip]&lt;BR /&gt;&lt;BR /&gt;
EXTRACT-Datum-Status-Domain = ^(?P&lt;DATUM&gt;\d+-\d+-\d+)[^)\n]*);(?P&lt;STATUS&gt;[^;]+);(?P&lt;DOMAIN&gt;[^;]+)&lt;/DOMAIN&gt;&lt;/STATUS&gt;&lt;/DATUM&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;When I search for one of these fields, I get the following Mapred errors:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;[tdHunk] IOException - Error while waiting for MapReduce job to complete, job_id=[!http://&lt;STRONG&gt;Jobtracker-Host&lt;/STRONG&gt;:50030/jobdetails.jsp?jobid=job_201311181028_0054 job_201311181028_0054], state=FAILED, reason=NA&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;And the tasktracker logs look like:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;...&lt;BR /&gt;&lt;BR /&gt;
2013-11-19 11:33:15,832 WARN com.splunk.mr.SplunkMR$SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.ValueAvroRecordReader, path=hdfs://&lt;STRONG&gt;&lt;EM&gt;Namenode&lt;/EM&gt;&lt;/STRONG&gt;:9000/&lt;EM&gt;Path to data&lt;/EM&gt;/&lt;EM&gt;Datafile&lt;/EM&gt;.gz, regex=.avro$.&lt;BR /&gt;&lt;BR /&gt;
2013-11-19 11:33:15,841 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1&lt;BR /&gt;&lt;BR /&gt;
2013-11-19 11:33:15,844 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred (auth:SIMPLE) cause:java.io.IOException: Permission denied&lt;BR /&gt;&lt;BR /&gt;
2013-11-19 11:33:15,845 WARN org.apache.hadoop.mapred.Child: Error running child&lt;BR /&gt;
java.io.IOException: Permission denied&lt;BR /&gt;&lt;BR /&gt;
        at java.io.UnixFileSystem.createFileExclusively(Native Method)&lt;BR /&gt;&lt;BR /&gt;
        at java.io.File.createNewFile(File.java:900)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkSearchMapper.setup(SplunkMR.java:624)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkBaseMapper.run(SplunkMR.java:394)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)&lt;BR /&gt;&lt;BR /&gt;
        at java.security.AccessController.doPrivileged(Native Method)&lt;BR /&gt;&lt;BR /&gt;
        at javax.security.auth.Subject.doAs(Subject.java:416)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.Child.main(Child.java:262)&lt;BR /&gt;&lt;BR /&gt;
2013-11-19 11:33:15,939 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;I don't have any .avro file!&lt;/P&gt;

&lt;P&gt;I tried to remove "vix.splunk.search.recordreader" and "vix.splunk.search.recordreader.avro.regex", but it didn't work (the entry in $SPLUNK/etc/system/default/indexes.conf seems to override my attempt to disable it).&lt;/P&gt;
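
&lt;P&gt;One override I considered but have not confirmed (this assumes "vix.splunk.search.recordreader" takes a comma-separated list of reader classes and that a stanza in etc/system/local wins over the system default):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# $SPLUNK_HOME/etc/system/local/indexes.conf
[provider:tdHunk]
# keep only the plain line reader, dropping ValueAvroRecordReader
vix.splunk.search.recordreader = com.splunk.mr.input.SplunkLineRecordReader
&lt;/CODE&gt;&lt;/PRE&gt;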

&lt;P&gt;On the Splunk host I can use hdfs as any user, and I can use hive and start mapred scripts as user "mapred" without any problems.&lt;/P&gt;

&lt;P&gt;Any help would be appreciated.&lt;/P&gt;

&lt;P&gt;Thanks, Thomas&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 15:18:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Hunk-and-Cloudera-CDH4-4-hdfs-and-mapred-V1-2-Problems/m-p/147949#M72214</guid>
      <dc:creator>thomas_herzig</dc:creator>
      <dc:date>2020-09-28T15:18:48Z</dc:date>
    </item>
    <item>
      <title>Re: Hunk and Cloudera CDH4.4 (hdfs and mapred V1) 2 Problems?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Hunk-and-Cloudera-CDH4-4-hdfs-and-mapred-V1-2-Problems/m-p/147950#M72215</link>
      <description>&lt;P&gt;Thomas,&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Problem 1&lt;/STRONG&gt; &lt;/P&gt;

&lt;P&gt;It seems like this is not an issue anymore, but let me explain: "vix.env.MAPREDUCE_USER" is only required if the user that Splunk is running as does &lt;EM&gt;not&lt;/EM&gt; have permission to interact with HDFS and submit MR jobs. When this field is specified, the user &lt;EM&gt;must&lt;/EM&gt; exist on the server running Splunk, and the user that runs Splunk must be able to sudo as that user.&lt;/P&gt;
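
&lt;P&gt;You can sanity-check both conditions on the Hunk host; a rough sketch (the user name mapred is from your setup, and the passwordless-sudo probe below is just one way to test it, which may not match how your sudoers is configured):&lt;/P&gt;

```shell
# Preconditions for vix.env.MAPREDUCE_USER, checked on the Hunk host:
# 1) the target user exists locally; 2) the splunkd user can sudo as it.
TARGET_USER=${TARGET_USER:-mapred}
if id "$TARGET_USER" >/dev/null 2>/dev/null; then
  echo "user exists: $TARGET_USER"
else
  echo "user missing on this host: $TARGET_USER"
fi
if sudo -n -u "$TARGET_USER" true 2>/dev/null; then
  echo "sudo as $TARGET_USER works"
else
  echo "cannot sudo as $TARGET_USER (check sudoers)"
fi
```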

&lt;P&gt;&lt;STRONG&gt;Problem 2&lt;/STRONG&gt; &lt;/P&gt;

&lt;P&gt;First, I'd recommend that you assign a different sourcetype to the data rather than work with the default preprocess-gzip; you can assign a sourcetype and specify extractions based on source too, e.g.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;props.conf
# ... means recursively assign the sourcetype to the files under this dir
[source::/path/to/some/dir/...]
sourcetype = foobar
EXTRACT-foo = .... 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The .avro message is a WARN, not an error, and it is expected, since the default config ships a record reader that can read Avro files.&lt;/P&gt;

&lt;P&gt;The root cause of &lt;STRONG&gt;problem 2&lt;/STRONG&gt; is indicated by the trace you provided:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;2013-11-19 11:33:15,845 WARN org.apache.hadoop.mapred.Child: Error running child java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:900)
at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)
at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This trace indicates that the mapred user on the TaskTracker does not have permission to write to the directory where we copy the Splunk package; the path defaults to:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;vix.splunk.home.datanode           = /tmp/splunk/$SPLUNK_SERVER_NAME/
&lt;/CODE&gt;&lt;/PRE&gt;
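
&lt;P&gt;To see whether that location is actually writable, you can probe it the same way the mapper does; a sketch (the DIR default and the probe filename are illustrative only):&lt;/P&gt;

```shell
# Probe whether a directory is writable by the current user, mimicking
# the java.io.File.createNewFile call that fails in
# SetupCommandHandler.setupSplunk. DIR defaults to /tmp here.
DIR=${DIR:-/tmp}
probe="$DIR/.hunk-write-probe.$$"
if touch "$probe" 2>/dev/null; then
  rm -f "$probe"
  echo "writable: $DIR"
else
  echo "NOT writable: $DIR"
fi
```

&lt;P&gt;On the cluster you would run this on each TaskTracker as the mapred user, with DIR pointed at /tmp/splunk if that dir exists, otherwise /tmp.&lt;/P&gt;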

&lt;P&gt;Can you please check that /tmp/ is writable (on the TaskTrackers), and who owns /tmp/splunk if that dir is present?&lt;/P&gt;</description>
      <pubDate>Tue, 19 Nov 2013 18:14:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Hunk-and-Cloudera-CDH4-4-hdfs-and-mapred-V1-2-Problems/m-p/147950#M72215</guid>
      <dc:creator>Ledion_Bitincka</dc:creator>
      <dc:date>2013-11-19T18:14:56Z</dc:date>
    </item>
    <item>
      <title>Re: Hunk and Cloudera CDH4.4 (hdfs and mapred V1) 2 Problems?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Hunk-and-Cloudera-CDH4-4-hdfs-and-mapred-V1-2-Problems/m-p/147951#M72216</link>
      <description>&lt;P&gt;Ledion, thanks for your fast answer:&lt;/P&gt;

&lt;P&gt;Problem 1:&lt;/P&gt;

&lt;P&gt;The hdfs superuser is "hdfs"&lt;BR /&gt;
The mapred user is "mapred", with full access (rwx) to the hdfs directories!&lt;/P&gt;

&lt;P&gt;I want to run splunkd as user "splunk", but I need to run Mapred jobs as user "mapred". But after setting &lt;STRONG&gt;vix.env.MAPREDUCE_USER = mapred&lt;/STRONG&gt; I lose access to my hdfs data, even if splunkd is actually running as "mapred"!&lt;/P&gt;

&lt;P&gt;Problem 2:&lt;/P&gt;

&lt;P&gt;I have trouble deciding whether a path in the configuration refers to hdfs or to the unix-fs!&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;vix.splunk.home.hdfs = /user/splunk/&lt;HUNK-SERVER&gt;  &lt;/HUNK-SERVER&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;is hdfs:&lt;/P&gt;

&lt;P&gt;In my case it is (hdfs dfs -ls /user/splunk):&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;drwxrwxrwt   - mapred supergroup          0 2013-11-14 13:46 /user/splunk&lt;BR /&gt;&lt;BR /&gt;
drwxrwxrwt   - mapred supergroup          0 2013-11-15 10:56 /user/splunk/&lt;HUNK-SERVER&gt;&lt;BR /&gt;&lt;BR /&gt;
drwxr-xr-x   - mapred supergroup          0 2013-11-19 11:32 /user/splunk/&lt;HUNK-SERVER&gt;/bundles&lt;BR /&gt;&lt;BR /&gt;
...&lt;BR /&gt;&lt;BR /&gt;
drwxr-xr-x   - mapred supergroup          0 2013-11-19 13:32 /user/splunk/&lt;HUNK-SERVER&gt;/dispatch&lt;BR /&gt;&lt;BR /&gt;
...&lt;BR /&gt;&lt;BR /&gt;
drwxr-xr-x   - mapred supergroup          0 2013-11-15 10:56 /user/splunk/&lt;HUNK-SERVER&gt;/packages&lt;BR /&gt;&lt;BR /&gt;
-rw-r--r--   3 mapred supergroup   77318285 2013-11-15 10:56 /user/splunk/&lt;HUNK-SERVER&gt;/packages/splunk-6.0-184175-Linux-x86_64.tgz&lt;/HUNK-SERVER&gt;&lt;/HUNK-SERVER&gt;&lt;/HUNK-SERVER&gt;&lt;/HUNK-SERVER&gt;&lt;/HUNK-SERVER&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;But what about &lt;STRONG&gt;vix.splunk.home.datanode&lt;/STRONG&gt;? I set this to "/opt/splunk" and created that dir in the unix-fs on all nodes of the cluster and on the hunk node!&lt;/P&gt;

&lt;P&gt;root@&lt;HUNK-NODE&gt;:~# ls -ld /opt/splunk/&lt;BR /&gt;&lt;BR /&gt;
drwxrwxr-x 10 mapred hadoop 4096 Nov 18 14:46 /opt/splunk/&lt;/HUNK-NODE&gt;&lt;/P&gt;

&lt;P&gt;Do I need to create an hdfs dir "/opt/splunk"? And a unix-fs dir "/user/splunk"?&lt;/P&gt;

&lt;P&gt;Another observation:&lt;/P&gt;

&lt;P&gt;The task is to read more than 30,000,000 lines from 1913 data files, which means that 1913 map tasks will be generated. Our cluster runs 50 concurrent map tasks. The jobtracker shows that once the first 100 tasks have failed, all remaining tasks are killed and the job fails.&lt;/P&gt;

&lt;P&gt;I tried to increase "vix.mapred.job.reuse.jvm.num.tasks" to 2000, but that didn't change anything (maybe this is a parameter I have to change in the Cloudera environment).&lt;/P&gt;

&lt;P&gt;And here are the error-msgs from the search.log:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;11-20-2013 12:03:54.718 WARN  ERP.tdHunk -  SplunkMR$SplunkBaseMapper - Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.ValueAvroRecordReader, path=hdfs://td-db02.intern.trusteddialog.de:9000/&lt;EM&gt;Path to data&lt;/EM&gt;/&lt;EM&gt;data file.gz&lt;/EM&gt;, regex=.avro$.&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.718 DEBUG ERP.tdHunk -  VirtualIndex - File meets the search criteria,. Will consider it, path=hdfs://td-db02.intern.trusteddialog.de:9000/&lt;EM&gt;Path to data&lt;/EM&gt;/&lt;EM&gt;data file.gz&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.718 DEBUG ERP.tdHunk -  Client$Connection$3 - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred sending #76&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.719 DEBUG ERP.tdHunk -  Client$Connection$3 - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred sending #77&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.719 DEBUG ERP.tdHunk -  Client$Connection - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred got value #76&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.719 DEBUG ERP.tdHunk -  ProtobufRpcEngine$Invoker - Call: getBlockLocations took 1ms&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk -  Client$Connection - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred got value #77&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk -  ProtobufRpcEngine$Invoker - Call: getBlockLocations took 1ms&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk -  OutputProcessor - received: hdfs://td-db02.intern.trusteddialog.de:9000/&lt;EM&gt;Path to data&lt;/EM&gt;/mtdscan0.freenet.de-tdchecks.2013-11-19T03+05Z0000.gz:0+23669&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk -  DFSInputStream - newInfo = LocatedBlocks{&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 ERROR ERP.tdHunk -    fileLength=1255240&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 ERROR ERP.tdHunk -    underConstruction=false&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 ERROR ERP.tdHunk -    blocks=[LocatedBlock{BP-1883466371-XXX.YYY.ZZZ.102-1362479174669:blk_5425927572410223942_4015583; getBlockSize()=1255240; corrupt=false; offset=0; locs=[XXX.YYY.ZZZ.100:50010, XXX.YYY.ZZZ.90:50010, XXX.YYY.ZZZ.102:50010]}]&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 ERROR ERP.tdHunk -    lastLocatedBlock=LocatedBlock{BP-1883466371-XXX.YYY.ZZZ.102-1362479174669:blk_5425927572410223942_4015583; getBlockSize()=1255240; corrupt=false; offset=0; locs=[XXX.YYY.ZZZ.90:50010, XXX.YYY.ZZZ.100:50010, XXX.YYY.ZZZ.102:50010]}&lt;BR /&gt;&lt;BR /&gt;
11-20-2013 12:03:54.720 ERROR ERP.tdHunk -    isLastBlockComplete=true}&lt;BR /&gt;&lt;BR /&gt;
...&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;After four occurrences of that sequence, the following log output appears:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;11-20-2013 12:03:57.381 DEBUG ERP.tdHunk -  DFSInputStream - Error making BlockReader. Closing stale NioInetPeer(Socket[addr=/XXX.YYY.ZZZ.90,port=50010,localport=43147])&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -  java.io.EOFException: Premature EOF: no length prefix available&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:137)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1084)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at java.io.DataInputStream.read(DataInputStream.java:149)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:157)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:141)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at java.io.InputStream.read(InputStream.java:101)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:147)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at com.splunk.mr.input.SplunkLineRecordReader.nextKeyValue(SplunkLineRecordReader.java:40)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at com.splunk.mr.SplunkMR$SplunkBaseMapper.stream(SplunkMR.java:562)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at com.splunk.mr.SplunkMR$SplunkBaseMapper.stream(SplunkMR.java:520)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at com.splunk.mr.OutputProcessor.outputStreaming(OutputProcessor.java:216)&lt;BR /&gt;
11-20-2013 12:03:57.381 ERROR ERP.tdHunk -      at com.splunk.mr.OutputProcessor.run(OutputProcessor.java:167)&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Here are the errors from the dispatch log (hdfs://user/splunk/td-ha01/dispatch/1384949253.250/0/_logs/history/job_201311201109_0009_1384949219885_mapred_SPLK_td-&lt;STRONG&gt;Hunknode&lt;/STRONG&gt;_1384949253.25):&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;Task TASKID="task_201311201109_0009_m_000041" TASK_TYPE="MAP" START_TIME="1384949222512" SPLITS="/default-rack/&lt;EM&gt;clusternode01&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;,/default-&lt;BR /&gt;
rack/&lt;EM&gt;clusternode03&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;,/default-rack/&lt;EM&gt;clusternode02&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;" .&lt;BR /&gt;&lt;BR /&gt;
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000001" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000001_0" START_TIME="1384949277094" TRACK&lt;BR /&gt;
ER_NAME="tracker_&lt;EM&gt;clusternode03&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;:localhost/127.0.0.1:44134" HTTP_PORT="50060" .&lt;BR /&gt;&lt;BR /&gt;
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000001" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000001_0" TASK_STATUS="FAILED" FINISH_TIME="1384949280971" HOSTNAME="&lt;EM&gt;clusternode03&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;" ERROR="java.io.IOException: Permission denied&lt;BR /&gt;&lt;BR /&gt;
        at java.io.UnixFileSystem.createFileExclusively(Native Method)&lt;BR /&gt;&lt;BR /&gt;
        at java.io.File.createNewFile(File.java:900)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkSearchMapper.setup(SplunkMR.java:624)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkBaseMapper.run(SplunkMR.java:394)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)&lt;BR /&gt;&lt;BR /&gt;
        at java.security.AccessController.doPrivileged(Native Method)&lt;BR /&gt;&lt;BR /&gt;
        at javax.security.auth.Subject.doAs(Subject.java:416)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)&lt;BR /&gt;
        at org.apache.hadoop.mapred.Child.main(Child.java:262)&lt;BR /&gt;
" .&lt;BR /&gt;&lt;BR /&gt;
Task TASKID="task_201311201109_0009_m_000042" TASK_TYPE="MAP" START_TIME="1384949226178" SPLITS="/default-rack/&lt;EM&gt;clusternode02&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;,/default-rack/&lt;EM&gt;clusternode00&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;,/default-rack/&lt;EM&gt;clusternode03&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;" .&lt;BR /&gt;&lt;BR /&gt;
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000000" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000000_0" START_TIME="1384949277093" TRACKER_NAME="tracker_&lt;EM&gt;clusternode03&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;:localhost/127.0.0.1:44134" HTTP_PORT="50060" .&lt;BR /&gt;&lt;BR /&gt;
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000000"   TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000000_0" TASK_STATUS="FAILED" FINISH_TIME="1384949282477" HOSTNAME="&lt;EM&gt;clusternode03&lt;/EM&gt;.&lt;EM&gt;cluster.domain&lt;/EM&gt;"   ERROR="java.io.IOException: Permission denied&lt;BR /&gt;&lt;BR /&gt;
        at java.io.UnixFileSystem.createFileExclusively(Native Method)&lt;BR /&gt;&lt;BR /&gt;
        at java.io.File.createNewFile(File.java:900)&lt;BR /&gt;&lt;BR /&gt;
...&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;And from the JobTracker logs I get:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;2013-11-20 11:42:40,074 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:41,242 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:41,243 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:41,885 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:41,956 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6328edf2&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,533 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://&lt;STRONG&gt;Namenode&lt;/STRONG&gt;:9000/&lt;EM&gt;path to data&lt;/EM&gt;/&lt;EM&gt;data file.gz&lt;/EM&gt;:0+8438254&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,590 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &amp;amp; initialized native-zlib library&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,591 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,630 WARN com.splunk.mr.SplunkMR$SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.ValueAvroRecordReader, path=hdfs://&lt;STRONG&gt;Namenode&lt;/STRONG&gt;:9000/&lt;EM&gt;path to data&lt;/EM&gt;/&lt;EM&gt;data file.gz&lt;/EM&gt;, regex=.avro$.&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,639 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,641 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred (auth:SIMPLE) cause:java.io.IOException: Permission denied&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,642 WARN org.apache.hadoop.mapred.Child: Error running child&lt;BR /&gt;
java.io.IOException: Permission denied&lt;BR /&gt;&lt;BR /&gt;
        at java.io.UnixFileSystem.createFileExclusively(Native Method)&lt;BR /&gt;&lt;BR /&gt;
        at java.io.File.createNewFile(File.java:900)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkSearchMapper.setup(SplunkMR.java:624)&lt;BR /&gt;&lt;BR /&gt;
        at com.splunk.mr.SplunkMR$SplunkBaseMapper.run(SplunkMR.java:394)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)&lt;BR /&gt;&lt;BR /&gt;
        at java.security.AccessController.doPrivileged(Native Method)&lt;BR /&gt;&lt;BR /&gt;
        at javax.security.auth.Subject.doAs(Subject.java:416)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)&lt;BR /&gt;&lt;BR /&gt;
        at org.apache.hadoop.mapred.Child.main(Child.java:262)&lt;BR /&gt;&lt;BR /&gt;
2013-11-20 11:42:42,649 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;How can I determine the path and file name of the failing write?&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 15:19:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Hunk-and-Cloudera-CDH4-4-hdfs-and-mapred-V1-2-Problems/m-p/147951#M72216</guid>
      <dc:creator>thomas_herzig</dc:creator>
      <dc:date>2020-09-28T15:19:43Z</dc:date>
    </item>
    <item>
      <title>Re: Hunk and Cloudera CDH4.4 (hdfs and mapred V1) 2 Problems?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Hunk-and-Cloudera-CDH4-4-hdfs-and-mapred-V1-2-Problems/m-p/147952#M72217</link>
      <description>&lt;P&gt;Thomas,&lt;/P&gt;

&lt;P&gt;a) You don't need to run Splunk as a Hadoop superuser (hdfs or mapred) in order to access HDFS and/or submit MapReduce jobs. As long as the Splunk user:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;has read permission on the data files,&lt;/LI&gt;
&lt;LI&gt;has write permission on vix.splunk.home.hdfs (an HDFS path), and&lt;/LI&gt;
&lt;LI&gt;is allowed to submit MapReduce jobs,&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;you'll be fine.&lt;/P&gt;
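
&lt;P&gt;As a sketch only (the user names and paths below are placeholders, not taken from this thread; adjust them to your cluster), granting those permissions from the HDFS superuser could look like this:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;# create the Hunk working directory and hand it to the "splunk" user&lt;BR /&gt;
sudo -u hdfs hdfs dfs -mkdir -p /user/splunk&lt;BR /&gt;
sudo -u hdfs hdfs dfs -chown -R splunk /user/splunk&lt;BR /&gt;
# make the data files readable by the "splunk" user (world-readable here; use group ownership for tighter control)&lt;BR /&gt;
sudo -u hdfs hdfs dfs -chmod -R 755 /path/to/data&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;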

&lt;P&gt;b) vix.splunk.home.datanode is a path on the TaskTracker/DataNode local filesystem; this path does &lt;STRONG&gt;not&lt;/STRONG&gt; need to exist in HDFS. The local path needs to be writable by the local &lt;STRONG&gt;mapred&lt;/STRONG&gt; user. We default it to &lt;STRONG&gt;/tmp/splunk/$SPLUNK_SERVER_NAME/&lt;/STRONG&gt; because /tmp/ is generally writable by everyone. &lt;/P&gt;
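
&lt;P&gt;For illustration, a minimal provider stanza in indexes.conf setting the two paths discussed above might look like this (the provider name "myprovider" and the HDFS path are hypothetical placeholders):&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;[provider:myprovider]&lt;BR /&gt;
vix.family = hadoop&lt;BR /&gt;
# HDFS working directory - must be writable by the user running splunkd&lt;BR /&gt;
vix.splunk.home.hdfs = /user/splunk/myhunkserver&lt;BR /&gt;
# local filesystem path on each TaskTracker/DataNode - must be writable by the local mapred user&lt;BR /&gt;
vix.splunk.home.datanode = /tmp/splunk/$SPLUNK_SERVER_NAME/&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;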

&lt;P&gt;By the way, feel free to reach out to me directly by email (ledion at splunk dot com) so we can resolve your issues faster.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Nov 2013 18:41:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Hunk-and-Cloudera-CDH4-4-hdfs-and-mapred-V1-2-Problems/m-p/147952#M72217</guid>
      <dc:creator>Ledion_Bitincka</dc:creator>
      <dc:date>2013-11-20T18:41:33Z</dc:date>
    </item>
  </channel>
</rss>

