Ledion, thanks for your quick answer:
Problem 1:
The HDFS superuser is "hdfs".
The mapred user is "mapred" and has full access (rwx) to the HDFS directories.
I want to run splunkd as user "splunk", but I need to run the MapReduce jobs as user "mapred". However, after setting vix.env.MAPREDUCE_USER = mapred I lose access to my HDFS data, even though splunkd is actually running as "mapred"!
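Just to show how I am testing this: my quick check against the data directory as the mapred user looks roughly like this (the /path/to/data path and the file name are placeholders, not my real paths):
# list the data directory as the mapred user
sudo -u mapred hdfs dfs -ls /path/to/data
# try to read one of the gzipped data files as the mapred user
sudo -u mapred hdfs dfs -cat /path/to/data/datafile.gz | zcat | head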
Problem 2:
I have trouble deciding whether a path in the configuration refers to HDFS or to the local Unix file system.
vix.splunk.home.hdfs = /user/splunk/
is an HDFS path:
In my case the listing looks like this (hdfs dfs -ls /user/splunk):
drwxrwxrwt - mapred supergroup 0 2013-11-14 13:46 /user/splunk
drwxrwxrwt - mapred supergroup 0 2013-11-15 10:56 /user/splunk/
drwxr-xr-x - mapred supergroup 0 2013-11-19 11:32 /user/splunk/ /bundles
...
drwxr-xr-x - mapred supergroup 0 2013-11-19 13:32 /user/splunk/ /dispatch
...
drwxr-xr-x - mapred supergroup 0 2013-11-15 10:56 /user/splunk/ /packages
-rw-r--r-- 3 mapred supergroup 77318285 2013-11-15 10:56 /user/splunk/ /packages/splunk-6.0-184175-Linux-x86_64.tgz
But what about vix.splunk.home.datanode? I set this to "/opt/splunk" and created that directory on the local file system on all cluster nodes and on the Hunk node:
root@ :~# ls -ld /opt/splunk/
drwxrwxr-x 10 mapred hadoop 4096 Nov 18 14:46 /opt/splunk/
Do I also need to create an HDFS directory "/opt/splunk"? And a local directory "/user/splunk"?
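For completeness, here is roughly how I set the two locations up; this reflects my own understanding, assuming vix.splunk.home.hdfs is an HDFS path and vix.splunk.home.datanode is a local path that must exist and be writable by the job user on every TaskTracker node:
# HDFS working directory for Hunk (run as the HDFS superuser)
sudo -u hdfs hdfs dfs -mkdir -p /user/splunk
sudo -u hdfs hdfs dfs -chown mapred:supergroup /user/splunk
# local directory on every cluster node and on the Hunk node
mkdir -p /opt/splunk
chown mapred:hadoop /opt/splunk
chmod 775 /opt/splunk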
Another observation:
The task is to read more than 30,000,000 lines from 1,913 data files, which means 1,913 map tasks are generated. Our cluster runs 50 concurrent map tasks. The jobtracker shows that once the first 100 tasks have failed, all remaining tasks are killed and the job fails.
I tried increasing "vix.mapred.job.reuse.jvm.num.tasks" to 2000, but that didn't change anything (maybe this is a parameter I have to change in the Cloudera environment).
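If I understand the vix.* passthrough correctly, other Hadoop job properties can be set the same way in the provider stanza, so I was planning to try something like this next (the stanza name is just a placeholder and the values are untested guesses):
[provider:MyHadoopProvider]
# MRv1: tolerate a percentage of failed map tasks before the whole job is aborted
vix.mapred.max.map.failures.percent = 20
# MRv1: number of attempts per map task before it counts as failed (default 4)
vix.mapred.map.max.attempts = 4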
And here are the error messages from search.log:
11-20-2013 12:03:54.718 WARN ERP.tdHunk - SplunkMR$SplunkBaseMapper - Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.ValueAvroRecordReader, path=hdfs://td-db02.intern.trusteddialog.de:9000/Path to data/data file.gz, regex=.avro$.
11-20-2013 12:03:54.718 DEBUG ERP.tdHunk - VirtualIndex - File meets the search criteria,. Will consider it, path=hdfs://td-db02.intern.trusteddialog.de:9000/Path to data/data file.gz
11-20-2013 12:03:54.718 DEBUG ERP.tdHunk - Client$Connection$3 - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred sending #76
11-20-2013 12:03:54.719 DEBUG ERP.tdHunk - Client$Connection$3 - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred sending #77
11-20-2013 12:03:54.719 DEBUG ERP.tdHunk - Client$Connection - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred got value #76
11-20-2013 12:03:54.719 DEBUG ERP.tdHunk - ProtobufRpcEngine$Invoker - Call: getBlockLocations took 1ms
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk - Client$Connection - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred got value #77
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk - ProtobufRpcEngine$Invoker - Call: getBlockLocations took 1ms
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk - OutputProcessor - received: hdfs://td-db02.intern.trusteddialog.de:9000/ /mtdscan0.freenet.de-tdchecks.2013-11-19T03+05Z0000.gz:0+23669
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk - DFSInputStream - newInfo = LocatedBlocks{
11-20-2013 12:03:54.720 ERROR ERP.tdHunk - fileLength=1255240
11-20-2013 12:03:54.720 ERROR ERP.tdHunk - underConstruction=false
11-20-2013 12:03:54.720 ERROR ERP.tdHunk - blocks=[LocatedBlock{BP-1883466371-XXX.YYY.ZZZ.102-1362479174669:blk_5425927572410223942_4015583; getBlockSize()=1255240; corrupt=false; offset=0; locs=[XXX.YYY.ZZZ.100:50010, XXX.YYY.ZZZ.90:50010, XXX.YYY.ZZZ.102:50010]}]
11-20-2013 12:03:54.720 ERROR ERP.tdHunk - lastLocatedBlock=LocatedBlock{BP-1883466371-XXX.YYY.ZZZ.102-1362479174669:blk_5425927572410223942_4015583; getBlockSize()=1255240; corrupt=false; offset=0; locs=[XXX.YYY.ZZZ.90:50010, XXX.YYY.ZZZ.100:50010, XXX.YYY.ZZZ.102:50010]}
11-20-2013 12:03:54.720 ERROR ERP.tdHunk - isLastBlockComplete=true}
...
After four occurrences of that sequence, the following logging appears:
11-20-2013 12:03:57.381 DEBUG ERP.tdHunk - DFSInputStream - Error making BlockReader. Closing stale NioInetPeer(Socket[addr=/XXX.YYY.ZZZ.90,port=50010,localport=43147])
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - java.io.EOFException: Premature EOF: no length prefix available
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:137)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1084)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at java.io.DataInputStream.read(DataInputStream.java:149)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:157)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:141)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at java.io.InputStream.read(InputStream.java:101)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:147)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at com.splunk.mr.input.SplunkLineRecordReader.nextKeyValue(SplunkLineRecordReader.java:40)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at com.splunk.mr.SplunkMR$SplunkBaseMapper.stream(SplunkMR.java:562)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at com.splunk.mr.SplunkMR$SplunkBaseMapper.stream(SplunkMR.java:520)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at com.splunk.mr.OutputProcessor.outputStreaming(OutputProcessor.java:216)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at com.splunk.mr.OutputProcessor.run(OutputProcessor.java:167)
Here are the errors from the dispatch log (hdfs://user/splunk/td-ha01/dispatch/1384949253.250/0/_logs/history/job_201311201109_0009_1384949219885_mapred_SPLK_td-Hunknode_1384949253.25):
Task TASKID="task_201311201109_0009_m_000041" TASK_TYPE="MAP" START_TIME="1384949222512" SPLITS="/default-rack/clusternode01.cluster.domain,/default-
rack/clusternode03.cluster.domain,/default-rack/clusternode02.cluster.domain" .
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000001" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000001_0" START_TIME="1384949277094" TRACK
ER_NAME="tracker_clusternode03.cluster.domain:localhost/127.0.0.1:44134" HTTP_PORT="50060" .
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000001" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000001_0" TASK_STATUS="FAILED" FINISH_TIME="1384949280971" HOSTNAME="clusternode03.cluster.domain" ERROR="java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:900)
at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)
at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)
at com.splunk.mr.SplunkMR$SplunkSearchMapper.setup(SplunkMR.java:624)
at com.splunk.mr.SplunkMR$SplunkBaseMapper.run(SplunkMR.java:394)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
" .
Task TASKID="task_201311201109_0009_m_000042" TASK_TYPE="MAP" START_TIME="1384949226178" SPLITS="/default-rack/clusternode02.cluster.domain,/default-rack/clusternode00.cluster.domain,/default-rack/clusternode03.cluster.domain" .
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000000" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000000_0" START_TIME="1384949277093" TRACKER_NAME="tracker_clusternode03.cluster.domain:localhost/127.0.0.1:44134" HTTP_PORT="50060" .
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000000" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000000_0" TASK_STATUS="FAILED" FINISH_TIME="1384949282477" HOSTNAME="clusternode03.cluster.domain" ERROR="java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:900)
...
And from the jobtracker logs I get:
2013-11-20 11:42:40,074 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2013-11-20 11:42:41,242 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2013-11-20 11:42:41,243 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-11-20 11:42:41,885 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-11-20 11:42:41,956 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6328edf2
2013-11-20 11:42:42,533 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://Namenode:9000/path to data/data file.gz:0+8438254
2013-11-20 11:42:42,590 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2013-11-20 11:42:42,591 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
2013-11-20 11:42:42,630 WARN com.splunk.mr.SplunkMR$SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.ValueAvroRecordReader, path=hdfs://Namenode:9000/path to data/data file.gz, regex=.avro$.
2013-11-20 11:42:42,639 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-11-20 11:42:42,641 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred (auth:SIMPLE) cause:java.io.IOException: Permission denied
2013-11-20 11:42:42,642 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:900)
at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)
at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)
at com.splunk.mr.SplunkMR$SplunkSearchMapper.setup(SplunkMR.java:624)
at com.splunk.mr.SplunkMR$SplunkBaseMapper.run(SplunkMR.java:394)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
2013-11-20 11:42:42,649 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
How can I find out the path and file name of the failing write?
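My guess from the stack trace (SetupCommandHandler.setupSplunk calling File.createNewFile) is that the failing file lives somewhere under vix.splunk.home.datanode, i.e. /opt/splunk, on the TaskTracker node. To check, I plan to run a quick write test as the job user on each node (the test file name is made up):
# on each cluster node: verify the job user can create files under /opt/splunk
sudo -u mapred touch /opt/splunk/.hunk_write_test && echo "write OK"
sudo -u mapred ls -ld /opt/splunk
# clean up the test file
sudo -u mapred rm -f /opt/splunk/.hunk_write_test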