We are running a CDH4.4 Hadoop production cluster with only MapReduce v1 and streaming functionality. The OS is Ubuntu 12.04.3 LTS with the LTS Raring kernel (3.8.0.33.33).
hadoop-hdfs is owned by user "hdfs",
hadoop-mapred.0.20 by user "mapred",
and hive by user "hive".
Problem 1:
I am trying to set up Hunk (Splunk 6.0-184175) on a separate server with connectivity to the Hadoop cluster (HDFS, Hive, and MapReduce v1):
[provider:tdHunk]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-h2.0.jar
vix.env.HADOOP_HOME = /usr/lib/hadoop
vix.env.JAVA_HOME = /usr/lib/jvm/default-java
vix.family = hadoop
vix.fs.default.name = hdfs://:9000/
vix.mapred.job.tracker = :9001
vix.splunk.home.hdfs = /user/splunk/
vix.env.MAPREDUCE_USER = mapred

[tdindex]
vix.input.1.accept = .gz$
vix.input.1.path = /...
vix.provider = tdHunk
When I try to use "index=tdindex" I get "No results found."
After deleting the config option "vix.env.MAPREDUCE_USER", I am able to browse my HDFS data!
I have configured splunkd to run as "root", as user "splunk", and as user "mapred", all with the same results.
Problem 2:
With splunkd running as user "mapred" and "vix.env.MAPREDUCE_USER" not set, I am able to get a bit further.
I configured a field extraction for that data (gzipped CSV with delimiter ';' and no headers):
[preprocess-gzip]
EXTRACT-Datum-Status-Domain = ^(?P<Datum>\d+-\d+-\d+)[^)\n]*;(?P<Status>[^;]+);(?P<Domain>[^;]+)
When I search for one of these fields, I get the following Mapred-Errors:
[tdHunk] IOException - Error while waiting for MapReduce job to complete, job_id=job_201311181028_0054 (http://Jobtracker-Host:50030/jobdetails.jsp?jobid=job_201311181028_0054), state=FAILED, reason=NA
And the tasktracker logs look like:
...
2013-11-19 11:33:15,832 WARN com.splunk.mr.SplunkMR$SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.ValueAvroRecordReader, path=hdfs://Namenode:9000/Path to data/Datafile.gz, regex=.avro$.
2013-11-19 11:33:15,841 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-11-19 11:33:15,844 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred (auth:SIMPLE) cause:java.io.IOException: Permission denied
2013-11-19 11:33:15,845 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:900)
at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)
at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)
at com.splunk.mr.SplunkMR$SplunkSearchMapper.setup(SplunkMR.java:624)
at com.splunk.mr.SplunkMR$SplunkBaseMapper.run(SplunkMR.java:394)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
2013-11-19 11:33:15,939 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
I don't have any .avro files!
I tried to remove "vix.splunk.search.recordreader" and "vix.splunk.search.recordreader.avro.regex", but that didn't work (the entry in $SPLUNK/etc/system/default/indexes.conf seems to override my removal).
On the Splunk host I can use "hdfs" as any user, and I can use "hive" and start MapReduce scripts as user "mapred" without any problems.
Any help would be appreciated.
Thanks, Thomas
Thomas,
a) You don't need to run Splunk as a Hadoop superuser (hdfs or mapred) in order to access HDFS and/or submit MapReduce jobs. As long as the user Splunk runs as has read access to your data in HDFS and write access to the path set in vix.splunk.home.hdfs, you'll be fine.
b) vix.splunk.home.datanode is a path on the TaskTracker/DataNode local filesystem; it does not need to exist in HDFS. The local path needs to be writable by the local mapred user. We default this path to /tmp/splunk/$SPLUNK_SERVER_NAME/ because /tmp/ is generally writable by everyone.
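For example, a minimal provider-stanza sketch pointing it at a different local directory (the path below is purely illustrative; pick any directory the local mapred user can write to on every node):

[provider:tdHunk]
# local filesystem path on every TaskTracker/DataNode; must be writable by the local mapred user
vix.splunk.home.datanode = /opt/splunk/hunk-work/
# HDFS path for the Splunk package and dispatch directories; must be writable by the submitting user
vix.splunk.home.hdfs = /user/splunk/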
BTW, feel free to reach out to me directly over email so we can resolve your issues faster: ledion at splunk dot com
Ledion, thanks for your fast answer:
Problem 1:
The HDFS superuser is "hdfs".
The MapReduce user is "mapred", with full access (rwx) to the HDFS directories!
I want to run splunkd as user "splunk", but I need to run MapReduce jobs as user "mapred". However, after setting vix.env.MAPREDUCE_USER = mapred I lose access to my HDFS data, even when splunkd is actually running as "mapred"!
Problem 2:
I have trouble deciding whether a path in the configuration refers to HDFS or to the local Unix filesystem!
vix.splunk.home.hdfs = /user/splunk/
is in HDFS:
In my case it is (hdfs dfs -ls /user/splunk):
drwxrwxrwt - mapred supergroup 0 2013-11-14 13:46 /user/splunk
drwxrwxrwt - mapred supergroup 0 2013-11-15 10:56 /user/splunk/
drwxr-xr-x - mapred supergroup 0 2013-11-19 11:32 /user/splunk//bundles
...
drwxr-xr-x - mapred supergroup 0 2013-11-19 13:32 /user/splunk//dispatch
...
drwxr-xr-x - mapred supergroup 0 2013-11-15 10:56 /user/splunk//packages
-rw-r--r-- 3 mapred supergroup 77318285 2013-11-15 10:56 /user/splunk//packages/splunk-6.0-184175-Linux-x86_64.tgz
But what about vix.splunk.home.datanode? I set this to "/opt/splunk" and created that directory on the local Unix filesystem on all nodes of the cluster and on the Hunk node!
root@
drwxrwxr-x 10 mapred hadoop 4096 Nov 18 14:46 /opt/splunk/
Do I need to create an HDFS directory "/opt/splunk"? And a Unix-fs directory "/user/splunk"?
Another observation:
The task is to read more than 30,000,000 lines from 1,913 data files, which means 1,913 map tasks are generated. Our cluster runs 50 concurrent map tasks. The JobTracker shows that once the first 100 tasks have failed, all remaining tasks are killed and the job fails.
I tried to increase "vix.mapred.job.reuse.jvm.num.tasks" to 2000, but that didn't change anything (maybe this is a parameter I have to change in the Cloudera environment).
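For reference, I set it in the provider stanza; as far as I understand, Hunk strips the vix. prefix and passes the remaining Hadoop property into the job configuration (sketch, using the value I tried):

[provider:tdHunk]
# forwarded to the MRv1 job conf as mapred.job.reuse.jvm.num.tasks (my understanding of the vix. prefix pass-through)
vix.mapred.job.reuse.jvm.num.tasks = 2000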
And here are the error messages from the search.log:
11-20-2013 12:03:54.718 WARN ERP.tdHunk - SplunkMR$SplunkBaseMapper - Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.ValueAvroRecordReader, path=hdfs://td-db02.intern.trusteddialog.de:9000/Path to data/data file.gz, regex=.avro$.
11-20-2013 12:03:54.718 DEBUG ERP.tdHunk - VirtualIndex - File meets the search criteria,. Will consider it, path=hdfs://td-db02.intern.trusteddialog.de:9000/Path to data/data file.gz
11-20-2013 12:03:54.718 DEBUG ERP.tdHunk - Client$Connection$3 - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred sending #76
11-20-2013 12:03:54.719 DEBUG ERP.tdHunk - Client$Connection$3 - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred sending #77
11-20-2013 12:03:54.719 DEBUG ERP.tdHunk - Client$Connection - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred got value #76
11-20-2013 12:03:54.719 DEBUG ERP.tdHunk - ProtobufRpcEngine$Invoker - Call: getBlockLocations took 1ms
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk - Client$Connection - IPC Client (1703900038) connection to td-db02.intern.trusteddialog.de/XXX.YYY.ZZZ.102:9000 from mapred got value #77
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk - ProtobufRpcEngine$Invoker - Call: getBlockLocations took 1ms
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk - OutputProcessor - received: hdfs://td-db02.intern.trusteddialog.de:9000//mtdscan0.freenet.de-tdchecks.2013-11-19T03+05Z0000.gz:0+23669
11-20-2013 12:03:54.720 DEBUG ERP.tdHunk - DFSInputStream - newInfo = LocatedBlocks{
11-20-2013 12:03:54.720 ERROR ERP.tdHunk - fileLength=1255240
11-20-2013 12:03:54.720 ERROR ERP.tdHunk - underConstruction=false
11-20-2013 12:03:54.720 ERROR ERP.tdHunk - blocks=[LocatedBlock{BP-1883466371-XXX.YYY.ZZZ.102-1362479174669:blk_5425927572410223942_4015583; getBlockSize()=1255240; corrupt=false; offset=0; locs=[XXX.YYY.ZZZ.100:50010, XXX.YYY.ZZZ.90:50010, XXX.YYY.ZZZ.102:50010]}]
11-20-2013 12:03:54.720 ERROR ERP.tdHunk - lastLocatedBlock=LocatedBlock{BP-1883466371-XXX.YYY.ZZZ.102-1362479174669:blk_5425927572410223942_4015583; getBlockSize()=1255240; corrupt=false; offset=0; locs=[XXX.YYY.ZZZ.90:50010, XXX.YYY.ZZZ.100:50010, XXX.YYY.ZZZ.102:50010]}
11-20-2013 12:03:54.720 ERROR ERP.tdHunk - isLastBlockComplete=true}
...
After 4 occurrences of that sequence, the following log entries appear:
11-20-2013 12:03:57.381 DEBUG ERP.tdHunk - DFSInputStream - Error making BlockReader. Closing stale NioInetPeer(Socket[addr=/XXX.YYY.ZZZ.90,port=50010,localport=43147])
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - java.io.EOFException: Premature EOF: no length prefix available
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:137)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1084)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at java.io.DataInputStream.read(DataInputStream.java:149)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:157)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:141)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at java.io.InputStream.read(InputStream.java:101)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:147)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at com.splunk.mr.input.SplunkLineRecordReader.nextKeyValue(SplunkLineRecordReader.java:40)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at com.splunk.mr.SplunkMR$SplunkBaseMapper.stream(SplunkMR.java:562)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at com.splunk.mr.SplunkMR$SplunkBaseMapper.stream(SplunkMR.java:520)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at com.splunk.mr.OutputProcessor.outputStreaming(OutputProcessor.java:216)
11-20-2013 12:03:57.381 ERROR ERP.tdHunk - at com.splunk.mr.OutputProcessor.run(OutputProcessor.java:167)
Here are the errors from the dispatch log (hdfs://user/splunk/td-ha01/dispatch/1384949253.250/0/_logs/history/job_201311201109_0009_1384949219885_mapred_SPLK_td-Hunknode_1384949253.25):
Task TASKID="task_201311201109_0009_m_000041" TASK_TYPE="MAP" START_TIME="1384949222512" SPLITS="/default-rack/clusternode01.cluster.domain,/default-rack/clusternode03.cluster.domain,/default-rack/clusternode02.cluster.domain" .
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000001" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000001_0" START_TIME="1384949277094" TRACKER_NAME="tracker_clusternode03.cluster.domain:localhost/127.0.0.1:44134" HTTP_PORT="50060" .
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000001" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000001_0" TASK_STATUS="FAILED" FINISH_TIME="1384949280971" HOSTNAME="clusternode03.cluster.domain" ERROR="java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:900)
at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)
at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)
at com.splunk.mr.SplunkMR$SplunkSearchMapper.setup(SplunkMR.java:624)
at com.splunk.mr.SplunkMR$SplunkBaseMapper.run(SplunkMR.java:394)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
" .
Task TASKID="task_201311201109_0009_m_000042" TASK_TYPE="MAP" START_TIME="1384949226178" SPLITS="/default-rack/clusternode02.cluster.domain,/default-rack/clusternode00.cluster.domain,/default-rack/clusternode03.cluster.domain" .
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000000" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000000_0" START_TIME="1384949277093" TRACKER_NAME="tracker_clusternode03.cluster.domain:localhost/127.0.0.1:44134" HTTP_PORT="50060" .
MapAttempt TASK_TYPE="MAP" TASKID="task_201311201109_0009_m_000000" TASK_ATTEMPT_ID="attempt_201311201109_0009_m_000000_0" TASK_STATUS="FAILED" FINISH_TIME="1384949282477" HOSTNAME="clusternode03.cluster.domain" ERROR="java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:900)
...
And from the jobtracker logs I get:
2013-11-20 11:42:40,074 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2013-11-20 11:42:41,242 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2013-11-20 11:42:41,243 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-11-20 11:42:41,885 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-11-20 11:42:41,956 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6328edf2
2013-11-20 11:42:42,533 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://Namenode:9000/path to data/data file.gz:0+8438254
2013-11-20 11:42:42,590 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2013-11-20 11:42:42,591 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
2013-11-20 11:42:42,630 WARN com.splunk.mr.SplunkMR$SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.ValueAvroRecordReader, path=hdfs://Namenode:9000/path to data/data file.gz, regex=.avro$.
2013-11-20 11:42:42,639 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-11-20 11:42:42,641 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred (auth:SIMPLE) cause:java.io.IOException: Permission denied
2013-11-20 11:42:42,642 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:900)
at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)
at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)
at com.splunk.mr.SplunkMR$SplunkSearchMapper.setup(SplunkMR.java:624)
at com.splunk.mr.SplunkMR$SplunkBaseMapper.run(SplunkMR.java:394)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
2013-11-20 11:42:42,649 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
How can I get the path and filename of the failing write?
Thomas,
Problem 1
It seems like this is not an issue anymore, but let me explain: "vix.env.MAPREDUCE_USER" is only required if the user Splunk runs as does not have permission to interact with HDFS and submit MapReduce jobs. When this field is specified, that user must exist on the server running Splunk, and the user running Splunk must be able to sudo as that user.
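If you ever do need that route, the sudo requirement could look roughly like the sketch below (a sudoers rule, assuming local users named splunk and mapred; adjust to your own policy):

# /etc/sudoers.d/splunk-hunk (illustrative)
# allow the splunk user to run commands as mapred without a password
splunk ALL=(mapred) NOPASSWD: ALL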
Problem 2
First, I'd recommend assigning a different sourcetype to the data rather than working with the default preprocess-gzip; you can assign a sourcetype and specify extractions based on source too, e.g.:
props.conf
# ... means recursively assign the sourcetype to the files under this dir
[source::/path/to/some/dir/...]
sourcetype = foobar
EXTRACT-foo = ....
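Applied to your semicolon-delimited files, that could look roughly like the sketch below; the source path and sourcetype name are placeholders, and the field names are taken from your extraction. A delimiter-based REPORT extraction (DELIMS/FIELDS in transforms.conf) is an alternative to the inline regex:

props.conf
# placeholder path; point it at the directory that actually holds the .gz files
[source::/path/to/your/gz/files/...]
sourcetype = tdchecks_csv

[tdchecks_csv]
REPORT-fields = tdchecks-csv-fields

transforms.conf
[tdchecks-csv-fields]
DELIMS = ";"
FIELDS = "Datum","Status","Domain"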
The .avro message is a WARN, not an error, and it is expected, since the default config ships a record reader that can read Avro files.
The root cause of problem 2 is indicated by the trace you provided:
2013-11-19 11:33:15,845 WARN org.apache.hadoop.mapred.Child: Error running child java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:900)
at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)
at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)
This trace indicates that the mapred user on the TaskTracker does not have permission to write to the directory where we copy the Splunk package; the path defaults to:
vix.splunk.home.datanode = /tmp/splunk/$SPLUNK_SERVER_NAME/
Can you please check that /tmp/ is writable on the TaskTrackers, and who owns /tmp/splunk if that directory is present?
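A quick way to check on each TaskTracker might be something along these lines (standard shell commands; /tmp/splunk will only exist if a job has already tried to create it):

# run on each TaskTracker node
ls -ld /tmp /tmp/splunk
# verify the local mapred user can actually create files there
sudo -u mapred touch /tmp/splunk-write-test && sudo -u mapred rm /tmp/splunk-write-test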