Thomas,
Problem 1
It seems this is no longer an issue, but let me explain: "vix.env.MAPREDUCE_USER" is only required if the user that Splunk runs as does not have permission to interact with HDFS and submit MapReduce jobs. When this field is specified, the user must exist on the server running Splunk, and the user that runs Splunk must be able to sudo as that user.
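For reference, a sketch of where that setting lives in indexes.conf on the search head (the provider stanza name and the user name are hypothetical):

```ini
[provider:my-hadoop-provider]
vix.family = hadoop
# Only set this if the user running Splunk cannot itself talk to HDFS or
# submit MR jobs. "hadoopuser" must exist on the server running Splunk, and
# the Splunk user must be able to sudo as it.
vix.env.MAPREDUCE_USER = hadoopuser
```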
Problem 2
First, I'd recommend assigning a different sourcetype to the data rather than working with the default, preprocess-gzip. You can assign a sourcetype and specify extractions based on source too, e.g.
props.conf
# ... means recursively assign the sourcetype to the files under this dir
[source::/path/to/some/dir/...]
sourcetype = foobar
EXTRACT-foo = ....
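For completeness, a sketch of what a filled-in stanza could look like (the path, sourcetype, field name, and regex here are all hypothetical):

```ini
# props.conf -- "..." recursively matches files under this directory
[source::/path/to/some/dir/...]
sourcetype = foobar
# pull a "status" field out of lines containing e.g. "status=200"
EXTRACT-status = status=(?<status>\d+)
```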
The .avro message is a WARN, not an error, and it is expected, since the default config we ship includes a record reader that can read Avro files.
The root cause of problem 2 is indicated by the trace you provided:
2013-11-19 11:33:15,845 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Permission denied
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createNewFile(File.java:900)
    at com.splunk.mr.SetupCommandHandler.setupSplunk(SetupCommandHandler.java:167)
    at com.splunk.mr.SplunkMR$SplunkSearchMapper.ensureSplunkdEnv(SplunkMR.java:599)
This trace indicates that the mapred user on the TaskTracker does not have permission to write to the directory where we copy the Splunk package. The path defaults to:
vix.splunk.home.datanode = /tmp/splunk/$SPLUNK_SERVER_NAME/
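If /tmp cannot be made writable on the TaskTrackers, this path can be overridden in the provider stanza in indexes.conf; a sketch, with a hypothetical provider name and path:

```ini
[provider:my-hadoop-provider]
# Point the per-node Splunk package at a directory the mapred user can write to
vix.splunk.home.datanode = /opt/hunk/splunk/$SPLUNK_SERVER_NAME/
```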
Can you please check that /tmp/ is writable on the TaskTrackers, and who owns /tmp/splunk if that directory is present?
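Something along these lines should do it (a sketch; run it on each TaskTracker as, or via sudo to, the user that runs child tasks, e.g. mapred):

```shell
# /tmp is typically drwxrwxrwt (world-writable with the sticky bit)
ls -ld /tmp
# If /tmp/splunk already exists, note its owner -- a stale copy owned by
# another user can block the mapred user from writing under it
ls -ld /tmp/splunk 2>/dev/null || echo "/tmp/splunk not present"
# Quick write test: can the current user create a file under /tmp?
touch /tmp/splunk-write-test && rm /tmp/splunk-write-test && echo "/tmp write OK"
```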