I'm running Hunk 6.2 connected to a Hortonworks Hadoop cluster (HDP 2.2.4.2). Map tasks are failing with broken pipe errors. Any idea how to troubleshoot? Task log below:
2015-06-01 10:55:18,355 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2015-06-01 10:55:18,464 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-06-01 10:55:18,464 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2015-06-01 10:55:18,479 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-06-01 10:55:18,479 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1433169589737_0005, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@7c9209e8)
2015-06-01 10:55:18,600 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-06-01 10:55:18,925 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /grid/hadoop/yarn/usercache/root/appcache/application_1433169589737_0005,/grid1/hadoop/yarn/usercache/root/appcache/application_1433169589737_0005
2015-06-01 10:55:19,477 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-06-01 10:55:20,046 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2015-06-01 10:55:20,300 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: /myhadoopcluster/syslogs/2015/2015-06-01_datacollector2.txt:0+134217728
2015-06-01 10:55:20,573 INFO [main] com.splunk.mr.input.VixTimeSpecifier: using timezone=null, tz.id="America/Toronto", name="Eastern Standard Time" for regex=/myhadoopcluster/syslogs/\d+/(\d+)-(\d+)-(\d+)\w+.txt, format=yyyyMMdd
2015-06-01 10:55:20,573 INFO [main] com.splunk.mr.input.VixTimeSpecifier: using timezone=null, tz.id="America/Toronto", name="Eastern Standard Time" for regex=/myhadoopcluster/syslogs/\d+/(\d+)-(\d+)-(\d+)\w+.txt, format=yyyyMMdd
2015-06-01 10:55:20,600 WARN [main] com.splunk.mr.SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.SplunkJournalRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.SplunkJournalRecordReader, path=hdfs://myhost-nn1.internal:8020/myhadoopcluster/syslogs/2015/2015-06-01_datacollector2.txt, regex=/journal.gz$.
2015-06-01 10:55:20,615 WARN [main] com.splunk.mr.SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.ValueAvroRecordReader, path=hdfs://myhost-nn1.internal:8020/myhadoopcluster/syslogs/2015/2015-06-01_datacollector2.txt, regex=.avro$.
2015-06-01 10:55:20,626 WARN [main] com.splunk.mr.SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.SimpleCSVRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.SimpleCSVRecordReader, path=hdfs://myhost-nn1.internal:8020/myhadoopcluster/syslogs/2015/2015-06-01_datacollector2.txt, regex=.([tc]sv)(?:.(?:gz|bz2|snappy))?$.
2015-06-01 10:55:20,635 WARN [main] com.splunk.mr.SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.SequenceFileRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.SequenceFileRecordReader, path=hdfs://myhost-nn1.internal:8020/myhadoopcluster/syslogs/2015/2015-06-01_datacollector2.txt, regex=.seq$.
2015-06-01 10:55:20,635 INFO [main] com.splunk.mr.JobSubmitterInputFormat: using class=com.splunk.mr.input.SplunkLineRecordReader to process split=/myhadoopcluster/syslogs/2015/2015-06-01_datacollector2.txt:0+134217728
2015-06-01 10:55:20,705 INFO [main] com.splunk.mr.SplunkSearchMapper: CONF_DN_HOME is set to /tmp/splunk/splunk-search1/
2015-06-01 10:55:20,710 INFO [main] com.splunk.mr.SplunkSearchMapper: Ensuring Hunk is setup correctly took elapsed_ms=5
2015-06-01 10:55:20,770 INFO [main] com.splunk.mr.SplunkSearchMapper: _splunkProcess.getStdin()=java.lang.UNIXProcess$ProcessPipeOutputStream@66f8277b
2015-06-01 10:55:20,778 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
2015-06-01 10:55:21,089 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at com.splunk.io.FastByteArrayOutputStream.writeTo(FastByteArrayOutputStream.java:125)
at com.splunk.io.ChunkedOutputStream.write(ChunkedOutputStream.java:91)
at com.splunk.io.SearchOutputStream.write(SearchOutputStream.java:89)
at com.splunk.mr.SplunkBaseMapper.flush(SplunkBaseMapper.java:365)
at com.splunk.mr.SplunkBaseMapper.doStream(SplunkBaseMapper.java:423)
at com.splunk.mr.SplunkBaseMapper.stream(SplunkBaseMapper.java:375)
at com.splunk.mr.SplunkBaseMapper.runImpl(SplunkBaseMapper.java:301)
at com.splunk.mr.SplunkSearchMapper.runImpl(SplunkSearchMapper.java:314)
at com.splunk.mr.SplunkBaseMapper.run(SplunkBaseMapper.java:169)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2015-06-01 10:55:21,100 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
It looks like any scheduled search with "index=*" will also search the Hadoop virtual index. The problem is that quite a few Splunk apps do this. Here are the culprits in my environment:
splunk_app_microsoft_exchange
splunk_app_windows_infrastructure
sideview_utils
These MapReduce jobs will fail and affect MapReduce scheduling/performance. The workaround is to change the scheduled searches from "index=*" to "(index=* AND NOT index=myhadoopprovider)".
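For example, the rewrite can be applied in a local override of each app's savedsearches.conf (the stanza and search text here are illustrative, not copied from the real app):

```ini
# $SPLUNK_HOME/etc/apps/splunk_app_microsoft_exchange/local/savedsearches.conf
# Hypothetical stanza name -- check the app's default/savedsearches.conf
# for the actual saved searches that use index=*.
[Example Exchange Summary Search]
search = (index=* AND NOT index=myhadoopprovider) sourcetype=msexchange:* | stats count by host
```

Anything placed in `local/` overrides the app's `default/` copy and survives app upgrades.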
That is definitely an inefficiency of these apps, one that primarily comes from the need to keep app setup simple; otherwise the app would need to ask the admin/installer to specify the indexes to search. I will bring this up with the app owners.
That error message usually means that the mapper task was not able to run the search process (or the process crashed) on the Hadoop nodes. Can you check on the Hadoop nodes whether a Splunk package is properly installed in /tmp/splunk/splunk-search1/? (I.e., does /tmp/splunk/splunk-search1/splunk/bin/splunkd exist?) Another possible reason is that /tmp is sometimes mounted without execute permissions, in which case you should change the directory in the provider by setting vix.splunk.home.datanode to a directory with execute permissions and a few GB of free space.
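A quick sanity check along those lines, run on each data node (the path is the CONF_DN_HOME from the task log above; this is a sketch, not a Hunk-provided tool):

```shell
# Working directory Hunk uses on the data node; default here matches the
# CONF_DN_HOME seen in the failing task's log.
SPLUNK_DN_HOME=${SPLUNK_DN_HOME:-/tmp/splunk/splunk-search1}

# 1. Is the Splunk package unpacked and executable where Hunk expects it?
if [ -x "$SPLUNK_DN_HOME/splunk/bin/splunkd" ]; then
  echo "splunkd found and executable"
else
  echo "splunkd missing or not executable under $SPLUNK_DN_HOME"
fi

# 2. Is /tmp mounted noexec? If so, splunkd can never start from there.
if mount | grep " /tmp " | grep -q noexec; then
  echo "/tmp is mounted noexec - point vix.splunk.home.datanode elsewhere"
else
  echo "/tmp allows execution (or is not a separate mount)"
fi
```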
Ah, hunk didn't have permissions to write to the /tmp directory. I've manually created the /tmp/splunk/splunk-search1/bin directories and ensured that hunk is the owner. Problem is the map tasks are still failing, same error. There is still no splunkd package installed. How would I install it?
I'd recommend you simply set vix.splunk.home.datanode to a directory that hunk would have access to on the Hadoop nodes.
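On the Hunk search head that setting lives in the provider stanza in indexes.conf; something like the following (the provider name and path are examples from this thread, not defaults):

```ini
# indexes.conf on the Hunk search head
[provider:myhadoopprovider]
# Local (non-HDFS) working directory on each Hadoop data node; must be
# writable and executable by the user running the task, with a few GB free.
vix.splunk.home.datanode = /opt/splunk-hunk-workdir
```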
Sorry, all this time I assumed these directories were in HDFS. I just found in another thread that the vix.splunk.home.datanode path actually refers to the local filesystem on the task node. I can now confirm that /tmp/splunk/splunk-search1/hunk/bin/splunkd did exist. I also changed vix.splunk.home.datanode to /user/splunk/splunk-search1 and it created /user/splunk/splunk-search1/hunk/bin/splunkd. I'm still getting the broken pipe errors, though.
Ok, so I'm seeing this log entry in vix.splunk.home.hdfs/dispatch/scheduler_admin_c3BsdW5rX2FwcF9taWNyb3NvZnRfZXhjaGFuZ2U_RMD5f2faa9386d1f44b5_at_1433340300_2932/0/dispatch_dirs/SplunkMR_attempt_1433169589737_2064_m_000001_3/search.log
06-03-2015 10:10:03.000 ERROR dispatchRunner - RunDispatch::runDispatchThread threw error: Application does not exist: splunk_app_microsoft_exchange
We are also licensed for the Splunk App for Microsoft Exchange. I'm seeing all sorts of searches for "index=*" in $SPLUNK_HOME/etc/apps/splunk_app_microsoft_exchange/default/savedsearches.conf. Does a scheduled search for "index=*" also search virtual indexes?
The "Application does not exist: splunk_app_microsoft_exchange" error message should be a temporary error message until the configuration bundles stabilize in Hunk. Did this error go away within a few minutes or does it persist?