I'm running a query in Hunk against the firewall logs stored over the last 60 minutes. The search appears to complete, but I see this:
• [cdhprovider] Error while running external process, return_code=255. See search.log for more info
• [cdhprovider] JobStartException - Failed to start MapReduce job. Please consult search.log for more information. Message: Error while waiting for MapReduce job to complete, job_id=job_1441286005166_0151 (http://<hadoop-name-node>:8088), state=FAILED, reason=Task failed task_1441286005166_0151_m_000002
Looking at the search.log file, I see this:
09-17-2015 13:45:17.597 INFO ERP.cdhprovider - JobSubmitter - JobId=job_1441286005166_0151, Map: 100.00%, Reduce: 0.00%
09-17-2015 13:45:19.501 INFO ERP.cdhprovider - JobSubmitter - job.name=SPLK_<hunk-server>_1442511876.85_0, Successful tasks: 17
09-17-2015 13:45:19.501 INFO ERP.cdhprovider - JobSubmitter - job.name=SPLK_<hunk-server>_1442511876.85_0, Failed or killed tasks: 9983
09-17-2015 13:45:20.071 INFO ERP.cdhprovider - JobSubmitter - time spent writing map errors, elapsed_ms=570
09-17-2015 13:45:20.083 INFO ERP.cdhprovider - AsyncMRJob - start killing MR job id=job_1441286005166_0151, job.name=SPLK_<hunk-server>_1442511876.85_0, _state=SUBMITTED
09-17-2015 13:45:20.092 INFO ERP.cdhprovider - YarnClientImpl - Killed application application_1441286005166_0151
09-17-2015 13:45:20.092 INFO ERP.cdhprovider - AsyncMRJob - finished killing MR job id=job_1441286005166_0151, job.name=SPLK_<hunk-server>_1442511876.85_0
09-17-2015 13:45:20.199 INFO ERP.cdhprovider - FileBasedHeartBeat - stopping hearbeat thread, path=/user/hunk/dispatch/1442511876.85, interval=1000, _startIndex=0, _index=32
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - SplunkMR - Failed to start MapReduce job. Please consult search.log for more information. Message: Error while waiting for MapReduce job to complete, job_id=job_1441286005166_0151 (http://<hadoop-name-node>:8088), state=FAILED, reason=Task failed task_1441286005166_0151_m_000002
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - Job failed as tasks failed. failedMaps:1 failedReduces:0
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - com.splunk.mr.JobStartException: Failed to start MapReduce job. Please consult search.log for more information. Message: Error while waiting for MapReduce job to complete, job_id=job_1441286005166_0151 (http://<hadoop-name-node>:8088), state=FAILED, reason=Task failed task_1441286005166_0151_m_000002
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - Job failed as tasks failed. failedMaps:1 failedReduces:0
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - at com.splunk.mr.JobSubmitter.startJob(JobSubmitter.java:679)
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - at com.splunk.mr.JobSubmitter.startJobForAvailableSplits(JobSubmitter.java:732)
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - at com.splunk.mr.JobSubmitter.finished(JobSubmitter.java:722)
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - at com.splunk.mr.SplunkMR$SearchHandler.executeMapReduce(SplunkMR.java:1198)
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:929)
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:772)
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1519)
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - at com.splunk.mr.SplunkMR.run(SplunkMR.java:1301)
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - at com.splunk.mr.SplunkMR.main(SplunkMR.java:1556)
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - Caused by: java.io.IOException: Error while waiting for MapReduce job to complete, job_id=job_1441286005166_0151 (http://<hadoop-name-node>:8088), state=FAILED, reason=Task failed task_1441286005166_0151_m_000002
09-17-2015 13:45:20.201 ERROR ERP.cdhprovider - Job failed as tasks failed. failedMaps:1 failedReduces:0
I'm confused: the job reports 17 successful tasks but 9,983 failed or killed, so I can't tell whether the search completed successfully or whether the results I'm seeing are only partial.
Thx
Hi there,
I'm experiencing the same issue. Rather than create a new question, I'm adding my details here in the hope of getting an answer to the same problem.
A couple of observations:
Checking the search and task tracker logs, the map phase appears to complete (it hits 100%) but then fails to return properly. I'm not sure whether this MR job has a reduce phase, but if it does, it never starts. The task log below also shows the spawned search process exiting with return code 134, i.e. 128 + 6 (SIGABRT), so the search process itself appears to be dying on the data node.
I'm running a Kerberos-secured Hortonworks Hadoop cluster (HDP 2.2.4.2-2) and submitting jobs via YARN (Hadoop 2.x).
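Since the cluster is Kerberized, an expired ticket for the submitting user seemed worth ruling out first. A minimal check from the search head (the keytab path and principal below are placeholders, not my real ones):
klist                                                                  # is there a valid, non-expired TGT for the user running splunkd?
kinit -kt /etc/security/keytabs/hunk.service.keytab hunk@EXAMPLE.COM   # renew from a keytab if it has expired (placeholder path/principal)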
Here are my logs:
Interface Errors:
[provider] Error while running external process, return_code=255. See search.log for more info
[provider] IOException - Error while waiting for MapReduce job to complete, job_id=job_1440205050298_7902 (http://:8088/proxy/application_1440205050298_7902/), state=FAILED, reason=Task failed task_1440205050298_7902_m_000003
search.log:
09-23-2015 18:47:09.007 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 0.00%, Reduce: 0.00%
09-23-2015 18:47:09.540 WARN SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
09-23-2015 18:47:09.843 INFO DispatchThread - Generating results preview took 1 ms
09-23-2015 18:47:10.505 WARN SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
09-23-2015 18:47:10.786 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 0.00%, Reduce: 0.00%
09-23-2015 18:47:11.463 WARN SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
09-23-2015 18:47:11.759 INFO DispatchThread - Generating results preview took 1 ms
09-23-2015 18:47:12.449 WARN SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
09-23-2015 18:47:12.753 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 0.00%, Reduce: 0.00%
09-23-2015 18:47:13.474 WARN SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
09-23-2015 18:47:13.791 INFO DispatchThread - Generating results preview took 1 ms
09-23-2015 18:47:14.517 WARN SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
09-23-2015 18:47:14.835 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 0.00%, Reduce: 0.00%
09-23-2015 18:47:15.741 WARN SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
09-23-2015 18:47:16.146 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 11.97%, Reduce: 0.00%
09-23-2015 18:47:16.150 INFO DispatchThread - Generating results preview took 1 ms
09-23-2015 18:47:17.031 WARN SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
09-23-2015 18:47:18.104 WARN SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
09-23-2015 18:47:18.402 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 11.97%, Reduce: 0.00%
09-23-2015 18:47:18.402 INFO DispatchThread - Generating results preview took 1 ms
09-23-2015 18:47:19.368 WARN SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
09-23-2015 18:47:20.080 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 18.38%, Reduce: 0.00%
09-23-2015 18:47:22.089 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 25.27%, Reduce: 0.00%
09-23-2015 18:47:24.100 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 25.27%, Reduce: 0.00%
09-23-2015 18:47:26.112 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 16.70%, Reduce: 0.00%
09-23-2015 18:47:28.125 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 0.00%, Reduce: 0.00%
09-23-2015 18:47:29.808 INFO DispatchThread - Generating results preview took 1 ms
09-23-2015 18:47:30.137 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 0.00%, Reduce: 0.00%
09-23-2015 18:47:32.146 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 100.00%, Reduce: 0.00%
09-23-2015 18:47:34.158 INFO ERP.provider - JobSubmitter - JobId=job_1440205050298_7902, Map: 100.00%, Reduce: 0.00%
09-23-2015 18:47:34.359 INFO ERP.provider - JobSubmitter - job.name=SPLK_<hunk-server>_1443059205.319_0, Successful tasks: 0
09-23-2015 18:47:34.359 INFO ERP.provider - JobSubmitter - job.name=SPLK_<hunk-server>_1443059205.319_0, Failed or killed tasks: 4
09-23-2015 18:47:34.420 INFO ERP.provider - JobSubmitter - time spent writing map errors, elapsed_ms=60
09-23-2015 18:47:34.498 INFO ERP.provider - FileBasedHeartBeat - stopping hearbeat thread, path=/user/hunk/dispatch/1443059205.319, interval=1000, _startIndex=0, _index=43
09-23-2015 18:47:34.499 WARN PreviewWriter - - data_source="/data/file/path/part-r-00000", data_host="<hunk-node>", data_sourcetype="none"
09-23-2015 18:47:34.503 ERROR ERP.provider - SplunkMR - Error while waiting for MapReduce job to complete, job_id=job_1440205050298_7902 (http://<namenode>:8088/proxy/application_1440205050298_7902/), state=FAILED, reason=Task failed task_1440205050298_7902_m_000003
09-23-2015 18:47:34.503 ERROR ERP.provider - Job failed as tasks failed. failedMaps:1 failedReduces:0
09-23-2015 18:47:34.503 ERROR ERP.provider - java.io.IOException: Error while waiting for MapReduce job to complete, job_id=job_1440205050298_7902 (http://<namenode>:8088/proxy/application_1440205050298_7902/), state=FAILED, reason=Task failed task_1440205050298_7902_m_000003
09-23-2015 18:47:34.503 ERROR ERP.provider - Job failed as tasks failed. failedMaps:1 failedReduces:0
09-23-2015 18:47:34.503 ERROR ERP.provider - at com.splunk.mr.JobSubmitter.waitForCurrentJobToComplete(JobSubmitter.java:493)
09-23-2015 18:47:34.503 ERROR ERP.provider - at com.splunk.mr.JobSubmitter.waitForCompletion(JobSubmitter.java:204)
09-23-2015 18:47:34.503 ERROR ERP.provider - at com.splunk.mr.SplunkMR$SearchHandler.executeMapReduce(SplunkMR.java:1208)
09-23-2015 18:47:34.503 ERROR ERP.provider - at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:936)
09-23-2015 18:47:34.503 ERROR ERP.provider - at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:779)
09-23-2015 18:47:34.503 ERROR ERP.provider - at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1526)
09-23-2015 18:47:34.503 ERROR ERP.provider - at com.splunk.mr.SplunkMR.run(SplunkMR.java:1308)
09-23-2015 18:47:34.503 ERROR ERP.provider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
09-23-2015 18:47:34.503 ERROR ERP.provider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
09-23-2015 18:47:34.503 ERROR ERP.provider - at com.splunk.mr.SplunkMR.main(SplunkMR.java:1554)
09-23-2015 18:47:34.874 ERROR ERP.provider - Error while invoking command: /usr/hdp/2.2.4.2-2/hadoop/bin/hadoop com.splunk.mr.SplunkMR - Return code: 255
09-23-2015 18:47:34.875 WARN PreviewWriter - - data_source="/data/file/path/part-r-00000", data_host="<hunk-host>", data_sourcetype="none"
09-23-2015 18:47:34.875 INFO SearchOperator:stdin - StdinDataProvider::finish called
09-23-2015 18:47:34.875 INFO AggregatorMiningProcessor - Got done message for: source::/data/file/path/part-r-00000|host::<hunk-host>||2
09-23-2015 18:47:35.360 WARN SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
09-23-2015 18:47:35.552 INFO ERPSearchResultCollector - ERP peer=provider is done reading search results.
Task Tracker Log - specifically the mapper referenced by the interface error (task_1440205050298_7902_m_000003):
2015-09-23 18:47:08,081 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2015-09-23 18:47:08,175 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-09-23 18:47:08,176 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2015-09-23 18:47:08,196 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-09-23 18:47:08,196 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1440205050298_7902, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@1680748d)
2015-09-23 18:47:08,273 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:provider, Ident: (HDFS_DELEGATION_TOKEN token 47861 for hunk)
2015-09-23 18:47:08,274 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: TIMELINE_DELEGATION_TOKEN, Service: 192.168.1.10:8188, Ident: (owner=hunk, renewer=yarn, realUser=, issueDate=1443059213325, maxDate=1443664013325, sequenceNumber=39, masterKeyId=2)
2015-09-23 18:47:08,389 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-09-23 18:47:08,622 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: <data-node-loc>/usercache/hunk/appcache/application_1440205050298_7902,<data-node-loc>/usercache/hunk/appcache/application_1440205050298_7902,<data-node-loc>/usercache/hunk/appcache/application_1440205050298_7902,<data-node-loc>/usercache/hunk/appcache/application_1440205050298_7902,<data-node-loc>/usercache/hunk/appcache/application_1440205050298_7902,<data-node-loc>/usercache/hunk/appcache/application_1440205050298_7902,<data-node-loc>/usercache/hunk/appcache/application_1440205050298_7902
2015-09-23 18:47:09,118 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-09-23 18:47:09,664 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2015-09-23 18:47:09,975 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: /data/file/path/to/_SUCCESS:0+0
2015-09-23 18:47:10,191 INFO [main] com.splunk.mr.input.VixTimeSpecifier: using timezone=null, _tz.id="<time-zone>", name="<time-zone>" for regex=/data/file/path/to(20\d\d)/(\d\d), format=yyyyMM
2015-09-23 18:47:10,191 INFO [main] com.splunk.mr.input.VixTimeSpecifier: using timezone=null, _tz.id="<time-zone>", name="<time-zone>" for regex=/data/file/path/to(20\d\d)/(\d\d), format=yyyyMM
2015-09-23 18:47:10,209 WARN [main] com.splunk.mr.SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.SplunkJournalRecordReader, message=File path does not match regex to use this record reader, class=com.splunk.mr.input.SplunkJournalRecordReader, name=splunkjournal, path=hdfs://data/file/path/to/_SUCCESS, regex=/journal\.gz$.
2015-09-23 18:47:10,218 WARN [main] com.splunk.mr.SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, class=com.splunk.mr.input.ValueAvroRecordReader, name=avro, path=hdfs://data/file/path/to/_SUCCESS, regex=\.avro$.
2015-09-23 18:47:10,225 WARN [main] com.splunk.mr.SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.SimpleCSVRecordReader, message=File path does not match regex to use this record reader, class=com.splunk.mr.input.SimpleCSVRecordReader, name=csv, path=hdfs://data/file/path/to/_SUCCESS, regex=\.([tc]sv)(?:\.(?:gz|bz2|snappy))?$.
2015-09-23 18:47:10,230 WARN [main] com.splunk.mr.SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.SequenceFileRecordReader, message=File path does not match regex to use this record reader, class=com.splunk.mr.input.SequenceFileRecordReader, name=sequence, path=hdfs://data/file/path/to/_SUCCESS, regex=\.seq$.
2015-09-23 18:47:10,230 INFO [main] com.splunk.mr.JobSubmitterInputFormat: using class=com.splunk.mr.input.SplunkLineRecordReader to process split=/data/file/path/to/_SUCCESS:0+0
2015-09-23 18:47:10,249 INFO [main] com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
2015-09-23 18:47:10,252 INFO [main] com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 66217595dd210805b8cf223b2d1b3b6f77fda073]
2015-09-23 18:47:10,328 INFO [main] com.splunk.mr.SplunkSearchMapper: CONF_DN_HOME is set to /tmp/splunk/<hunk-server>/
2015-09-23 18:47:10,341 INFO [main] com.splunk.mr.SplunkSearchMapper: Ensuring Hunk is setup correctly took elapsed_ms=14
2015-09-23 18:47:10,497 INFO [main] com.splunk.mr.SplunkSearchMapper: _splunkProcess.getStdin()=java.lang.UNIXProcess$ProcessPipeOutputStream@267fd96e
2015-09-23 18:47:10,504 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
2015-09-23 18:47:10,535 INFO [main] com.splunk.mr.SplunkBaseMapper: Pushed records=0, bytes=0 to search process ...
2015-09-23 18:47:10,948 INFO [main] com.splunk.mr.SplunkSearchMapper: Processed records=0, bytes=0, return_code=134, elapsed_ms=588
2015-09-23 18:47:11,147 ERROR [main] com.splunk.mr.SplunkSearchMapper: Mapper exception=java.io.IOException: Search process failed to exit cleanly. Return code=134.
2015-09-23 18:47:11,318 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Search process failed to exit cleanly. Return code=134.
at com.splunk.mr.SplunkSearchMapper.runImpl(SplunkSearchMapper.java:325)
at com.splunk.mr.SplunkBaseMapper.run(SplunkBaseMapper.java:169)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2015-09-23 18:47:11,323 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
Any help that anyone can provide would be appreciated.
I came across a similar issue today and was able to fix it. The issue was due to having "$HADOOP_CONF_DIR" as part of my Yarn classpath.
In my indexes.conf file, my Yarn classpath was set to this:
vix.yarn.application.classpath = $HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*,/usr/hdp/current/hadoop-mapreduce-client/*
Since $HADOOP_CONF_DIR is set to "/etc/hadoop/conf", I updated the Yarn classpath to this:
vix.yarn.application.classpath = /etc/hadoop/conf/*,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*,/usr/hdp/current/hadoop-mapreduce-client/*
I restarted the search head and ran another query. It completed with no errors. Hope this helps.
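As a quick sanity check before hard-coding the path, you can confirm what the variable resolves to and that the directory really contains the client configs. This sketch assumes the typical HDP layout where the conf dir is /etc/hadoop/conf:
echo "$HADOOP_CONF_DIR"                                           # typically prints /etc/hadoop/conf on HDP
ls /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/yarn-site.xml  # the client configs YARN jobs need
My guess, unconfirmed, is that the variable is not expanded (or expands differently) in the environment the job runs under, which is why the literal path works.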
It looks like the error happened on one of your TaskTracker nodes. It would help to see the log for at least one of the failed tasks; where you find it depends on your Hadoop distribution, but it is usually available via a web interface.
We're running Hadoop 2.x (YARN); from what I've read, TaskTrackers only exist in earlier (MRv1) versions of Hadoop, so in our case the equivalent logs come from the YARN containers.
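For the record, here is roughly how we pull the container logs for a failed run on YARN; the application id is the one from my logs above, and this assumes log aggregation (yarn.log-aggregation-enable) is turned on:
yarn logs -applicationId application_1440205050298_7902   # dumps the aggregated container logs, including each failed map attempt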
Still having the same issues