Splunk Analytics for Hadoop & AWS EMR: java.io.IOException: Error while waiting for MapReduce job to complete

pkeenan87
Communicator

I have set up Splunk Analytics for Hadoop and configured it to use AWS EMR to search data in S3. Streaming searches work fine, but as soon as a search starts a MapReduce (MR) job I get the following error:


Exception - java.io.IOException: Error while waiting for MapReduce job to complete, job_id=job_1540315856620_0642, state=FAILED, reason=Task failed task_1540315856620_0642_m_000002

1 Solution

pkeenan87
Communicator

The issue turned out to be that I had Data Model Acceleration turned on and it was running across my virtual indices. I had to modify the cim_* macros to stop Splunk from launching MR jobs on my Hadoop cluster every 5 minutes (the cluster was very small because I was just testing). Once I did that, searches started working fine.
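
For readers who want to make a similar change, here is a minimal sketch of what restricting a cim_* macro might look like. It assumes the CIM add-on's per-data-model macros follow the `cim_<DataModel>_indexes` naming pattern and that the fix is to list only native indexes in the macro definition; the post does not say exactly which macros were edited or how. The function name and the index names are hypothetical.

```python
def build_cim_index_macro(data_model, indexes):
    """Return a macros.conf stanza that limits one CIM data model to the given indexes.

    Assumption: the macro is named cim_<data_model>_indexes. Virtual indexes are
    simply left out of the list so acceleration searches never reach Hadoop.
    """
    if not indexes:
        raise ValueError("at least one index is required")
    clause = " OR ".join(f"index={name}" for name in indexes)
    return f"[cim_{data_model}_indexes]\ndefinition = ({clause})\n"


if __name__ == "__main__":
    # Hypothetical native index names; replace with the ones in your deployment.
    print(build_cim_index_macro("Authentication", ["main", "os"]))
```

The generated stanza would go into a local macros.conf for the CIM add-on, or the same definition can be entered through the macro settings page in Splunk Web.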

richgalloway
SplunkTrust

@pkeenan87 If your problem is resolved, please accept the answer to help future readers.


sduff_splunk
Splunk Employee

Have you confirmed that you can run MapReduce jobs from the search head via the command line, i.e., without using Splunk?

Confirm that Hadoop is properly configured. A customer of mine was missing yarn-site.xml and core-site.xml from the /opt/hadoop/etc/hadoop/ directory, and their deployment exhibited the same issue you describe.

http://docs.splunk.com/Documentation/HadoopConnect/1.2.5/DeployHadoopConnect/Setupcompressedfiletype...

Also refer to the following question on Stack Overflow: https://stackoverflow.com/questions/43425678/application-failed-2-times-due-to-am-container-exited-w... . Again, check that Hadoop actually works from the CLI, and that all CLASSPATH and PATH variables are correct when running as the splunk user.
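
The checks above can be roughly automated. The sketch below is only an illustration, not a Splunk or Hadoop tool: it looks for the two config files mentioned earlier and verifies that the `hadoop` and `yarn` commands resolve on PATH. The function name is hypothetical, and the directory passed in the example is the one from this reply; yours may differ.

```python
import os
import shutil

REQUIRED_FILES = ("core-site.xml", "yarn-site.xml")


def check_hadoop_client(conf_dir, required=REQUIRED_FILES, tools=("hadoop", "yarn")):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Each required config file must exist in the Hadoop conf directory.
    for name in required:
        if not os.path.isfile(os.path.join(conf_dir, name)):
            problems.append(f"missing {name} in {conf_dir}")
    # Each CLI tool must be resolvable through the current user's PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_hadoop_client("/opt/hadoop/etc/hadoop"):
        print(problem)
```

Run it as the splunk user so that it sees the same PATH and file permissions that Splunk does. Passing these checks does not prove that a job will succeed; submitting a small MapReduce job by hand is still the real test.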


pkeenan87
Communicator

Here is the output from search.log:

10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - Caused by: java.io.IOException: Error while waiting for MapReduce job to complete, job_id=[!http://ip-172-29-24-220.ec2.internal:8088/cluster/app/application_1540315856620_1358 job_1540315856620_1358], state=FAILED, reason=Application application_1540315856620_1358 failed 2 times due to AM Container for appattempt_1540315856620_1358_000002 exited with exitCode: 1
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - For more detailed output, check application tracking page:http://ip-172-29-24-220.ec2.internal:8088/cluster/app/application_1540315856620_1358Then, click on links to logs of each attempt.
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - Diagnostics: Exception from container-launch.
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - Container id: container_1540315856620_1358_02_000001
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - Exit code: 1
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - Stack trace: ExitCodeException exitCode=1:
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at org.apache.hadoop.util.Shell.run(Shell.java:479)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at java.util.concurrent.FutureTask.run(FutureTask.java:266)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at java.lang.Thread.run(Thread.java:748)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - Container exited with a non-zero exit code 1
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - Failing this attempt. Failing the application.
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at com.splunk.mr.JobSubmitter.waitForCurrentJobToComplete(JobSubmitter.java:534)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at com.splunk.mr.JobSubmitter.startJobImpl(JobSubmitter.java:647)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - at com.splunk.mr.JobSubmitter.startJob(JobSubmitter.java:759)
10-24-2018 09:16:18.799 ERROR ERP.test_hadoop - ... 10 more
10-24-2018 09:16:18.799 INFO ERP.test_hadoop - SplunkMR - finishing, version=6.2 ...
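
The search.log output only says that the ApplicationMaster container exited with code 1; the actual cause is usually in the container logs on the cluster. As a small sketch (the function name is hypothetical), the following pulls the YARN application IDs out of ERROR lines like the ones above so they can be passed to `yarn logs -applicationId`. It assumes the IDs always have the form `application_<digits>_<digits>`, as they do in this log.

```python
import re

APP_ID = re.compile(r"application_\d+_\d+")


def failed_application_ids(lines):
    """Return unique YARN application IDs seen on ERROR lines, in first-seen order."""
    seen = []
    for line in lines:
        if " ERROR " not in line:
            continue  # only error lines are of interest here
        for app_id in APP_ID.findall(line):
            if app_id not in seen:
                seen.append(app_id)
    return seen


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py /path/to/search.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for app_id in failed_application_ids(handle):
            print(f"yarn logs -applicationId {app_id}")
```

The printed commands only return output if log aggregation is enabled on the cluster; otherwise the tracking page URL in the log is the place to look.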
