Errors when executing MapReduce searches with Splunk Analytics for Hadoop

hortonew
Builder

I'm trying to run a simple | stats count over a virtual index and am receiving errors. Any thoughts on where to look for this one?
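For reference, the failing search is just the virtual index plus a stats aggregation; a minimal sketch (the index name here is hypothetical):

index=my_virtual_index | stats count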

Setup: Splunk 7.3.3 / Splunk 8.x connecting to an EMR cluster with one master and two slave nodes. The search still produces a count, but I assume it's much slower than it would be if it were running as a MapReduce job.

Exception - com.splunk.mr.JobStartException: Failed to start MapReduce job. Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_searchhead1.abc.corp.com_1580844860.138_0 ] and [ null ]

Edit:
Other testing performed:

  • Upgraded the JDK from 1.7 to 1.8. No change to what works/doesn't work.
  • After adding vix.mapreduce.framework.name=yarn to indexes.conf and mapreduce.framework.name=yarn to yarn-site.xml, I get: Exception - failed 2 times due to AM Container for appattempt_...
  • I've tested outside of Splunk and still receive the AM Container error:
    yarn jar hadoop-streaming.jar streamjob -files wordSplitter.py -mapper wordSplitter.py -input input.txt -output wordCountOut -reducer aggregate
1 Solution

hortonew
Builder

The fix ended up being 2-fold:

  1. Making sure mapred-site.xml contains the property mapreduce.framework.name set to yarn:

    <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
    </property>

  2. Pulling yarn-site.xml directly from the EMR master node (/usr/lib/hadoop-yarn/yarn-site.xml). Specifically, the yarn.application.classpath for EMR 5.28.0 is:
    $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,/usr/lib/hadoop-lzo/lib/*,/usr/share/aws/emr/emrfs/conf,/usr/share/aws/emr/emrfs/lib/*,/usr/share/aws/emr/emrfs/auxlib/*,/usr/share/aws/emr/lib/*,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar,/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar,/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar,/usr/share/aws/emr/cloudwatch-sink/lib/*,/usr/share/aws/aws-java-sdk/*

Setting this resolved my latest issue.
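For reference, that classpath sits inside a standard yarn-site.xml property; a minimal sketch of the wrapper (the property element is assumed from the stock Hadoop config format, with the full comma-separated list above as the value):

<property>
   <name>yarn.application.classpath</name>
   <value><!-- the full comma-separated list above --></value>
</property>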


hortonew
Builder

So my vix.mapreduce.framework.name was blank; I set it to yarn and now get a different error. @rdagan_splunk or @jhornsby_splunk, any ideas on this one?

Exception - java.io.IOException: Error while waiting for MapReduce job to complete, job_id=job_1576770149627_0070, state=FAILED, reason=Application application_1576770149627_0070 failed 2 times due to AM Container for appattempt_1576770149627_0070_000002 exited with exitCode: 1

Doing more research now for additional logs.

Searching my Hadoop cluster's logs, I see Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster, which led me to https://stackoverflow.com/questions/50927577/could-not-find-or-load-main-class-org-apache-hadoop-map....

The changes here don't seem to be doing anything.
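That MRAppMaster class-not-found error usually means the MapReduce client jars aren't on the classpath the containers receive. A rough diagnostic sketch from the search head (the jar path assumes a stock Hadoop 2.8.5 layout under $HADOOP_HOME; adjust for your install):

hadoop classpath | tr ':' '\n' | grep mapreduce                              # what the client resolves
ls $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-app-*.jar     # jar containing MRAppMaster

If the jar is present locally but the job still fails, the yarn.application.classpath / mapreduce.application.classpath values the job ships with are the next thing to compare against the cluster's own config.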


jhornsby_splunk
Splunk Employee

Hi @hortonew,

Are you able to paste the output of the following:

 splunk btool indexes list provider:<name-of-your-provider-in-indexes.conf>

I'd like to compare it to a working setup I have locally.

Cheers,

- Jo.


hortonew
Builder

Testing locally with the Hadoop CLI, I'm running into issues. I feel like the problem stems from something in yarn-site.xml or mapred-site.xml, but I'm not really sure where to look.


hortonew
Builder

[provider:my-hadoop-provider]
vix.command = $SPLUNK_HOME/bin/jars/sudobash
vix.command.arg.1 = $HADOOP_HOME/bin/hadoop
vix.command.arg.2 = jar
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-hy2.jar
vix.command.arg.4 = com.splunk.mr.SplunkMR
vix.env.HADOOP_CLIENT_OPTS = -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.env.HADOOP_HEAPSIZE = 512
vix.env.HADOOP_HOME = /opt/hadoop
vix.env.HUNK_THIRDPARTY_JARS = $SPLUNK_HOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/snappy-java-1.1.1.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-exec-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-metastore-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-serde-1.2.1.jar
vix.env.JAVA_HOME = /usr
vix.env.MAPREDUCE_USER =
vix.family = hadoop
vix.fs.default.name = hdfs://ip-172.29.29.29.ec2.internal:8020/
vix.mapred.child.java.opts = -server -Xmx512m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapred.job.map.memory.mb = 2048
vix.mapred.job.queue.name = default
vix.mapred.job.reduce.memory.mb = 512
vix.mapred.job.reuse.jvm.num.tasks = 100
vix.mapred.reduce.tasks = 2
vix.mapreduce.application.classpath = $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*, $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*, /usr/lib/hadoop-lzo/lib/*, /usr/share/aws/emr/emrfs/conf, /usr/share/aws/emr/emrfs/lib/*, /usr/share/aws/emr/emrfs/auxlib/*, /usr/share/aws/emr/lib/*, /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar, /usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar, /usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar, /usr/share/aws/emr/cloudwatch-sink/lib/*, /usr/share/aws/aws-java-sdk/*
vix.mapreduce.framework.name = yarn
vix.mapreduce.job.jvm.numtasks = 20
vix.mapreduce.job.queuename = default
vix.mapreduce.job.reduces = 3
vix.mapreduce.map.java.opts = -server -Xmx512m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapreduce.map.memory.mb = 2048
vix.mapreduce.reduce.java.opts = -server -Xmx512m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapreduce.reduce.memory.mb = 512
vix.mode = report
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.heartbeat = 1
vix.splunk.heartbeat.interval = 1000
vix.splunk.heartbeat.threshold = 60
vix.splunk.home.datanode = /tmp/splunk/$SPLUNK_SERVER_NAME/
vix.splunk.home.hdfs = /tmp/splunk/mysh.abc.corp.com/
vix.splunk.search.column.filter = 1
vix.splunk.search.debug = 1
vix.splunk.search.mixedmode = 1
vix.splunk.search.mr.maxsplits = 10000
vix.splunk.search.mr.minsplits = 100
vix.splunk.search.mr.poll = 2000
vix.splunk.search.mr.splits.multiplier = 10
vix.splunk.search.recordreader = SplunkJournalRecordReader,ValueAvroRecordReader,SimpleCSVRecordReader,SequenceFileRecordReader
vix.splunk.search.recordreader.avro.regex = \.avro$
vix.splunk.search.recordreader.csv.regex = \.([tc]sv)(?:\.(?:gz|bz2|snappy))?$
vix.splunk.search.recordreader.sequence.regex = \.seq$
vix.splunk.setup.onsearch = 1
vix.splunk.setup.package = current
vix.yarn.resourcemanager.address = hdfs://ip-172.29.29.29.ec2.internal:8032/
vix.yarn.resourcemanager.scheduler.address = hdfs://ip-172.29.29.29.ec2.internal:8030/

rdagan_splunk
Splunk Employee

Can you check whether these two Hadoop flags, yarn.resourcemanager.address (normally port 8032) and yarn.resourcemanager.scheduler.address (normally port 8030), are valid and match the values you see in the Splunk provider?
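If it helps, a quick reachability sketch from the search head (substitute your master node's hostname):

nc -zv masternode 8032   # yarn.resourcemanager.address
nc -zv masternode 8030   # yarn.resourcemanager.scheduler.address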


hortonew
Builder

yarn-site.xml looks like the following, but I get the same error with or without it:

<configuration>
<property>
   <name>yarn.resourcemanager.address</name>
   <value>hdfs://masternode:8032</value>
</property>
<property>
   <name>yarn.resourcemanager.scheduler.address</name>
   <value>hdfs://masternode:8030</value>
</property>
</configuration>

rdagan_splunk
Splunk Employee

These two flags are misconfigured. The values should be masternode:8030, not hdfs://masternode:8030, and likewise for 8032; the yarn.resourcemanager.* addresses take host:port with no URI scheme.
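In other words, the corrected entries would look like this (hostname kept from the snippet above):

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>masternode:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>masternode:8030</value>
  </property>
</configuration>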


hortonew
Builder

Yep, thanks.


hortonew
Builder

Are there any others that must be configured? It's sort of a bare install of the client.


jhornsby_splunk
Splunk Employee

Hi @hortonew,

Does search.log shed any further light? Try setting vix.splunk.search.debug to 1 and see if that helps. Also, what is your Hadoop version set to?
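For reference, a minimal sketch of that change in indexes.conf (stanza name taken from the btool output above):

[provider:my-hadoop-provider]
vix.splunk.search.debug = 1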

Cheers,

- Jo.


hortonew
Builder

The provider is using:

  • Hadoop 2.x (YARN)
  • EMR 5.28.0
  • Hadoop 2.8.5
  • Hive 2.3.6
  • Pig 0.17.0
  • Hue 4.4.0

My search head has:

  • Hadoop CLI 2.8.5
  • OpenJDK 1.7.0

Some additional logs. Note the job runs through LocalJobRunner with a job_local* id, i.e., it never actually reached YARN:

02-05-2020 15:29:10.388 INFO  ERP.hadoop-cluster -  Job - The url to track the job: http://localhost:8080/
02-05-2020 15:29:10.388 INFO  ERP.hadoop-cluster -  AsyncMRJob - Done submitting job.name=SPLK_ec2.server.com_1580916537.266_0, url=http://localhost:8080/
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -  SplunkMR - jobClient
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -  java.lang.NoSuchFieldException: jobClient
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -      at java.lang.Class.getDeclaredField(Class.java:1961)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -      at com.splunk.mr.SplunkMR.getJobClient(SplunkMR.java:592)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -      at com.splunk.mr.AsyncMRJob.run(AsyncMRJob.java:136)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -      at java.lang.Thread.run(Thread.java:748)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -  AsyncMRJob - 
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -  java.lang.NullPointerException
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -      at com.splunk.mr.AsyncMRJob.run(AsyncMRJob.java:137)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -      at java.lang.Thread.run(Thread.java:748)
02-05-2020 15:29:10.389 DEBUG ERP.hadoop-cluster -  OutputProcessor - received: null
02-05-2020 15:29:10.394 INFO  ERP.hadoop-cluster -  AsyncMRJob - start killing MR job id=job_local413061526_0001, job.name=SPLK_ec2.server.com_1580916537.266_0, _state=FAILED
02-05-2020 15:29:10.396 INFO  ERP.hadoop-cluster -  LocalJobRunner$Job - OutputCommitter set in config null
02-05-2020 15:29:10.401 INFO  ERP.hadoop-cluster -  FileOutputCommitter - File Output Committer Algorithm version is 1
02-05-2020 15:29:10.401 INFO  ERP.hadoop-cluster -  FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
02-05-2020 15:29:10.401 INFO  ERP.hadoop-cluster -  LocalJobRunner$Job - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
02-05-2020 15:29:10.411 DEBUG ERP.hadoop-cluster -  ProtobufRpcEngine$Invoker - Call: mkdirs took 6ms
02-05-2020 15:29:10.444 DEBUG ERP.hadoop-cluster -  LocalJobRunner$Job - Starting mapper thread pool executor.
02-05-2020 15:29:10.444 DEBUG ERP.hadoop-cluster -  LocalJobRunner$Job - Max local threads: 1
02-05-2020 15:29:10.444 DEBUG ERP.hadoop-cluster -  LocalJobRunner$Job - Map tasks to process: 2
02-05-2020 15:29:10.445 INFO  ERP.hadoop-cluster -  LocalJobRunner$Job - Waiting for map tasks
02-05-2020 15:29:10.445 INFO  ERP.hadoop-cluster -  LocalJobRunner$Job$MapTaskRunnable - Starting task: attempt_local413061526_0001_m_000000_0
02-05-2020 15:29:10.454 DEBUG ERP.hadoop-cluster -  SortedRanges$SkipRangeIterator - currentIndex 0   0:0
02-05-2020 15:29:10.469 DEBUG ERP.hadoop-cluster -  LocalJobRunner - mapreduce.cluster.local.dir for child : /tmp/hadoop-splunk/mapred/local/localRunner//splunk/jobcache/job_local413061526_0001/attempt_local413061526_0001_m_000000_0
02-05-2020 15:29:10.471 DEBUG ERP.hadoop-cluster -  Task - using new api for output committer
02-05-2020 15:29:10.475 INFO  ERP.hadoop-cluster -  FileOutputCommitter - File Output Committer Algorithm version is 1

hortonew
Builder

Some other logs that stood out:

02-05-2020 15:29:07.564 DEBUG ERP.hadoop-cluster -  RestStorageService - Response xml: true
02-05-2020 15:29:07.564 DEBUG ERP.hadoop-cluster -  RestStorageService - Response entity: null
02-05-2020 15:29:07.564 DEBUG ERP.hadoop-cluster -  RestStorageService - Response entity length: ??
02-05-2020 15:29:07.564 DEBUG ERP.hadoop-cluster -  RestStorageService - Releasing error response without XML content
02-05-2020 15:29:07.565 DEBUG ERP.hadoop-cluster -  RestStorageService - Rethrowing as a ServiceException error in performRequest: org.jets3t.service.ServiceException: Request Error., with cause: org.jets3t.service.impl.rest.HttpException
02-05-2020 15:29:07.565 DEBUG ERP.hadoop-cluster -  RestStorageService - Releasing HttpClient connection after error: Request Error.
02-05-2020 15:29:07.565 DEBUG ERP.hadoop-cluster -  Jets3tProperties - s3service.disable-dns-buckets=false
02-05-2020 15:29:07.565 DEBUG ERP.hadoop-cluster -  Jets3tProperties - s3service.s3-endpoint=s3.amazonaws.com

codebuilder
Influencer

The permissions are likely not correct on your virtual index.

https://docs.splunk.com/Documentation/Splunk/8.0.1/HadoopAnalytics/Externalresultsproviders


hortonew
Builder

Looking through the docs, I'm trying to find exactly which permissions you're referring to. I don't see any communication over ports that aren't allowed through the firewall. Splunk can search EMR and return data, just not run MapReduce queries. So if you have a good spot to check specifically for MapReduce, let me know. Thanks.


codebuilder
Influencer

See the "Configure your Permissions" section.
https://docs.splunk.com/Documentation/Splunk/8.0.1/HadoopAnalytics/Externalresultsproviders#Before_y...
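As a quick sanity check on the HDFS side, you could verify the scratch path is readable and writable by the user the jobs run as (path taken from vix.splunk.home.hdfs in the stanza above):

hdfs dfs -ls /tmp/splunk/mysh.abc.corp.com/   # should list without permission errors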


hortonew
Builder

Do you know what vix.splunk.home.datanode is supposed to look like? Should it be an hdfs:// path?


codebuilder
Influencer

This is generally set to the /tmp directory on the HDFS DataNode and needs to have read/write permissions. Splunk will create sub-directories under the location you define.
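For example, taken from the provider stanza above, it is a plain local-filesystem path on the DataNodes rather than an hdfs:// URI:

vix.splunk.home.datanode = /tmp/splunk/$SPLUNK_SERVER_NAME/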


hortonew
Builder

Gotcha, yeah, that's all set. I saw Splunk writing to /tmp/hadoop-splunk, so I think all permissions are set correctly. Side question: do you know of a setting to restrict how much data Splunk can write to this directory? We saw it filling up the drive without cleanup.


codebuilder
Influencer

Not that I'm aware of. There could be a setting; I just don't know what it is. It would probably be just as easy to configure logrotate (or a similar scheduled cleanup) to purge that directory for you.
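As one possible sketch of a scheduled cleanup using cron and find (the path is the /tmp/hadoop-splunk directory mentioned above; the 24-hour retention window is an assumption, so verify nothing in use gets deleted):

0 * * * * find /tmp/hadoop-splunk -type f -mmin +1440 -delete   # hourly; purge files older than 24h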
