I'm trying to run a simple | stats count search over a virtual index and am receiving errors. Any thoughts on where to look for this one?
Setup: Splunk 7.3.3 / Splunk 8.x connecting to an EMR cluster with a master node and two slave nodes. The search still produces a count, but I assume it's much slower than it would be if it were running as a MapReduce job.
Exception - com.splunk.mr.JobStartException: Failed to start MapReduce job. Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_searchhead1.abc.corp.com_1580844860.138_0 ] and [ null ]
Edit:
Other testing performed:
- Added vix.mapreduce.framework.name=yarn to indexes.conf and mapreduce.framework.name=yarn to yarn-site.xml. With that, I get: Exception - failed 2 times due to AM Container for appattempt_...
- Ran: yarn jar hadoop-streaming.jar streamjob -files wordSplitter.py -mapper wordSplitter.py -input input.txt -output wordCountOut -reducer aggregate
					
The fix ended up being two-fold:
- Make sure mapred-site.xml sets the property mapreduce.framework.name to the value yarn.
- Pull yarn-site.xml directly from the EMR master node (/usr/lib/hadoop-yarn/yarn-site.xml). Specifically, the yarn.application.classpath for EMR 5.28.0 is:
$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,/usr/lib/hadoop-lzo/lib/*,/usr/share/aws/emr/emrfs/conf,/usr/share/aws/emr/emrfs/lib/*,/usr/share/aws/emr/emrfs/auxlib/*,/usr/share/aws/emr/lib/*,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar,/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar,/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar,/usr/share/aws/emr/cloudwatch-sink/lib/*,/usr/share/aws/aws-java-sdk/*
Setting this resolved my latest issue.
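A minimal Python sketch of what those two settings look like, built with the standard library's xml.etree.ElementTree. The helper build_hadoop_config is hypothetical, and the classpath is the EMR 5.28.0 value quoted above. Copying yarn-site.xml straight from the master node, as described, is still the simpler route because it keeps every other property in sync; this only illustrates the two relevant entries.

```python
import xml.etree.ElementTree as ET

# yarn.application.classpath for EMR 5.28.0, copied from the post above.
EMR_5_28_CLASSPATH = ",".join([
    "$HADOOP_CONF_DIR",
    "$HADOOP_COMMON_HOME/*", "$HADOOP_COMMON_HOME/lib/*",
    "$HADOOP_HDFS_HOME/*", "$HADOOP_HDFS_HOME/lib/*",
    "$HADOOP_MAPRED_HOME/*", "$HADOOP_MAPRED_HOME/lib/*",
    "$HADOOP_YARN_HOME/*", "$HADOOP_YARN_HOME/lib/*",
    "/usr/lib/hadoop-lzo/lib/*",
    "/usr/share/aws/emr/emrfs/conf",
    "/usr/share/aws/emr/emrfs/lib/*",
    "/usr/share/aws/emr/emrfs/auxlib/*",
    "/usr/share/aws/emr/lib/*",
    "/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar",
    "/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar",
    "/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar",
    "/usr/share/aws/emr/cloudwatch-sink/lib/*",
    "/usr/share/aws/aws-java-sdk/*",
])


def build_hadoop_config(properties: dict) -> str:
    """Return a Hadoop-style <configuration> document (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Part 1: mapred-site.xml must select YARN instead of the local job runner.
mapred_site = build_hadoop_config({"mapreduce.framework.name": "yarn"})
# Part 2: yarn-site.xml needs the EMR classpath so MRAppMaster can be found.
yarn_site = build_hadoop_config({"yarn.application.classpath": EMR_5_28_CLASSPATH})
```

Each string can then be written out as the corresponding file in the Hadoop client's configuration directory on the search head.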
 
So my vix.mapreduce.framework.name was blank; I set it to yarn and now I get a different error. @rdagan_splunk or @jhornsby_splunk, any ideas on this one?
Exception - java.io.IOException: Error while waiting for MapReduce job to complete, job_id=job_1576770149627_0070, state=FAILED, reason=Application application_1576770149627_0070 failed 2 times due to AM Container for appattempt_1576770149627_0070_000002 exited with exitCode: 1
I'm doing more research now for additional logs.
Searching my Hadoop cluster's logs, I see Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster, which led me to https://stackoverflow.com/questions/50927577/could-not-find-or-load-main-class-org-apache-hadoop-map....
The changes suggested there don't seem to make any difference.
 
Hi @hortonew,
Are you able to paste the output of the following:
 splunk btool indexes list provider:<name-of-your-provider-in-indexes.conf>
I'd like to compare to a working setup I have locally.
Cheers,
- Jo.
 
Testing locally with the Hadoop CLI, I'm running into issues. I suspect the problem stems from something in yarn-site.xml or mapred-site.xml, but I'm not really sure where to look.
 
[provider:my-hadoop-provider]
vix.command = $SPLUNK_HOME/bin/jars/sudobash
vix.command.arg.1 = $HADOOP_HOME/bin/hadoop
vix.command.arg.2 = jar
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-hy2.jar
vix.command.arg.4 = com.splunk.mr.SplunkMR
vix.env.HADOOP_CLIENT_OPTS = -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.env.HADOOP_HEAPSIZE = 512
vix.env.HADOOP_HOME = /opt/hadoop
vix.env.HUNK_THIRDPARTY_JARS = $SPLUNK_HOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/snappy-java-1.1.1.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-exec-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-metastore-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-serde-1.2.1.jar
vix.env.JAVA_HOME = /usr
vix.env.MAPREDUCE_USER =
vix.family = hadoop
vix.fs.default.name = hdfs://ip-172.29.29.29.ec2.internal:8020/
vix.mapred.child.java.opts = -server -Xmx512m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapred.job.map.memory.mb = 2048
vix.mapred.job.queue.name = default
vix.mapred.job.reduce.memory.mb = 512
vix.mapred.job.reuse.jvm.num.tasks = 100
vix.mapred.reduce.tasks = 2
vix.mapreduce.application.classpath = $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*, $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*, /usr/lib/hadoop-lzo/lib/*, /usr/share/aws/emr/emrfs/conf, /usr/share/aws/emr/emrfs/lib/*, /usr/share/aws/emr/emrfs/auxlib/*, /usr/share/aws/emr/lib/*, /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar, /usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar, /usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar, /usr/share/aws/emr/cloudwatch-sink/lib/*, /usr/share/aws/aws-java-sdk/*
vix.mapreduce.framework.name = yarn
vix.mapreduce.job.jvm.numtasks = 20
vix.mapreduce.job.queuename = default
vix.mapreduce.job.reduces = 3
vix.mapreduce.map.java.opts = -server -Xmx512m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapreduce.map.memory.mb = 2048
vix.mapreduce.reduce.java.opts = -server -Xmx512m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapreduce.reduce.memory.mb = 512
vix.mode = report
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.heartbeat = 1
vix.splunk.heartbeat.interval = 1000
vix.splunk.heartbeat.threshold = 60
vix.splunk.home.datanode = /tmp/splunk/$SPLUNK_SERVER_NAME/
vix.splunk.home.hdfs = /tmp/splunk/mysh.abc.corp.com/
vix.splunk.search.column.filter = 1
vix.splunk.search.debug = 1
vix.splunk.search.mixedmode = 1
vix.splunk.search.mr.maxsplits = 10000
vix.splunk.search.mr.minsplits = 100
vix.splunk.search.mr.poll = 2000
vix.splunk.search.mr.splits.multiplier = 10
vix.splunk.search.recordreader = SplunkJournalRecordReader,ValueAvroRecordReader,SimpleCSVRecordReader,SequenceFileRecordReader
vix.splunk.search.recordreader.avro.regex = \.avro$
vix.splunk.search.recordreader.csv.regex = \.([tc]sv)(?:\.(?:gz|bz2|snappy))?$
vix.splunk.search.recordreader.sequence.regex = \.seq$
vix.splunk.setup.onsearch = 1
vix.splunk.setup.package = current
vix.yarn.resourcemanager.address = hdfs://ip-172.29.29.29.ec2.internal:8032/
vix.yarn.resourcemanager.scheduler.address = hdfs://ip-172.29.29.29.ec2.internal:8030/
 
Can you check whether these two Hadoop flags, yarn.resourcemanager.address (normally port 8032) and yarn.resourcemanager.scheduler.address (normally port 8030), are valid and have the same values you see in the Splunk provider?
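One way to do that comparison mechanically is sketched below in Python: it reads the provider stanza (for example the btool output pasted above) and a yarn-site.xml, then reports any resource manager address that differs. The helper names and KEY_MAP are hypothetical, and the mapping assumes the provider's vix.yarn.* keys mirror the yarn-site.xml keys with the vix. prefix removed. It compares raw strings, so even a trailing slash counts as a difference.

```python
import xml.etree.ElementTree as ET

# Assumed mapping: provider key (indexes.conf) -> yarn-site.xml key it mirrors.
KEY_MAP = {
    "vix.yarn.resourcemanager.address": "yarn.resourcemanager.address",
    "vix.yarn.resourcemanager.scheduler.address": "yarn.resourcemanager.scheduler.address",
}


def parse_btool(text: str) -> dict:
    """Parse 'key = value' lines (btool-style output) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and stanza headers such as [provider:...].
        if not line or line.startswith("[") or "=" not in line:
            continue
        # Split on the first '=' only; values like -XX:ParallelGCThreads=4 contain more.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def parse_hadoop_xml(text: str) -> dict:
    """Parse a Hadoop <configuration> document into a name -> value dict."""
    root = ET.fromstring(text)
    return {
        prop.findtext("name"): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }


def compare(provider_text: str, yarn_site_text: str) -> list:
    """Return (provider_key, provider_value, yarn_site_value) for each mismatch."""
    provider = parse_btool(provider_text)
    site = parse_hadoop_xml(yarn_site_text)
    mismatches = []
    for vix_key, site_key in KEY_MAP.items():
        provider_value, site_value = provider.get(vix_key), site.get(site_key)
        if provider_value != site_value:
            mismatches.append((vix_key, provider_value, site_value))
    return mismatches
```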
 
My yarn-site.xml looks like the following, but I get the same error with or without it:
<configuration>
<property>
   <name>yarn.resourcemanager.address</name>
   <value>hdfs://masternode:8032</value>
</property>
<property>
   <name>yarn.resourcemanager.scheduler.address</name>
   <value>hdfs://masternode:8030</value>
</property>
</configuration>
 
These two flags are misconfigured. The value should be masternode:8030, not hdfs://masternode:8030, and likewise masternode:8032 rather than hdfs://masternode:8032.
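Because these properties take a plain host:port pair, a leftover scheme such as hdfs:// makes them invalid. A small Python sketch that normalizes such a value; normalize_rm_address is a hypothetical helper, and applying it to the vix.yarn.* copies in the provider stanza as well is an assumption based on those keys mirroring the Hadoop ones.

```python
def normalize_rm_address(value: str) -> str:
    """Strip an accidental URI scheme and trailing slash, returning 'host:port'.

    Raises ValueError if no numeric port is present.
    """
    value = value.strip()
    if "://" in value:
        # Drop a prefix such as 'hdfs://'; YARN addresses are host:port only.
        value = value.split("://", 1)[1]
    value = value.rstrip("/")
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return f"{host}:{port}"
```

For example, normalize_rm_address("hdfs://masternode:8030") returns masternode:8030.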
 
Yep, thanks.
 
Are there any other settings that must be configured? It's a fairly bare install of the client.
 
Hi @hortonew,
Does search.log shed any further light? Try setting vix.splunk.search.debug to 1 and see if that gives more detail. Also, what is your Hadoop version set to?
Cheers,
- Jo.
 
The provider is using
My search head has:
- Hadoop CLI 2.8.5
- OpenJDK 1.7.0
Some additional logs:
02-05-2020 15:29:10.388 INFO  ERP.hadoop-cluster -  Job - The url to track the job: http://localhost:8080/
02-05-2020 15:29:10.388 INFO  ERP.hadoop-cluster -  AsyncMRJob - Done submitting job.name=SPLK_ec2.server.com_1580916537.266_0, url=http://localhost:8080/
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -  SplunkMR - jobClient
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -  java.lang.NoSuchFieldException: jobClient
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -      at java.lang.Class.getDeclaredField(Class.java:1961)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -      at com.splunk.mr.SplunkMR.getJobClient(SplunkMR.java:592)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -      at com.splunk.mr.AsyncMRJob.run(AsyncMRJob.java:136)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -      at java.lang.Thread.run(Thread.java:748)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -  AsyncMRJob - 
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -  java.lang.NullPointerException
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -      at com.splunk.mr.AsyncMRJob.run(AsyncMRJob.java:137)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster -      at java.lang.Thread.run(Thread.java:748)
02-05-2020 15:29:10.389 DEBUG ERP.hadoop-cluster -  OutputProcessor - received: null
02-05-2020 15:29:10.394 INFO  ERP.hadoop-cluster -  AsyncMRJob - start killing MR job id=job_local413061526_0001, job.name=SPLK_ec2.server.com_1580916537.266_0, _state=FAILED
02-05-2020 15:29:10.396 INFO  ERP.hadoop-cluster -  LocalJobRunner$Job - OutputCommitter set in config null
02-05-2020 15:29:10.401 INFO  ERP.hadoop-cluster -  FileOutputCommitter - File Output Committer Algorithm version is 1
02-05-2020 15:29:10.401 INFO  ERP.hadoop-cluster -  FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
02-05-2020 15:29:10.401 INFO  ERP.hadoop-cluster -  LocalJobRunner$Job - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
02-05-2020 15:29:10.411 DEBUG ERP.hadoop-cluster -  ProtobufRpcEngine$Invoker - Call: mkdirs took 6ms
02-05-2020 15:29:10.444 DEBUG ERP.hadoop-cluster -  LocalJobRunner$Job - Starting mapper thread pool executor.
02-05-2020 15:29:10.444 DEBUG ERP.hadoop-cluster -  LocalJobRunner$Job - Max local threads: 1
02-05-2020 15:29:10.444 DEBUG ERP.hadoop-cluster -  LocalJobRunner$Job - Map tasks to process: 2
02-05-2020 15:29:10.445 INFO  ERP.hadoop-cluster -  LocalJobRunner$Job - Waiting for map tasks
02-05-2020 15:29:10.445 INFO  ERP.hadoop-cluster -  LocalJobRunner$Job$MapTaskRunnable - Starting task: attempt_local413061526_0001_m_000000_0
02-05-2020 15:29:10.454 DEBUG ERP.hadoop-cluster -  SortedRanges$SkipRangeIterator - currentIndex 0   0:0
02-05-2020 15:29:10.469 DEBUG ERP.hadoop-cluster -  LocalJobRunner - mapreduce.cluster.local.dir for child : /tmp/hadoop-splunk/mapred/local/localRunner//splunk/jobcache/job_local413061526_0001/attempt_local413061526_0001_m_000000_0
02-05-2020 15:29:10.471 DEBUG ERP.hadoop-cluster -  Task - using new api for output committer
02-05-2020 15:29:10.475 INFO  ERP.hadoop-cluster -  FileOutputCommitter - File Output Committer Algorithm version is 1
 
Some other logs that stood out:
02-05-2020 15:29:07.564 DEBUG ERP.hadoop-cluster -  RestStorageService - Response xml: true
02-05-2020 15:29:07.564 DEBUG ERP.hadoop-cluster -  RestStorageService - Response entity: null
02-05-2020 15:29:07.564 DEBUG ERP.hadoop-cluster -  RestStorageService - Response entity length: ??
02-05-2020 15:29:07.564 DEBUG ERP.hadoop-cluster -  RestStorageService - Releasing error response without XML content
02-05-2020 15:29:07.565 DEBUG ERP.hadoop-cluster -  RestStorageService - Rethrowing as a ServiceException error in performRequest: org.jets3t.service.ServiceException: Request Error., with cause: org.jets3t.service.impl.rest.HttpException
02-05-2020 15:29:07.565 DEBUG ERP.hadoop-cluster -  RestStorageService - Releasing HttpClient connection after error: Request Error.
02-05-2020 15:29:07.565 DEBUG ERP.hadoop-cluster -  Jets3tProperties - s3service.disable-dns-buckets=false
02-05-2020 15:29:07.565 DEBUG ERP.hadoop-cluster -  Jets3tProperties - s3service.s3-endpoint=s3.amazonaws.com

The permissions are likely not correct on your virtual index.
https://docs.splunk.com/Documentation/Splunk/8.0.1/HadoopAnalytics/Externalresultsproviders
 
Looking through it now, trying to find exactly which permissions you're referring to. I don't see any communication over ports that aren't allowed through the firewall. Splunk can search EMR and return data; it just can't run MapReduce queries. If you know of a good spot to check specifically for MapReduce, let me know. Thanks.

See the "Configure your Permissions" section.
https://docs.splunk.com/Documentation/Splunk/8.0.1/HadoopAnalytics/Externalresultsproviders#Before_y...
 
Do you know what vix.splunk.home.datanode is supposed to look like? Should it be an hdfs:// path?

This is generally set to the /tmp directory on the HDFS DataNode and needs to have read/write permissions. Splunk will create subdirectories under the location you define.
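Since Splunk creates subdirectories under that location, one quick way to confirm the read/write requirement is to create and remove a throwaway subdirectory there, running as the same user the MapReduce tasks run as. A minimal Python sketch; check_writable_dir is a hypothetical helper, and it assumes you run it against the local filesystem of each DataNode.

```python
import os
import tempfile


def check_writable_dir(path: str) -> bool:
    """Return True if `path` exists (or can be created) and a subdirectory
    can be created and removed inside it by the current user."""
    try:
        os.makedirs(path, exist_ok=True)
        # Mimic Splunk creating a subdirectory, then clean up after ourselves.
        probe = tempfile.mkdtemp(prefix="splunk-probe-", dir=path)
        os.rmdir(probe)
        return True
    except OSError:
        return False
```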
 
Gotcha, so that's all set. I saw Splunk writing to /tmp/hadoop-splunk, so I think all permissions are set correctly. Side question: do you know of a setting to restrict how much data Splunk can write to this directory? We saw it filling up the drive with no cleanup.

Not that I'm aware of. There could be a setting; I just don't know what it is. It would probably be just as easy to configure logrotate to purge that directory for you.
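If logrotate turns out to be awkward for a whole directory tree, an age-based sweep run from cron is another option. A hedged Python sketch: purge_old_files is hypothetical, /tmp/hadoop-splunk is the path mentioned in the thread, and the age threshold should be well beyond your longest-running search so files still in use are not removed. It defaults to a dry run and leaves empty directories in place.

```python
import os
import time


def purge_old_files(root: str, max_age_days: float, dry_run: bool = True) -> list:
    """Delete (or, when dry_run is True, only list) files under `root`
    whose modification time is older than `max_age_days`."""
    cutoff = time.time() - max_age_days * 86400
    matched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    if not dry_run:
                        os.remove(path)
                    matched.append(path)
            except OSError:
                # File vanished or is inaccessible; skip it rather than abort the sweep.
                continue
    return matched
```

Run it once with the default dry_run=True, for example purge_old_files("/tmp/hadoop-splunk", 7), to review what would be removed before deleting anything.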
