02-10-2020
11:15 AM
Yep, thanks.
02-08-2020
10:02 AM
The fix ended up being two-fold:
1. Making sure mapred-site.xml sets mapreduce.framework.name to the value yarn.
2. Pulling yarn-site.xml directly from what was on the EMR master node (/usr/lib/hadoop-yarn/yarn-site.xml). Specifically, the yarn.application.classpath for EMR 5.28.0 is:
$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,/usr/lib/hadoop-lzo/lib/*,/usr/share/aws/emr/emrfs/conf,/usr/share/aws/emr/emrfs/lib/*,/usr/share/aws/emr/emrfs/auxlib/*,/usr/share/aws/emr/lib/*,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar,/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar,/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar,/usr/share/aws/emr/cloudwatch-sink/lib/*,/usr/share/aws/aws-java-sdk/*
Setting this resolved my latest issue.
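For reference, the mapred-site.xml change looks like this in standard Hadoop property syntax (a minimal sketch; adjust paths for your own install):

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN instead of the local job runner -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```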
02-08-2020
09:57 AM
Thank you so much for this. It's 2020 and this helped solve my issue. If you're using EMR, SSH to your master node, cd /usr/lib/hadoop-yarn/, and look at yarn-site.xml for yarn.application.classpath; use what's there in your Hadoop client's yarn-site.xml. Mine turned out to be:
$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,/usr/lib/hadoop-lzo/lib/*,/usr/share/aws/emr/emrfs/conf,/usr/share/aws/emr/emrfs/lib/*,/usr/share/aws/emr/emrfs/auxlib/*,/usr/share/aws/emr/lib/*,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar,/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar,/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar,/usr/share/aws/emr/cloudwatch-sink/lib/*,/usr/share/aws/aws-java-sdk/*
02-07-2020
12:07 PM
Testing locally with the Hadoop CLI, I'm running into issues. I feel like the problem stems from something in yarn-site.xml or mapred-site.xml, but I'm not really sure where to look.
02-07-2020
12:06 PM
[provider:my-hadoop-provider]
vix.command = $SPLUNK_HOME/bin/jars/sudobash
vix.command.arg.1 = $HADOOP_HOME/bin/hadoop
vix.command.arg.2 = jar
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-hy2.jar
vix.command.arg.4 = com.splunk.mr.SplunkMR
vix.env.HADOOP_CLIENT_OPTS = -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.env.HADOOP_HEAPSIZE = 512
vix.env.HADOOP_HOME = /opt/hadoop
vix.env.HUNK_THIRDPARTY_JARS = $SPLUNK_HOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/snappy-java-1.1.1.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-exec-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-metastore-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-serde-1.2.1.jar
vix.env.JAVA_HOME = /usr
vix.env.MAPREDUCE_USER =
vix.family = hadoop
vix.fs.default.name = hdfs://ip-172.29.29.29.ec2.internal:8020/
vix.mapred.child.java.opts = -server -Xmx512m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapred.job.map.memory.mb = 2048
vix.mapred.job.queue.name = default
vix.mapred.job.reduce.memory.mb = 512
vix.mapred.job.reuse.jvm.num.tasks = 100
vix.mapred.reduce.tasks = 2
vix.mapreduce.application.classpath = $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*, $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*, /usr/lib/hadoop-lzo/lib/*, /usr/share/aws/emr/emrfs/conf, /usr/share/aws/emr/emrfs/lib/*, /usr/share/aws/emr/emrfs/auxlib/*, /usr/share/aws/emr/lib/*, /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar, /usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar, /usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar, /usr/share/aws/emr/cloudwatch-sink/lib/*, /usr/share/aws/aws-java-sdk/*
vix.mapreduce.framework.name = yarn
vix.mapreduce.job.jvm.numtasks = 20
vix.mapreduce.job.queuename = default
vix.mapreduce.job.reduces = 3
vix.mapreduce.map.java.opts = -server -Xmx512m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapreduce.map.memory.mb = 2048
vix.mapreduce.reduce.java.opts = -server -Xmx512m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapreduce.reduce.memory.mb = 512
vix.mode = report
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.heartbeat = 1
vix.splunk.heartbeat.interval = 1000
vix.splunk.heartbeat.threshold = 60
vix.splunk.home.datanode = /tmp/splunk/$SPLUNK_SERVER_NAME/
vix.splunk.home.hdfs = /tmp/splunk/mysh.abc.corp.com/
vix.splunk.search.column.filter = 1
vix.splunk.search.debug = 1
vix.splunk.search.mixedmode = 1
vix.splunk.search.mr.maxsplits = 10000
vix.splunk.search.mr.minsplits = 100
vix.splunk.search.mr.poll = 2000
vix.splunk.search.mr.splits.multiplier = 10
vix.splunk.search.recordreader = SplunkJournalRecordReader,ValueAvroRecordReader,SimpleCSVRecordReader,SequenceFileRecordReader
vix.splunk.search.recordreader.avro.regex = \.avro$
vix.splunk.search.recordreader.csv.regex = \.([tc]sv)(?:\.(?:gz|bz2|snappy))?$
vix.splunk.search.recordreader.sequence.regex = \.seq$
vix.splunk.setup.onsearch = 1
vix.splunk.setup.package = current
vix.yarn.resourcemanager.address = hdfs://ip-172.29.29.29.ec2.internal:8032/
vix.yarn.resourcemanager.scheduler.address = hdfs://ip-172.29.29.29.ec2.internal:8030/
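As a quick sanity check on the recordreader regexes in the stanza above (illustrative Python, not anything Splunk itself runs — the file names are made up):

```python
import re

# Regexes copied from the vix.splunk.search.recordreader.*.regex settings above.
avro_re = re.compile(r"\.avro$")
csv_re = re.compile(r"\.([tc]sv)(?:\.(?:gz|bz2|snappy))?$")
seq_re = re.compile(r"\.seq$")

def reader_for(path):
    """Return which record reader would claim a file, or None."""
    if avro_re.search(path):
        return "avro"
    if csv_re.search(path):
        return "csv"
    if seq_re.search(path):
        return "seq"
    return None

print(reader_for("events.csv.gz"))  # csv
print(reader_for("raw.txt"))        # None
```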
02-06-2020
01:24 PM
So my vix.mapreduce.framework.name was blank; I set it to yarn and now I get a different error. @rdagan_splunk or @jhornsby_splunk, any ideas on this one?
Exception - java.io.IOException: Error while waiting for MapReduce job to complete, job_id=job_1576770149627_0070, state=FAILED, reason=Application application_1576770149627_0070 failed 2 times due to AM Container for appattempt_1576770149627_0070_000002 exited with exitCode: 1
Doing more research now for additional logs.
Searching my Hadoop cluster's logs, I see Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster, which led me to https://stackoverflow.com/questions/50927577/could-not-find-or-load-main-class-org-apache-hadoop-mapreduce-v2-app-mrappmaster.
The changes here don't seem to be doing anything.
02-06-2020
11:06 AM
Ok, that makes sense. I'll research that further after solving this issue. Thanks.
02-06-2020
11:00 AM
Gotcha, yeah, so that's all set. I saw Splunk writing to /tmp/hadoop-splunk, so I think all permissions are set correctly. Side question: do you know of a setting to restrict how much data Splunk can write to this directory? We saw it filling up the drive without cleanup.
02-06-2020
10:39 AM
Do you know what vix.splunk.home.datanode is supposed to look like? Should it be an hdfs:// path?
02-05-2020
04:28 PM
Are there any others that must be configured? It's sort of a bare install of the client.
02-05-2020
04:18 PM
yarn-site.xml looks like the following, but I get the same error with or without it:
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>hdfs://masternode:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hdfs://masternode:8030</value>
</property>
</configuration>
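(Note, in hindsight: per the Hadoop defaults, yarn.resourcemanager.address and yarn.resourcemanager.scheduler.address are plain host:port values, not hdfs:// URIs — hdfs:// is the filesystem scheme used by fs.default.name. A minimal sketch, assuming masternode resolves from the search head:)

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>masternode:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>masternode:8030</value>
  </property>
</configuration>
```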
02-05-2020
08:21 AM
Some other logs that stood out:
02-05-2020 15:29:07.564 DEBUG ERP.hadoop-cluster - RestStorageService - Response xml: true
02-05-2020 15:29:07.564 DEBUG ERP.hadoop-cluster - RestStorageService - Response entity: null
02-05-2020 15:29:07.564 DEBUG ERP.hadoop-cluster - RestStorageService - Response entity length: ??
02-05-2020 15:29:07.564 DEBUG ERP.hadoop-cluster - RestStorageService - Releasing error response without XML content
02-05-2020 15:29:07.565 DEBUG ERP.hadoop-cluster - RestStorageService - Rethrowing as a ServiceException error in performRequest: org.jets3t.service.ServiceException: Request Error., with cause: org.jets3t.service.impl.rest.HttpException
02-05-2020 15:29:07.565 DEBUG ERP.hadoop-cluster - RestStorageService - Releasing HttpClient connection after error: Request Error.
02-05-2020 15:29:07.565 DEBUG ERP.hadoop-cluster - Jets3tProperties - s3service.disable-dns-buckets=false
02-05-2020 15:29:07.565 DEBUG ERP.hadoop-cluster - Jets3tProperties - s3service.s3-endpoint=s3.amazonaws.com
02-05-2020
08:09 AM
Looking through, I'm trying to find exactly which permissions you're referring to. I don't see any communication over ports that aren't allowed through the firewall. Splunk can search EMR and return data, just not run MapReduce queries. So if you have a good spot to check specifically for MapReduce, let me know. Thanks.
02-05-2020
07:52 AM
The provider is using
Hadoop 2.x (yarn)
with EMR-5.28.0
with Hadoop 2.8.5
Hive 2.3.6
Pig 0.17.0
Hug 4.4.0
My search head has:
- Hadoop CLI 2.8.5
- OpenJDK 1.7.0
Some additional logs:
02-05-2020 15:29:10.388 INFO ERP.hadoop-cluster - Job - The url to track the job: http://localhost:8080/
02-05-2020 15:29:10.388 INFO ERP.hadoop-cluster - AsyncMRJob - Done submitting job.name=SPLK_ec2.server.com_1580916537.266_0, url=http://localhost:8080/
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster - SplunkMR - jobClient
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster - java.lang.NoSuchFieldException: jobClient
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster - at java.lang.Class.getDeclaredField(Class.java:1961)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster - at com.splunk.mr.SplunkMR.getJobClient(SplunkMR.java:592)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster - at com.splunk.mr.AsyncMRJob.run(AsyncMRJob.java:136)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster - at java.lang.Thread.run(Thread.java:748)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster - AsyncMRJob -
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster - java.lang.NullPointerException
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster - at com.splunk.mr.AsyncMRJob.run(AsyncMRJob.java:137)
02-05-2020 15:29:10.389 ERROR ERP.hadoop-cluster - at java.lang.Thread.run(Thread.java:748)
02-05-2020 15:29:10.389 DEBUG ERP.hadoop-cluster - OutputProcessor - received: null
02-05-2020 15:29:10.394 INFO ERP.hadoop-cluster - AsyncMRJob - start killing MR job id=job_local413061526_0001, job.name=SPLK_ec2.server.com_1580916537.266_0, _state=FAILED
02-05-2020 15:29:10.396 INFO ERP.hadoop-cluster - LocalJobRunner$Job - OutputCommitter set in config null
02-05-2020 15:29:10.401 INFO ERP.hadoop-cluster - FileOutputCommitter - File Output Committer Algorithm version is 1
02-05-2020 15:29:10.401 INFO ERP.hadoop-cluster - FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
02-05-2020 15:29:10.401 INFO ERP.hadoop-cluster - LocalJobRunner$Job - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
02-05-2020 15:29:10.411 DEBUG ERP.hadoop-cluster - ProtobufRpcEngine$Invoker - Call: mkdirs took 6ms
02-05-2020 15:29:10.444 DEBUG ERP.hadoop-cluster - LocalJobRunner$Job - Starting mapper thread pool executor.
02-05-2020 15:29:10.444 DEBUG ERP.hadoop-cluster - LocalJobRunner$Job - Max local threads: 1
02-05-2020 15:29:10.444 DEBUG ERP.hadoop-cluster - LocalJobRunner$Job - Map tasks to process: 2
02-05-2020 15:29:10.445 INFO ERP.hadoop-cluster - LocalJobRunner$Job - Waiting for map tasks
02-05-2020 15:29:10.445 INFO ERP.hadoop-cluster - LocalJobRunner$Job$MapTaskRunnable - Starting task: attempt_local413061526_0001_m_000000_0
02-05-2020 15:29:10.454 DEBUG ERP.hadoop-cluster - SortedRanges$SkipRangeIterator - currentIndex 0 0:0
02-05-2020 15:29:10.469 DEBUG ERP.hadoop-cluster - LocalJobRunner - mapreduce.cluster.local.dir for child : /tmp/hadoop-splunk/mapred/local/localRunner//splunk/jobcache/job_local413061526_0001/attempt_local413061526_0001_m_000000_0
02-05-2020 15:29:10.471 DEBUG ERP.hadoop-cluster - Task - using new api for output committer
02-05-2020 15:29:10.475 INFO ERP.hadoop-cluster - FileOutputCommitter - File Output Committer Algorithm version is 1
02-04-2020
01:36 PM
I'm trying to do a simple | stats count over a virtual index and receiving errors. Thoughts on where to look for this one?
Splunk 7.3.3 / Splunk 8.x to an EMR cluster with a master and two slave nodes. It still produces a count, but I assume it's much slower than if it were running MapReduce.
Exception - com.splunk.mr.JobStartException: Failed to start MapReduce job. Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_searchhead1.abc.corp.com_1580844860.138_0 ] and [ null ]
Edit - other testing performed:
- Upgraded the JDK from 1.7 to 1.8. No change to what works/doesn't work.
- After adding vix.mapreduce.framework.name=yarn to indexes.conf and mapreduce.framework.name=yarn to yarn-site.xml, I get Exception - failed 2 times due to AM Container for appattempt_...
- I've tested outside of Splunk and still receive the AM Container error: yarn jar hadoop-streaming.jar streamjob -files wordSplitter.py -mapper wordSplitter.py -input input.txt -output wordCountOut -reducer aggregate
01-22-2020
01:22 PM
Probably unsupported, but you can take an 8.0 install and copy /opt/splunk/bin/jars/SplunkMR-hy2.jar from it into your 7.3.3 install to fix this issue.
01-22-2020
01:05 PM
Never mind - it seems 8.0 does in fact resolve the issue. I just tested.
01-22-2020
09:43 AM
Thanks Jo. Just to confirm, this is currently unresolved even in the latest release of Splunk? If so, is there any fix planned for the 7.3.x chain in, say, 7.3.5? Thanks again.
Or did you mean it was patched, but it never made it into the release notes?
01-21-2020
09:46 AM
Hey, thanks for the response. Any chance you can post which release notes item directly corrects this? I need to read up on what's causing it. Thanks!
01-21-2020
07:45 AM
Without a virtual index enabled, running | metadata type=sourcetypes index=* returns correctly.
After adding a virtual index that uses a Hadoop provider, the command fails because it can't find sourcetype details for the virtual index. Searching the virtual index directly, however, returns correct sourcetype details.
What is necessary for the metadata command to return successfully? Is there a file I need next to the data to dictate the sourcetype info? Can I remove this index from the metadata results without having to manually specify every index I want in the command?
Error:
01-15-2020 20:57:40.884 ERROR metadata - No 'sourcetype' key found in results. Cannot merge metadata.
01-15-2020 20:57:40.884 INFO PreviewExecutor - Finished preview generation in 0.002741056 seconds.
01-15-2020 20:57:40.901 INFO ReducePhaseExecutor - Ending phase_1
01-15-2020 20:57:40.901 INFO UserManager - Unwound user context: x@y.com -> NULL
01-15-2020 20:57:40.901 ERROR SearchOrchestrator - Phase_1 failed due to : Error in 'metadata': No 'sourcetype' key found in results. Cannot merge metadata.
01-15-2020 20:57:40.901 INFO ReducePhaseExecutor - ReducePhaseExecutor=1 action=CANCEL
01-15-2020 20:57:40.901 INFO DispatchExecutor - User applied action=CANCEL while status=0
01-15-2020 20:57:40.901 ERROR SearchStatusEnforcer - sid:md_1579121855.178190 Error in 'metadata': No 'sourcetype' key found in results. Cannot merge metadata.
Version info:
Splunk 7.3.3
Hadoop cli 2.8.4
AWS EMR emr-5.28.0
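A workaround sketch I've been considering (not a fix): approximate the metadata output with tstats, which only scans native indexes and so shouldn't trip over the virtual index. It isn't a drop-in replacement for metadata, but it returns the same fields:

```
| tstats count as totalCount min(_time) as firstTime max(_time) as lastTime
    where index=* by sourcetype
```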
<setup>
<block title="Title of page">
<text>All fields are required.</text>
</block>
<block title="Add new credentials" endpoint="storage/passwords" entity="_new">
<input field="name">
<label>Account ID</label>
<type>text</type>
</input>
<input field="password">
<label>API Key</label>
<type>password</type>
</input>
<text>
<![CDATA[ <script type="text/javascript">
$(function() {
$('label[for*="password_id_confirm"]').html("Confirm API Key")
});
</script> ]]>
</text>
</block>
</setup>
04-29-2019
09:20 AM
All I can find in the docs is:
https://docs.splunk.com/Documentation/Splunk/latest/Data/Extractfieldsfromfileswithstructureddata
No support for mid-file renaming of header fields
Some software, such as Internet Information Server, supports the renaming of header fields in the middle of the file. Splunk software does not recognize changes such as this. If you attempt to index a file that has header fields renamed within the file, the renamed header field is not indexed.
04-29-2019
09:14 AM
We have a single Splunk instance with a custom scripted input that pulls down JSON and uses indexed extractions.
New fields were added to the JSON that aren't getting extracted. We want to remove the headers Splunk already knows about (the fields it extracts) so that it can start over and pick up the newly added fields. Is there any method of doing this?
Or are our only options to 1) change the sourcetype or 2) use search-time extractions?
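For option 2, the search-time route would look roughly like this in props.conf (a sketch; my_json_sourcetype is a placeholder, and it assumes the events are well-formed JSON):

```
[my_json_sourcetype]
# No INDEXED_EXTRACTIONS here: parse each event as JSON at search
# time instead, so newly added fields are picked up automatically
# without re-indexing.
KV_MODE = json
```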
08-02-2018
07:37 PM
I would recommend using the KV store to maintain state if you want to know the current state of all your machines. Every x minutes, check for any new events and overwrite the existing value for each host with its color/status: inputlookup the KV store, find all new statuses, dedup to get the latest values, then outputlookup to save any changes. Then, when you want to view status, you only have to inputlookup the KV store for your display.
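A sketch of that scheduled search, assuming a lookup definition named host_status backed by a KV store collection with host and status fields (all of these names, and the machine_events index, are placeholders):

```
index=machine_events earliest=-15m
| stats latest(status) as status, latest(_time) as _time by host
| inputlookup append=t host_status
| dedup host
| outputlookup host_status
```

New events come first, so dedup host keeps the fresh status per host and falls back to the stored one; the dashboard then just runs | inputlookup host_status.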