EDIT: New details as of 12/11. Scroll down!
Answering this one myself to get the info out there:
This has been registered as a bug with Splunk support. (SPL-109514) Details of the issue, how to detect it, and how to work around it follow:
Description
There appears to be a problem with quota calculation in the search scheduler that is specific to a clustered deployment. Splunk will not dispatch a saved search if the user has reached their concurrent search quota (as defined in authorize.conf). However, it appears the current usage is not calculated correctly, causing the user to show as over-quota. We see this 'usage' value slowly grow over time.
Splunk emits WARNs when this happens:
11-18-2015 11:10:34.638 -0600 WARN SHPMaster - Search not executed: Your maximum number of concurrent searches has been reached. usage=41 quota=30 user=someuser. for search: nobody;search;A Saved Search
While some of these may be legitimate, users affected by this bug generate a far higher volume of them (we see WARNs every 7 seconds for each scheduled search owned by the affected user).
A confusing side effect of this is that, because the searches are never dispatched, they don't report as skipped. So if you're looking at scheduler metrics in the DMC, it looks like everything is running successfully.
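One way to see this directly (a sketch; "A Saved Search" is just the name from the WARN example above, substitute one of your own affected scheduled searches) is to check scheduler.log for that search. When the bug is active, the search simply stops producing execution records, rather than showing up with status=skipped:
index=_internal sourcetype=scheduler savedsearch_name="A Saved Search"
| timechart span=15m count by status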
Detection
Because of the sheer volume of WARNs generated, you can use them to detect the issue. We run the following search over a 5-minute window as an alert:
index=_internal sourcetype=splunkd component=SHPMaster "Search not executed: Your maximum number of concurrent searches has been reached"
| rex "user\=(?<user>.+)\.\s+for search:\s(?<search_user>[^;]+);(?<search_context>[^;]+);(?<search_name>.+)"
| fields _time usage quota user search_*
| stats count by user search_name
| where count>40
| stats values(search_name) as affected_searches by user
Alert if any records are returned.
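As a supplementary check (a sketch that relies on the usage and quota key=value pairs auto-extracting from the WARN messages, as the search above does; someuser is the placeholder from the WARN example), you can chart the reported usage against the quota and watch the usage value drift upward between restarts:
index=_internal sourcetype=splunkd component=SHPMaster "maximum number of concurrent searches has been reached"
| rex "user\=(?<user>.+)\.\s+for search:"
| search user=someuser
| timechart span=15m max(usage) as reported_usage, max(quota) as quota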
Impact
This can prevent alert searches from running. Depending on the importance of those alerts, the impact can be severe.
The frequency of this issue can vary, and appears to be related to overall scheduler activity. Our production cluster saw it happen every day or so, while in a lower-volume testing environment it could take over a week to surface.
Remediation
If you are affected by this issue, a rolling restart of the search head cluster will get things moving again. However, the issue will recur, so this becomes an ongoing maintenance task.
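For reference, the rolling restart is issued from the cluster captain (the path shown assumes a default $SPLUNK_HOME):
# Run on the search head cluster captain; restarts members one at a time
$SPLUNK_HOME/bin/splunk rolling-restart shcluster-members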
NEW 12/11 - Workaround
This issue is related to new functionality in Splunk 6.3. Pre-6.3, Splunk calculated quotas independently on each search head. In 6.3, this changed to calculating cluster-wide quotas. The new behavior makes sense, but doesn't seem to work correctly in practice. You can restore Splunk to its pre-6.3 behavior by adding the following in limits.conf:
[scheduler]
shc_role_quota_enforcement = false
shc_local_quota_check = true
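The setting needs to reach every search head cluster member. One way to get it there (a sketch; the app name and placement are just an example, not a requirement) is to push it from the deployer and apply the bundle:
# On the deployer, e.g. $SPLUNK_HOME/etc/shcluster/apps/quota_workaround/local/limits.conf
# then push to the members:
$SPLUNK_HOME/bin/splunk apply shcluster-bundle -target https://<any_member>:8089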
Alternative Workaround
A way to prevent the bug from occurring is to remove all role-based concurrent search quotas. Note that this leaves your users free to run concurrent searches up to the server-based restrictions in limits.conf.
Since we weren't certain how imported roles would interact here, we explicitly set the quotas to zero for all roles, including the built-in roles ('default', 'user', 'power', and 'admin').
Example authorize.conf stanza:
[role_admin]
srchJobsQuota = 0
rtSrchJobsQuota = 0
cumulativeSrchJobsQuota = 0
cumulativeRTSrchJobsQuota = 0
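To verify the effective values after pushing the change, a quick check (a sketch using the roles REST endpoint; the column names mirror the authorize.conf settings) is:
| rest /services/authorization/roles
| table title srchJobsQuota rtSrchJobsQuota cumulativeSrchJobsQuota cumulativeRTSrchJobsQuota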