About FeatureCreeep

FeatureCreeep · ‎08-13-2024

Exactly what I needed! Thanks!

FeatureCreeep · ‎08-12-2024

I have an alert that can clear in the same minute that it originally fired. When the correlation search runs, it picks up both events: the alert and the clearing alert. The correlation search creates a notable event for each, but it uses the current time as the _time of the notable events rather than the _time from the original alerts. Since both alerts are converted into notable events during the same correlation search run, they get the exact same timestamp. As a result, ITSI cannot reliably determine the correct order of the events, and it sometimes thinks the Normal/Clear event came BEFORE the original alert. This seems odd to me. I would have imagined that ITSI would use the original event time as the _time for the notable event, but it doesn't. Any ideas on how to address this?

FeatureCreeep · ‎06-24-2024

I believe this was a misunderstanding on my part on how the episode views work. The "Events Timeline" screen looks like I would expect, with one alert and the timeline shows it was red, then moved to green. The "All Events" view appears to be a running list of all events that drive state changes.

FeatureCreeep · ‎06-21-2024

I'm trying to understand how to update the severity of a notable event when a new event arrives with a normal severity. I'm feeding external alerts into ITSI, and a correlation search turns them into notable events. I'm using a specific ID for the "Notable Event Identifier Fields". These alerts correctly turn into notable events and are placed into an episode. When the same alert comes into ITSI, but with a "Normal" severity, I expect it to change the severity of the prior notable event in the episode. Instead, ITSI treats it like a new notable event and puts it into the same episode. I thought ITSI uses the Notable Event Identifier Fields to determine whether two events are the same. I checked that both the original event and the "clearing" event have the exact same event_identifier_hash, so why does ITSI treat it like an additional alert/event in the episode? Instead of having one normal/clear event in the episode, I now have one critical and one normal. How are you supposed to update the status of an alert/notable event in an episode when a clearing event is received?

FeatureCreeep · ‎06-18-2024

I have a scheduled search/alert. It validates that for every Splunk event of type A, there is a type B. If it doesn't see a corresponding B, it will alert. Occasionally I am getting false alerts because Splunk is not able to reach one or more indexers. I'll see the message "The following error(s) occurred while the search ran. Therefore, search results might be incomplete." along with additional details. That means the search doesn't get back all the events, and the missing events can include a type B event, which causes a false alert to fire. Since Splunk knows it wasn't able to communicate with all the indexers, I'd like to abort the search. Is there anything like the "addinfo" command where I can add information about whether getting all the data was successful, so that I can do a where clause on it and remove all my rows if there were errors? How can I prevent an alert from firing if I didn't get all the results back from the indexers?

FeatureCreeep · ‎07-19-2018

We spoke with AWS support reps and they confirmed that they were aware of this issue and that they plan to deploy a fix the week of the 23rd.

FeatureCreeep · ‎07-13-2018

@tophercullen Good explanation. I've found the exact same behavior. It seems like the add-on is doing the right thing by only looking at logs if CloudWatch says there is new data in them. This keeps API calls to CloudWatch to a minimum, which is important since CloudWatch will throttle calls if you make too many and the limit is easy to hit. It seems to me that the problem is that CloudWatch is incorrectly reporting the last event time. The add-on just believes what CloudWatch tells it and thinks there are no new logs, so it sees no reason to ask for them. Seems like a bug report needs to be made for CloudWatch due to its incorrectly reported "Last Event Time".

FeatureCreeep · ‎07-09-2018

I'm very new to AWS and am setting this up for the first time. I have a "CloudWatch" input for my metrics and a "CloudWatch Logs" input for my logs. The metrics feed works fine. The only problem is that the data stored in Splunk from the CloudWatch Logs feed is only the last message in each log stream. Since I'm getting data, I know most of my settings are correct, but something isn't right. For reference, I'm using a 60-second interval. Ideas?

FeatureCreeep · ‎03-19-2018

There are different options based on what your data looks like. If you really only have 2 hosts then you can do something simple like this. You can create a new field called silo and then set it to the correct value based on which host the event is from.

    host=Host1 OR host=Host2 | eval silo=case(host="Silo1Critera", "Silo1", host="Silo2Critera", "Silo2") | stats count, avg(time_taken) by cs_uri_stem, silo

FeatureCreeep · ‎03-19-2018

I'm not having an issue with data gaps that makecontinuous can address, but rather an issue with bin snapping bins to even 5 minute increments. If I run with a 5 minute span against the last 5 minutes, I want 1 bucket with all of my data in it. Instead, if I run it at 5:07:30, I'll get a 5:00-5:05 bin with half my data and a 5:05-5:10 bin with the other half of my data. The same thing happens if you have 5 minute spans across 60 minutes of data: the first and last bins will have incomplete sample sizes.

Separately, I did try makecontinuous based on your suggestion but I couldn't make it work. My understanding is that it is a replacement for bin. However, when I tried creating 5 minute spans, it created 1 second spans. I tried running against 5 and 60 minutes but it created 1 second spans both times. Am I doing something wrong here?

    host=myWebServers* index=iis | makecontinuous _time span=5m | stats count BY _time, host

FeatureCreeep · ‎03-18-2018

This should get you what you want:

    | rex "\"GET (?P<url>\/.*?[\/ ])" | eval url=trim(url)

This will match both when there is an additional / and when there isn't a second /. If there is no second /, there will be a trailing space in the url, so I added a trim to remove it. A fancier regex could probably remove the need for the trim, but this works.

I'm a little confused about what you want to do with POSTs. In your example above, you still parsed POSTs, but maybe that was just an oversight. I would suggest filtering them out so you are only processing events with "GET " in the event. If you don't filter them out, the "url" field will be NULL for those events since the regex will not match.
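To make the two matching cases concrete, here is a minimal Python sketch of the same pattern. It is not part of the original answer: the `extract_url` helper and the sample log fragments are made up for illustration, and Python's `re` stands in for Splunk's rex.

```python
import re

# Same pattern as the rex in the post, written as a Python regex.
GET_URL = re.compile(r'"GET (?P<url>/.*?[/ ])')


def extract_url(event):
    """Return the first path segment of a GET request, or None if no match."""
    match = GET_URL.search(event)
    if match is None:
        return None  # e.g. POST events: the field stays empty
    return match.group("url").strip()  # strip() plays the role of trim()


# Hypothetical access-log fragments, invented for this example.
print(extract_url('"GET /app/page.html HTTP/1.1" 200'))  # a second slash is present
print(extract_url('"GET /index HTTP/1.1" 200'))          # no second slash
print(extract_url('"POST /app/submit HTTP/1.1" 200'))    # not a GET
```

When a second slash is present, the captured value keeps its trailing slash. Only the trailing space from the other case is trimmed.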

FeatureCreeep · ‎03-18-2018

Have you checked your job in the Activities view? That will tell you what time the job executed and how long it took to complete. One possibility is that you selected one of the alert options that allow your query to be delayed when the Splunk cluster is busy. Another possibility is that the query just took that long to complete. I'm guessing that isn't the case, but looking at the job from the Activities view should help you narrow down the possible causes.

FeatureCreeep · ‎03-18-2018

You can easily enhance the above answer to make it an alert. I'd probably just add a where clause and some threshold. In the alert configuration, just alert if the query returns more than 0 rows, which I think is the default.

    index=_internal sourcetype=splunkd component=LicenseUsage type=Usage | bucket span=1h _time | stats sum(b) as usage by _time h | rename h as host | eval usageGB=round(usage/1024/1024/1024,3) | table _time host usageGB | where usageGB >= 20

You can set your alert to run hourly and check the last hour. This will only return a row if a host is using 20 or more GB an hour. If this query does return a row, the alert will fire. If you really do need the alert to only detect spikes and not a fixed threshold, that is very doable but you probably want to create a separate question for that. I think your original question was answered.

FeatureCreeep · ‎03-18-2018

Make sure when you add a new csv, you actually add ".csv" to the end of the file name. It won't work right if you don't.

FeatureCreeep · ‎03-18-2018

This should be a straightforward regular expression extract. Some of the formatting of your JSON looks strange though, with odd characters in it such as the "∂" where I think you expected "partnerEBID" to be. I also don't see a branchCode but I do see a brandCode. I'm not a JSON expert, but that value for the brandCode doesn't look like valid JSON to me. If the value is supposed to be an empty string, it should be 2 double quotes, not just 1. Anyway, with the data provided, I created a very simple regular expression. I'm sure there are better ways to write it, but this is the easiest thing that works. I think you can extrapolate from here to extract more fields with this expression.

    | rex field=_raw "businessId=(?P<businessId>\d+).*EBID=(?P<partnerEBID>\d+)"

FeatureCreeep · ‎03-06-2018

I found a fairly straightforward solution. Since I want time buckets that don't snap to fixed time frames, I implemented my own binning logic. It's pretty compact and you can just use it to replace a bin command. The new logic makes bins of whatever size you want but works backwards from the upper bound of your search time range. If you run it with 10 minute spans at 5:25:35.112, then your last bin will be all the events from 5:15:35.112 to 5:25:35.112. With Splunk's default bin logic, your last bin would be from 5:20 to 5:25:35.112, which isn't the 10 minute span you asked for. You just have to make sure the time range you selected is evenly divisible by your time span. I haven't tested the effect of having a 30 min span on data that only has a 45 minute time range.

Solution: Replace your bin command that might look like this:

    bin span=5m _time

With this:

    addinfo | eval _time=info_max_time-(ceil((info_max_time-_time)/300))*300

Replace the 300s with your desired span in seconds. In this case, 300 seconds is the same as the original 5 min span.
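The arithmetic in that eval can be checked outside Splunk. The following is a small Python sketch of the same formula, not a Splunk feature: `backward_bin` is a hypothetical helper name, and times are given as seconds since midnight purely to keep the numbers readable.

```python
import math


def backward_bin(event_time, info_max_time, span=300):
    """Start of the span-second bin holding event_time, counting back from info_max_time."""
    bins_back = math.ceil((info_max_time - event_time) / span)
    return info_max_time - bins_back * span


def hms(hours, minutes, seconds):
    """Seconds since midnight, used here instead of full epoch values."""
    return hours * 3600 + minutes * 60 + seconds


search_end = hms(5, 25, 35)  # upper bound of the search window
span = 600                   # 10-minute bins

# An event at 5:20:00 lands in the bin that starts at 5:15:35.
print(backward_bin(hms(5, 20, 0), search_end, span) == hms(5, 15, 35))
# An event at 5:10:00 lands in the previous bin, which starts at 5:05:35.
print(backward_bin(hms(5, 10, 0), search_end, span) == hms(5, 5, 35))
```

One edge case follows from the formula: an event whose time equals the upper bound exactly gets a bin of its own, because ceil(0) is 0.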

FeatureCreeep · ‎02-28-2018

This sounds more complicated than it really is. At the root, I want to know if web traffic is balanced across my web farm. This query accomplishes that by finding the average traffic per server in the farm (my subsearch) and then calculating how much each server varies from that average. It works pretty well.

    host=myservers index=iis | stats count AS server_value BY host | join [search host=myservers index=iis | stats count as LatestValue by host | stats mean(LatestValue) as client_mean] | eval percent_variance=round(((abs(server_value-client_mean)/client_mean)*100), 0)

My challenge is that I want to use this metric in Splunk ITSI. In ITSI your metrics have to have a _time associated with your events. This is easy enough to do by creating bins, doing stats by time, and joining on the _time.

    host=myservers index=iis | eval host=upper(host) | bin span=5m _time | stats count AS server_value BY _time, host | join _time [search host=myservers index=iis | eval host=upper(host) | bin span=5m _time | stats count as LatestValue by _time, host | stats mean(LatestValue) as client_mean by _time] | eval percent_variance=round(((abs(server_value-client_mean)/client_mean)*100), 0)

This technically works, but I'm running up against a classic "bin" issue: bin always snaps the bins to whole time segments. ITSI runs metrics at 5-minute intervals by default, and that makes sense. If I don't put a span, it will default to 5-second spans. That is too small a sample time and the variances differ wildly. The 5-minute span above works well if you have a perfect 5-minute interval, but you typically get some partial 5-minute block. Whatever time span I use, I still typically have a partial bin that messes up the accuracy of my metric. Any ideas for solutions or alternative approaches?
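For readers who want to check the variance formula itself, here is a minimal Python sketch of the same calculation. The `percent_variance` helper, the host names, and the request counts are all hypothetical and only illustrate the arithmetic in the eval.

```python
def percent_variance(counts):
    """Percent deviation of each server's count from the farm-wide mean."""
    mean = sum(counts.values()) / len(counts)
    return {
        host: round(abs(value - mean) / mean * 100)
        for host, value in counts.items()
    }


# Hypothetical request counts per server for one time window.
counts = {"web01": 900, "web02": 1000, "web03": 1100}
print(percent_variance(counts))
```

A perfectly balanced farm gives 0 for every host. A partial bin skews the result because it shrinks the counts in that bin without reflecting a real change in traffic.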

FeatureCreeep · ‎11-03-2017

Perfect! This information should be in the "bin" documentation. Your workaround strategy worked great, though I corrected a few mins and maxes that were backwards. Thanks! Here is the updated workaround:

    host=server* index=iis | addinfo | eval _time=if(_time<relative_time(info_max_time,"-5m@m"),relative_time(info_min_time,"@m"),relative_time(info_max_time,"-5m@m")) | stats count by _time, host

FeatureCreeep · ‎11-03-2017

It won't create just 2 bins. Notice that the query goes back 10 minutes and the span is 5 minutes, but I still get 3 buckets. It's the same if I use "bins=2".

This query:

    host=server* index=iis | bin span=5min _time | stats count by _time, host

Run for this time range: 11/3/17 1:37:17.000 PM to 11/3/17 1:47:17.000 PM

Returns these results:

    _time                host      count
    2017-11-03 13:35:00  server01  7339
    2017-11-03 13:40:00  server01  12910
    2017-11-03 13:45:00  server01  6432
    2017-11-03 13:35:00  server02  7402
    2017-11-03 13:40:00  server02  14509
    2017-11-03 13:45:00  server02  6167
    2017-11-03 13:35:00  server03  7034
    2017-11-03 13:40:00  server03  13665
    2017-11-03 13:45:00  server03  6273
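The three buckets are what you would expect if each timestamp is floored to a clock-aligned 5-minute boundary, which is what the results suggest. The Python sketch below reproduces that for the same time range. The `snapped_bin` helper is hypothetical and only models the flooring, not Splunk's actual implementation.

```python
from datetime import datetime, timedelta


def snapped_bin(ts, span_seconds=300):
    """Floor a timestamp to the start of its clock-aligned bin."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds())
    return midnight + timedelta(seconds=elapsed - elapsed % span_seconds)


start = datetime(2017, 11, 3, 13, 37, 17)

# One sample per second across the 10-minute search window.
bins = sorted({snapped_bin(start + timedelta(seconds=s)) for s in range(600)})
for b in bins:
    print(b)
```

The first and last bins are partial: the window starts at 13:37:17, partway through the 13:35 bin, and ends at 13:47:17, partway through the 13:45 bin. That matches the lower counts in the 13:35 and 13:45 rows above.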

FeatureCreeep · ‎11-03-2017

I have the same problem as the one that is unanswered here. I'm trying to do stats on the last 10 minutes of data in two separate 5 minute buckets. My command looks like this:

    host=servers* index=iis | bin _time bins=2 | stats count as Request by _time, host

When I run this for the last 10 minutes I expect 2 separate 5 minute buckets. If my query starts on an even 5 minute interval like 1:05-1:15, it works fine. But the buckets expect even 5 minute increments, so if I run 1:02-1:12, it will create a "1:00" bucket for the requests between 1:00 and 1:05, a "1:05" bucket for the 1:05-1:10 requests, and a "1:10" bucket for the 1:10-1:15 requests. That is 3 buckets of different sizes. Using span=5m doesn't help either. I tried just using the epoch number like:

    eval Time=_time | bin Time bins=2

I was trying to stop Splunk from treating the _time field specially, but it put everything into a single bin. How can I get Splunk to just create 2 evenly sized/spanned buckets?

FeatureCreeep · ‎08-07-2017

I'm marking this answer correct because it does explain why my "simple" example of my problem wasn't working. It turns out that my original problem was due to hidden special characters in the data string that I couldn't see unless I copied and pasted the string into an app that would display those characters. I had to create a regex in sed mode to strip out all the special characters, and then my original format string worked. Thanks
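The same failure mode is easy to reproduce outside Splunk. The post does not say which hidden characters were involved, so this Python sketch assumes, purely for illustration, that they were Unicode format marks such as U+200E. `strip_hidden` is a hypothetical helper, and the fractional seconds are left out for brevity.

```python
import unicodedata
from datetime import datetime


def strip_hidden(text):
    """Drop Unicode format characters (category Cf), such as U+200E."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# A date string with invisible marks between the fields (assumed example).
raw = "2017\u200e-\u200e08\u200e-\u200e03T07:43:17"
fmt = "%Y-%m-%dT%H:%M:%S"

try:
    datetime.strptime(raw, fmt)
    parsed_dirty = True
except ValueError:
    parsed_dirty = False  # the invisible marks break the format match

parsed = datetime.strptime(strip_hidden(raw), fmt)
print(parsed_dirty, parsed)
```

On screen the raw string looks identical to a clean one, which is why the problem only showed up after pasting it into a tool that displays such characters.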

FeatureCreeep · ‎08-03-2017

Ya, @somesoni2 pointed out that my "simple" example is too simple. My original issue was with a datetime like "2017‎-‎08‎-‎03T07:43:17.125751900". I was using "%Y-%m-%dT%H:%M:%S.%9Q" as the format string.

FeatureCreeep · ‎08-03-2017

This is driving me nuts because I use strptime all the time and have many of my own working examples to reference. I was having a problem doing strptime with a more complex date that wasn't working, so I kept making it simpler until even this isn't working.

    ... | eval TestYear="2017" | eval TestResult=strptime(TestYear,"%Y") | table TestYear, TestResult

Why isn't TestResult getting the epoch time for the year? The field is not being created. This is so simple that I'm clearly doing something dumb that I'm just too close to see. Thanks guys

FeatureCreeep · ‎03-19-2017

Thanks. Worked like a charm. I guess I need to spend more time reading up on stats vs chart.

FeatureCreeep · ‎03-19-2017

Posts	27
Solutions	3
Karma Given	4
Karma Received	2
Member Since	‎03-10-2017


How to have a panel use an offset from a time pick...

Re: Normal/Clearing event getting same timestamp a...

Normal/Clearing event getting same timestamp as or...

Re: How to change severity of an ITSI notable even...

How to change severity of an ITSI notable event wh...

How to remove results if indexer errors occurred

Re: Lambda Cloudwatch logs often missing due to ed...

Re: Lambda Cloudwatch logs often missing due to ed...

Splunk addon for AWS: Only last message in log str...

Re: alert based on different keywords/sources but ...

Re: How to measure web traffic variance between se...

Re: How to extract a field from a GET request?

Re: Splunk Alert Triggers after 15 minutes of its...

Re: I need a query that shows how much data is bei...

Re: Lookup Editor

Re: How to extract sub-element data from a JSON me...

Re: How to measure web traffic variance between se...

How to measure web traffic variance between server...

Re: Why is bin command creating too many bins? Iss...

Re: Why is bin command creating too many bins? Iss...

Why is bin command creating too many bins? Issue w...

Re: Issue with epoch time when using strptime() fu...

Re: Issue with epoch time when using strptime() fu...

Issue with epoch time when using strptime() functi...

Re: How to create dynamic columns based on calcula...

How to create dynamic columns based on calculated ...
