Hi guys, I have a problem with the results from my triggered alerts and I really need your help!
I have some alerts in Splunk and I want to know which alert was triggered and why. So what I have done is find all triggered alerts and get their trigger time with:
index=_audit action="alert_fired" | table _time ss_name trigger_time
For example, alertA is triggered when the number of Error events in indexB is over 0, and it runs once an hour. What I'm doing is taking alertA's trigger_time, doing eval start_time=trigger_time-60min (which is the time when alertA started counting Error events), and then counting how many Error events are in indexB with trigger_time as latest and start_time as earliest. So if I see that alertA was triggered, that search should return a count over 0.
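Roughly, the check I'm doing looks like this (alertA, indexB, and the Error keyword are as described above; I'm assuming trigger_time in the audit event is epoch seconds, and the earliest/latest values in the second search are placeholders I fill in from the first one):

index=_audit action="alert_fired" ss_name="alertA" | eval start_time=trigger_time-3600 | table _time ss_name trigger_time start_time

index=indexB "Error" earliest=<start_time> latest=<trigger_time> | stats count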
But it's not going very well. When I see a record of alertA and check its details on the Triggered Alerts page by clicking "View results", it does show a result over 0. However, if I put alertA's search line into the search bar and set the same time period as alertA, sometimes it shows no results, which would mean alertA should not have been triggered!
The first image is what I get by clicking "View results" on the Triggered Alerts page and the second image is what I get manually. As you can see, I just copied the search line and the search time into the search bar and got totally different results! By the way, in the first image the "|" at the end of the search line is the cursor, not a real "|". The same goes for one of the "|" characters before "where" in the search line in the second image.
So does anyone know what's wrong with it? Any suggestions will be appreciated!
My previous post didn't completely resolve my issue, so I opened a ticket with Splunk support and got this resolved completely.
The problem was that the query in my Alert was "search index=myindex sourcetype=waf httpstatus=400".
As soon as I removed the keyword "search" from the beginning of this query in the alert, it produced results consistent with manually issuing the search (index=myindex sourcetype=waf httpstatus=400). The rationale behind this (if I understood the support engineer correctly) is that the alert passes the query to the CLI (i.e. /bin/splunk search), so the CLI interprets the "search" term in my query as a searchable word, not as a command.
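In other words, in savedsearches.conf terms the fix was just dropping the leading keyword (the stanza name here is made up):

[my waf alert]
# before: produced results inconsistent with the manual search
# search = search index=myindex sourcetype=waf httpstatus=400
# after:
search = index=myindex sourcetype=waf httpstatus=400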
I had a similar problem with alerting and figured out what my problem was. Specifically, I was reporting on Incapsula WAF logs, and if I ran the alert manually (clicking Run next to the alert under Settings > Searches, reports, and alerts), I would get a different value than when I immediately clicked the search icon (making no other changes).
The problem ended up being the application context the alert was set to run under. It was set up under the Search app. As soon as I moved the alert to the Imperva_waf context, it worked correctly. FYI, this is under Settings > Searches, reports, and alerts and appears to the right of your alert.
Even though I changed the permissions on the Imperva_waf app to be global, and my ID can run these searches regardless of which app I'm in, clearly the alert needs to be placed in the same app where the props and transforms actually reside.
OK, given that the search head, user, and app are all the same, the problem has to be either that the indexers were different between one search and the other (meaning that one search had access to less data because one or more of the indexers was inaccessible) or, if all the indexers were present for both searches, that the amount of data inside them was different between the two searches. Let us assume the latter: how could this be? It is actually very common if you have significant pipeline latency (or bad time configurations/interpretations) for your events.
Let's say you are running this search every 6 minutes over the last 6 minutes up until now. So your first search ran at 2:06 AM, finished shortly thereafter, and found 20 events. Because 20!=21, the alert fired. But there was an event still in the pipeline that hadn't made it all the way into Splunk: it was indexed at 2:08 AM but had happened at 2:05 AM (3 minutes of pipeline latency). Later, you double-checked the alert and ran the search manually, and now that the additional event is visible, 21==21, so you don't see any results. The way to verify this is to calculate the latency as _indextime - _time for each event. I am sure you will find that the last event in this search has a latency that puts it outside the visibility of your alert, because of the way you have constructed and scheduled it.
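For example, something along these lines over your alert's time range should show it (I'm borrowing indexB and the Error keyword from the question; adjust to your own base search):

index=indexB "Error" | eval latency=_indextime-_time | table _time _indextime latency | sort - latency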
Which user is assigned to run the alert? Maybe there is a difference in the roles/permissions of the alert user and your user.
You are either running as a different user (unlikely) or inside a different app context (probably). This can easily be determined by comparing the URLs. It is probable that you have a field extraction with "This app only" permissions that facilitates your search results, and the former search is running inside that app while the latter one is not. You can ensure you are in the proper app context, or you can expand the permission scope to "All apps" for the critical knowledge objects.
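If comparing URLs is awkward, a quick way to see which app and owner the saved search actually lives under (and how it is shared) is something like this, assuming the alert is named alertA:

| rest /servicesNS/-/-/saved/searches | search title="alertA" | table title eai:acl.owner eai:acl.app eai:acl.sharing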
Now that we know that you are using two different users, your problem is probably that each user has a different Time zone setting, which means that even though each search is over 9/17/2015 2:00:06.000 AM for 6 minutes, this is actually two different time ranges once normalized to each user's time zone. Check this under Your Name -> Edit account -> Time zone (and make note of each user's Roles while you are there). Another thing it could be is that these two users have different values for Restrict search terms, so check that too, under Settings -> Access controls -> Roles.
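If you'd rather check from the search bar than click through the UI, something like the following should list each user's time zone and roles, and each role's restricted search terms (assuming the tz and srchFilter fields are exposed this way on your version):

| rest /services/authentication/users | table title tz roles

| rest /services/authorization/roles | table title srchFilter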
Where can I check these things?
Answer updated.
Oops, sorry for my mistake. I have many alerts which were created by two users, and I was searching as one of those users. The user who created the alert in the image is the same user who did the search (which is admin).
I don't understand. Are you backtracking from your earlier comment that each search is from a different user and you are now saying that both searches are from the same user?
Well, I have many alerts which were created by userA and userB, and I did the search as userA. The alert in the images was created by userA; I mixed it up with the other alerts. Sorry about that.
I am totally confused. Do you even still have a problem?
Yes. As you can see in the images, the alert belongs to admin, and I ran the same search line with the same search time as admin and got different results. So now it seems it's not caused by a user issue. What else could possibly cause this problem?
Thanks for your reply! I'm using a different user from the user who created the alerts. But the permission of all the alerts is set to "display for app", and I'm sure I searched in the same app. Also, as you can see in the first image, it shows a count of 20. And if I run the search line from the second image without the "where" section, it shows a count of 21 (which means it just doesn't meet the alert's trigger condition, not that it returns nothing). And all the extracted fields' permissions are set to either global or app. So is there anything wrong with my settings?
As I said, show us the URLs and that will help.
Images updated. Sorry, I can't access Splunk right now, so I can't just copy the whole URLs.