I have a very strange issue that I've been trying to solve for the last week with no luck.
I have an alert based on the percentage error rate of my server.
The search is simple:
host=*-prod* source="/var/log/xxx" | stats count as All, count(eval(level="ERROR")) as ERROR | eval Alert=((ERROR/All)*100) | table Alert | where Alert >= 0.3
But the alert triggers false positives all the time.
What's strange is that when I click "View Results" in the email I see the false result (e.g. 0.4), but when I rerun the same search (just hitting Enter in the search field right afterwards) I get the correct result (e.g. 0.1).
These are my alert configurations:
Is this a bug? Am I doing something wrong?
Your alert looks at a particular time range. Try running the search manually for exactly the same time range with hardcoded dates. For example, if your alert ran at 4/19/2017 1:00 PM looking at the last 60 minutes, run the search for the fixed time range of 4/19/2017 12:00 PM to 4/19/2017 1:00 PM and compare the results.
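As a rough sketch of what that could look like, here is your search pinned to a fixed window using the earliest and latest time modifiers. The dates are only the illustrative ones from my example above, so substitute the actual trigger time and window of your alert. I have also left off the final where clause so that the computed value is shown even when it falls below the threshold:

```
host=*-prod* source="/var/log/xxx" earliest="04/19/2017:12:00:00" latest="04/19/2017:13:00:00"
| stats count as All, count(eval(level="ERROR")) as ERROR
| eval Alert=((ERROR/All)*100)
| table Alert All ERROR
```

The timestamps use the default %m/%d/%Y:%H:%M:%S format. If you run this twice and the numbers still differ, the time range is not the cause.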
The alert and the manual search both look at the last 2 hours.
As I said, I take the exact search that I get from the "View Results" and run it again. Same query, same time range. Different results.
There is a summary line just below the search text box in the format "nnnnn events (xxxxstartdatexxx to yyyyenddateyyyy)". Check whether the two are exactly the same. Since you're using a relative time range, it may change (the last 2 hours now will be different from the last 2 hours 5 minutes from now).
If they are the same, more events (success events) may have been ingested between the scheduled alert run and your manual run, causing the error percentage to go down. I would first eliminate a time range mismatch and then troubleshoot further.
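One way to compare the windows without relying on the summary line is to have the search report its own time bounds with addinfo. This is only a sketch: the window_start and window_end field names are my own, and I am assuming the search has a bounded time range (for an all-time or real-time search, info_max_time may not be a finite timestamp and the formatted value could come back empty):

```
host=*-prod* source="/var/log/xxx"
| stats count as All, count(eval(level="ERROR")) as ERROR
| addinfo
| eval Alert=((ERROR/All)*100), window_start=strftime(info_min_time, "%m/%d/%Y %H:%M:%S"), window_end=strftime(info_max_time, "%m/%d/%Y %H:%M:%S")
| table Alert All ERROR window_start window_end
```

If window_start and window_end match between the two runs but All is larger in the manual run, that points to late-arriving events rather than a time range mismatch.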
When I open "View Results" it shows only 1 result. When I run the same search again it gives me millions.
How can it be?
Clicking "View Results" loads the results from the dispatch directory, where it finds 1 event (the output of stats). Re-running the search scans all matching events from the base search, hence the higher count. Does the time range match when you click "View Results" versus when you run the search again?
Update: I found that the problem was that I used a real-time schedule. When I changed it to a cron schedule (every minute), everything started working fine.
@nmayafit - Glad to hear you found the solution to your question. Please don't forget to click "Accept" to resolve your post so that others can easily find it. Thanks!
I am facing the same issue with many alerts, but none of them are set to Real Time. All are set to cron only.