Alerting

Why is my alert triggering false positives?

nmayafit
Path Finder

Hi,

I have a very strange issue that I've been trying to solve for the last week, with no luck.
I have an alert based on the percentage error rate of my server.
The search is simple:

host=*-prod* source="/var/log/xxx"  | stats count as All, count(eval(level="ERROR")) as ERROR | eval Alert=((ERROR/All)*100) | table Alert | where Alert >= 0.3

But the alert triggers false positives all the time.
What's strange is that when I click "View Results" in the alert email I see the false result (e.g. 0.4), but when I re-run the exact same search (just hitting Enter in the search field right after) I get the correct result (e.g. 0.1).

These are my alert configurations:
[screenshot of the alert configuration settings]

Is this a bug? Am I doing something wrong?

1 Solution

nmayafit
Path Finder

Update: I found that the problem was that I was using a real-time schedule. When I changed it to a cron schedule (every minute), everything started working fine.
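In case it helps anyone, this is roughly what the cron-scheduled version looks like as a savedsearches.conf stanza (just a sketch; the stanza name, email address, and trigger settings below are placeholders, not my exact configuration):

[Prod error rate alert]
# Base search from the post above
search = host=*-prod* source="/var/log/xxx" | stats count as All, count(eval(level="ERROR")) as ERROR | eval Alert=((ERROR/All)*100) | table Alert | where Alert >= 0.3
# Scheduled (not real-time): run every minute over the last 2 hours
enableSched = 1
cron_schedule = */1 * * * *
dispatch.earliest_time = -2h
dispatch.latest_time = now
# Trigger whenever the search returns any rows
counttype = number of events
relation = greater than
quantity = 0
action.email = 1
action.email.to = alerts@example.com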

bishtk
Communicator

Hi Techies,

I am facing the same issue with many alerts, but I don't have any of them set to real-time. All are set to cron schedules only.

aaraneta_splunk
Splunk Employee

@nmayafit - Glad to hear you found the solution to your question. Please don't forget to click "Accept" to resolve your post so that others can easily find it. Thanks!

nmayafit
Path Finder

The alert and the manual search both look at the last 2 hours.
As I said, I take the exact search that I get from the "View Results" link and run it again. Same query, same time range. Different results.

somesoni2
Revered Legend

You get a summary line just below the search text box in the format "nnnnn events (start_date to end_date)". Check whether both runs show exactly the same range. Since you're using a relative time range, it can change (the last 2 hours now will be a different window if you run the search again 5 minutes from now).
If the range is the same, more events (successful ones) may have been ingested between the alert's scheduled run and your manual run, causing the error percentage to go down. I would first rule out a time-range mismatch and then troubleshoot further.
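If it helps, something like this shows the exact window each run actually scanned (just a sketch: it's your base search with addinfo added, and scan_from/scan_to are field names I'm making up here):

host=*-prod* source="/var/log/xxx" | addinfo | eval scan_from=strftime(info_min_time, "%m/%d/%Y %H:%M:%S"), scan_to=strftime(info_max_time, "%m/%d/%Y %H:%M:%S") | stats count as All, count(eval(level="ERROR")) as ERROR by scan_from, scan_to | eval Alert=((ERROR/All)*100)

If the scheduled run and the manual rerun show different scan_from/scan_to values, the relative time range is the culprit.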

nmayafit
Path Finder

When I come from "View Results" it says there is only 1 result. When I run the same search again it gives me millions.
How can that be?

somesoni2
Revered Legend

Clicking "View Results" loads the results from the dispatch directory, where it finds 1 event (the output of stats). Re-running the search scans all the raw events from the base search, hence the much higher count. Does the time range match when you click "View Results" versus running the search again?

nmayafit
Path Finder

I see the same time range on both.

somesoni2
Revered Legend

Your alert runs against a particular time range. Try running the search manually over exactly the same time range with hardcoded dates. For example, if your alert ran at 4/19/2017 1:00 PM looking at the last 60 minutes, run the search for the fixed time range 4/19/2017 12:00 PM to 4/19/2017 1:00 PM and compare the results.
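Something along these lines, assuming the default %m/%d/%Y:%H:%M:%S time format (the dates are just the ones from the example above):

host=*-prod* source="/var/log/xxx" earliest="04/19/2017:12:00:00" latest="04/19/2017:13:00:00" | stats count as All, count(eval(level="ERROR")) as ERROR | eval Alert=((ERROR/All)*100) | table Alert | where Alert >= 0.3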
