We have configured our monitoring tools to send Network and Application alert events as SNMP traps. Splunk monitors the /var/log/snmp-traps.log file, parses the data, and indexes it; that part works fine.
All the fields needed for the correlation search are present (severity, title, etc.), and Notable Events are created by the ad-hoc correlation searches, which run over a 1-minute window. That part also works.
However, the breaking rules are not working as expected. For example, there are multiple Episodes for the same events, all starting with the exact same starting event: they seem to break prematurely, so we end up with more than one Episode for the same starting event. We have also observed some Episodes that receive just one event and never get closed. We have experimented with almost every combination of settings in the Aggregation Policies.
What is going on here? Why does it get confused? I know it is hard to understand without seeing the actual settings and configuration, but I did my best to understand the documentation and set up the whole policy. Has anyone else here had this issue?
You are getting multiple episodes because once an up or clear event arrives, the episode breaks and no more notable events (NEs) can enter that episode. If you are getting a lot of episodes, you may consider using a KPI instead and only creating alerts when the KPI turns red.
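A minimal sketch of why a clear event produces "duplicate" episodes, assuming a simplified policy that splits episodes by title and breaks when a normal-severity event arrives. The `Event`/`Episode` classes and `group_events` function are hypothetical; this is not ITSI's actual grouping engine, only a model of the behaviour described above.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of an aggregation policy: episodes are split
# by title and an episode breaks when a "normal" (clear) event arrives.
# This is NOT ITSI's implementation, just an illustration.

@dataclass
class Event:
    title: str
    severity: str  # e.g. "critical", "normal"

@dataclass
class Episode:
    title: str
    events: list = field(default_factory=list)
    closed: bool = False

def group_events(events):
    """Assign events to episodes, closing an episode when a normal event arrives."""
    open_episodes = {}  # title -> currently open Episode
    all_episodes = []
    for ev in events:
        ep = open_episodes.get(ev.title)
        if ep is None:
            # No open episode for this title: start a new one.
            ep = Episode(title=ev.title)
            open_episodes[ev.title] = ep
            all_episodes.append(ep)
        ep.events.append(ev)
        if ev.severity == "normal":
            # Breaking rule: the clear event closes the episode, so the next
            # event with the same title starts a brand-new episode.
            ep.closed = True
            del open_episodes[ev.title]
    return all_episodes

stream = [
    Event("link down on sw1", "critical"),
    Event("link down on sw1", "normal"),    # clear -> episode breaks
    Event("link down on sw1", "critical"),  # same title, new episode
]
episodes = group_events(stream)
print(len(episodes))                      # 2
print([len(e.events) for e in episodes])  # [2, 1]
print([e.closed for e in episodes])       # [True, False]
```

In this model the same starting event title shows up in two episodes, and the second one holds a single event and stays open until another clear arrives, which matches both symptoms in the original question.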
A few things to check: Are the episodes truly duplicates (same exact number of events, same events, same timestamps)?
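One way to run that check is to reduce each episode to a fingerprint of its events and group identical fingerprints. This is a sketch under the assumption that you have already exported each episode's events as `(event_id, timestamp)` pairs; the data layout and the `fingerprint`/`find_true_duplicates` helpers are hypothetical, not part of ITSI.

```python
from collections import defaultdict

# Hypothetical layout: episode_id -> list of (event_id, timestamp) pairs,
# e.g. exported from a search. Field names are assumptions.

def fingerprint(events):
    """Order-independent signature: event count plus the sorted (id, time) pairs."""
    return (len(events), tuple(sorted(events)))

def find_true_duplicates(episodes):
    """Return groups of episode ids that contain exactly the same events."""
    groups = defaultdict(list)
    for episode_id, events in episodes.items():
        groups[fingerprint(events)].append(episode_id)
    return [ids for ids in groups.values() if len(ids) > 1]

episodes = {
    "ep1": [("ev1", 1700000000), ("ev2", 1700000060)],
    "ep2": [("ev2", 1700000060), ("ev1", 1700000000)],
    "ep3": [("ev1", 1700000000)],
}
print(find_true_duplicates(episodes))  # [['ep1', 'ep2']]
```

If episodes come out as true duplicates, the same NEs are probably landing in more than one episode at once; if they only share a starting event, the episode is more likely being broken and restarted.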
Check the breaking rules in your aggregation policy: what is in there? Make sure it is not set to "break when an event is normal" or "break when the flow of events is paused for" some time. If a normal event comes in, or that time limit is hit, the episode breaks and a new one starts. Also, is there more than one aggregation policy that the trap would match when it comes in? NEs can end up in multiple episodes if they match more than one aggregation policy filter. For instance, if one policy's filter is just severity >= Normal and snmp_name=*, and another policy has a similar filter with only severity >= Normal, the trap will match both aggregation policies and end up in both. Can you post a screenshot of your breaking rules and your ACTION tab?
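To see how overlapping filters put one trap into two episodes, here is a sketch that models the two example filters as Python predicates. The numeric severity ordering (1 Info through 6 Critical) and the policy names `policy_a`/`policy_b` are assumptions for illustration; ITSI evaluates its own filter syntax, not Python.

```python
# Hypothetical model of aggregation-policy filters as predicates.
# Severity ranks assume ITSI's usual ordering: info < normal < low < medium < high < critical.
SEVERITY_RANK = {"info": 1, "normal": 2, "low": 3, "medium": 4, "high": 5, "critical": 6}

def at_least(severity, minimum):
    """True if severity is at or above the given minimum."""
    return SEVERITY_RANK[severity] >= SEVERITY_RANK[minimum]

POLICIES = {
    # severity >= Normal AND snmp_name=* (approximated as "field is present")
    "policy_a": lambda e: at_least(e["severity"], "normal") and "snmp_name" in e,
    # severity >= Normal only, so it is a superset of policy_a
    "policy_b": lambda e: at_least(e["severity"], "normal"),
}

def matching_policies(event, policies=POLICIES):
    """List every policy whose filter the event satisfies."""
    return [name for name, flt in policies.items() if flt(event)]

trap = {"severity": "critical", "snmp_name": "linkDown"}
print(matching_policies(trap))  # ['policy_a', 'policy_b']
```

Because `policy_b`'s filter is a superset of `policy_a`'s, any trap that matches the first also matches the second; tightening one filter so the two become mutually exclusive would keep each trap in a single policy.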
There are no entries on the Action tab.
Splunk is at 8.0.1
ITSI is at 4.4.1 Build 10