We have a correlation search that produces a couple of thousand events every 5 minutes. At the same time we are seeing the "Skipped Events Percentage" in the "Event Analytics Monitoring" dashboard go to 100%. In addition, I see the KV stores for itsi_notable_group_user and itsi_notable_group_system hit 50,000 (which I subsequently upped in itsi_notable_event_retention.conf to 150000).
For some reason episodes are not reliably getting generated.
Two questions -
1. How do we troubleshoot the skipped events percentage issue, which is presumably causing the lack of episodes? (I can't seem to find documentation discussing how this all works.)
2. Should we change our correlation search to not include normal severity events? Currently, the normal severity events are produced so that we can change episodes to "info" when a "normal" event comes in. Recommendations on a better practice than this are welcome!
The first thing to check is that the aggregation policies you use for grouping are not excluding events. Skipped events relate to grouping only, so when you see skipped events, they are events that are not being grouped by any aggregation policy. You can test this by making sure your default policy is grouping by 'source' only.

Also, can you tell me if you have tsix enabled? There is a known issue with a large number of notable events (NEs) being created if you have that enabled.

To troubleshoot, turn off all correlation searches except one, and leave only one aggregation policy enabled. See if events get grouped and skipped events reduce. If so, you need to analyze what your aggregation policy is doing that excludes those events. Basically it's a process of elimination, remembering that "skipped events" means skipped grouping, not skipped creation of notable events.
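
To make the "skipped means not grouped" point concrete, here is a toy Python model. It is only a sketch and not how ITSI is implemented: the `Policy` class, the `group_events` function, the field names, and the first-matching-policy rule are all assumptions made for illustration. It shows how a policy whose filter matches nothing yields 100% skipped events even though every notable event was created, and how a catch-all default policy grouping by 'source' brings that to zero.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]  # hypothetical notable event: field name -> value


@dataclass
class Policy:
    """Hypothetical stand-in for an aggregation policy."""
    name: str
    matches: Callable[[Event], bool]  # filter criteria (assumed shape)
    group_by: str                     # field used to split events into groups


def group_events(
    events: List[Event], policies: List[Policy]
) -> Tuple[Dict[Tuple[str, str], List[Event]], List[Event], float]:
    """Assign each event to the first policy whose filter matches it.

    Events matched by no policy are "skipped": they exist, but are not grouped.
    Returns (groups, skipped events, skipped percentage).
    """
    groups: Dict[Tuple[str, str], List[Event]] = {}
    skipped: List[Event] = []
    for event in events:
        policy = next((p for p in policies if p.matches(event)), None)
        if policy is None:
            skipped.append(event)
            continue
        key = (policy.name, event.get(policy.group_by, ""))
        groups.setdefault(key, []).append(event)
    pct = 100.0 * len(skipped) / len(events) if events else 0.0
    return groups, skipped, pct


events = [
    {"source": "search_a", "severity": "normal"},
    {"source": "search_a", "severity": "normal"},
    {"source": "search_b", "severity": "high"},
]

# A policy whose filter excludes every incoming event.
narrow = Policy("critical_only", lambda e: e["severity"] == "critical", "source")
# A catch-all default policy that groups by 'source' only.
default = Policy("default", lambda e: True, "source")

_, _, pct_narrow = group_events(events, [narrow])
groups, _, pct_with_default = group_events(events, [narrow, default])
print(pct_narrow, pct_with_default, len(groups))
```

With only the narrow policy, all three events are skipped; adding the default policy groups them into one group per source. The same elimination logic applies when you disable policies one at a time in the real system.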