Splunk ITSI

100% Skip Ratio - ITSI event Grouping

Jarohnimo
Builder

I have a 3-node setup: 1 indexer and 2 search heads. One search head has ITSI (ITSI is also on the indexer). The indexer is the one that's giving me all these skipped searches. Under the User Management Console, when we click on "Schedule Activity", it shows SA-ITOA with 241 executions, and 100% of them skipped.

At one point I had ITSI exclusively on this particular box, as it was our standalone server. I have since migrated ITSI to a new box and now only have ITSI on the indexer for functional purposes. So perhaps ITSI has some running processes that it doesn't need?

The reason it gives is: The maximum number of concurrent searches has been reached (241).

I'm very weak in my understanding of skip ratio and what causes it, as I didn't manually set up any searches. Why it's failing now is beyond me. I'd imagine there's some maintenance that I'm supposed to be doing that I'm not. I have read some of the Splunk documentation around this, but I haven't grasped it. Perhaps someone can break this down for me.

My goal is to eliminate all unneeded or unused scheduled reports, alerts, etc. I'm tripping all over the skip ratio and I'm not sure what I'm getting out of it.


DalJeanis
Legend

The general meaning of "skipped" is that either a scheduled search has not finished before the time that it needs to run again, or that the box has no juice left to run another search.
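To make the number on the Schedule Activity panel concrete, here is a minimal Python sketch of how a skip ratio is computed: skipped executions divided by total scheduled executions, per app. The record layout (`app`, `search`, `status`) and the search names are hypothetical and only for illustration; they are not Splunk's actual log schema.

```python
from collections import Counter

# Hypothetical scheduler records. Field names and search names are
# assumptions made for this example, not Splunk's real schema.
records = [
    {"app": "SA-ITOA", "search": "example_grouping_search", "status": "skipped"},
    {"app": "SA-ITOA", "search": "example_grouping_search", "status": "skipped"},
    {"app": "search", "search": "example_alert", "status": "success"},
    {"app": "search", "search": "example_alert", "status": "skipped"},
]


def skip_ratio_by_app(rows):
    """Return {app: skipped executions / total executions}."""
    total = Counter()
    skipped = Counter()
    for row in rows:
        total[row["app"]] += 1
        if row["status"] == "skipped":
            skipped[row["app"]] += 1
    return {app: skipped[app] / total[app] for app in total}


ratios = skip_ratio_by_app(records)
for app, ratio in sorted(ratios.items()):
    print(f"{app}: {ratio:.0%} skipped")
```

An app where every scheduled execution is skipped, such as 241 out of 241, shows up as 100%.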

Before you do any of the below, look for real-time searches and kill them all. Then hunt down the users who launched the real-time searches and ... and ... well, just maim them a little. If that straightens up your problems, then you've learned who needs to be watched.

Assuming there were none of those, then we go on to the normal debug for this...

First, check to see how long the skipped searches are taking to run. If they are unable to finish in the time allowed, but the boxes are not under general stress, then check to see how the searches can be made more efficient. Come back and do this for all long-running searches, but only after the next step.

Second, look at the performance on the box and see whether it is CPU-bound, I/O-bound, memory-bound, having a cigarette break, or what.
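The first check can be sketched in a few lines of Python: compare each search's typical run time against how often it is scheduled, and flag the ones that cannot finish before their next run is due. The search names, the `run_time_s` and `interval_s` fields, and the sample numbers are all hypothetical; you would fill them in from whatever run-time figures your own environment reports.

```python
# Hypothetical per-search figures: typical run time and schedule interval,
# both in seconds. Names and numbers are made up for illustration.
searches = [
    {"name": "example_grouping_search", "run_time_s": 420, "interval_s": 300},
    {"name": "example_alert", "run_time_s": 12, "interval_s": 300},
    {"name": "example_report", "run_time_s": 3600, "interval_s": 3600},
]


def overrunning(rows):
    """Return names of searches whose run time meets or exceeds their interval."""
    return [row["name"] for row in rows if row["run_time_s"] >= row["interval_s"]]


flagged = overrunning(searches)
print(flagged)
```

Anything flagged here is a candidate for either a longer schedule interval or a more efficient search.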

Jarohnimo
Builder

The same day, I went to scheduled searches and killed the one search that was skipping. It was (in my eyes) unneeded, but we will know Monday if things are all jacked up and how much that search was really needed. If there are no issues, then I'll turn it off on my search heads too. Notable events are something someone should consciously opt into, given the resource strain.

In my case the resources were fine. It was just that pesky warning in the health report. The biggest issue is that certain alerts are getting skipped because of the notable events real-time search. I'd rather have the alert fire off than a notable event indicator that requires me to log in to Splunk to look at it. An SMTP alert would notify me in real time if other searches weren't getting in the way.

DalJeanis
Legend

Understood. Yes, real-time searches are often the killer, so it sounds like disabling one that you feel is unneeded is a good decision. It would be worth revisiting the decision occasionally to verify that your needs have not changed.
