Average execution lag increases over time

drussell88 · 02-13-2013

I am having an issue with the average execution lag increasing over a period of 24 hours. This is delaying the time my scheduled jobs actually run. I have around 450 saved searches that run anywhere from every 7 minutes to every 4 hours, and I have to restart the search head to alleviate the issue. The search head is a Windows server with 4 Intel Xeon E7 processors; each E7 has 8 cores, giving the machine 32 CPUs. I do not see any issues with CPU usage or memory; I have plenty of both. I was reading about modifying max_searches_per_cpu (mine is currently 4) and/or raising max_searches_perc above 25 in limits.conf, but I am not sure exactly what I should do in my case. My limits.conf settings come from the default directory; I do not have a version in local currently. I also read that some jobs might be waiting on others to finish. How do I determine if this is happening?
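To see what these two settings control, the limits.conf documentation describes the cap on concurrent historical searches as max_searches_per_cpu × number_of_cpus + base_max_searches, with the scheduler allowed max_searches_perc percent of that cap. The sketch below plugs in the numbers from this post. The base_max_searches value of 6 and rounding the scheduler share down are assumptions for illustration, so check the limits.conf spec for your Splunk version before relying on the result.

```python
import math


def max_concurrent_searches(cpus: int, max_searches_per_cpu: int, base_max_searches: int) -> int:
    """Overall cap on concurrent historical searches, per the limits.conf formula."""
    return max_searches_per_cpu * cpus + base_max_searches


def max_scheduled_searches(total_limit: int, max_searches_perc: int) -> int:
    """Share of the overall cap the scheduler may use.
    Rounding down is an assumption made for this sketch."""
    return math.floor(total_limit * max_searches_perc / 100)


# Values from the post: 32 CPUs, max_searches_per_cpu = 4, max_searches_perc = 25.
# base_max_searches = 6 is an ASSUMED default; verify it in your limits.conf.
total = max_concurrent_searches(cpus=32, max_searches_per_cpu=4, base_max_searches=6)
for perc in (25, 50):
    print(f"max_searches_perc={perc}: total={total}, scheduler={max_scheduled_searches(total, perc)}")
```

Comparing the scheduler share with the number of searches actually running at once shows whether these limits are the bottleneck at all.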

yannK · 02-13-2013

I have around 450 saved searches that run anywhere from every 7 minutes to every 4 hours

This is likely the cause. If your searches overlap, you will quickly reach the maximum number of concurrent searches, both for the user's quota and for the system limits.
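A quick way to estimate how much the searches overlap on average is Little's law: a search that takes r minutes and runs every p minutes contributes r/p to the average number of searches running at once. The sketch below sums that over a hypothetical inventory of 450 searches; the class name, the split between schedules, and the run times are all made up, so substitute your own periods and measured durations.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SavedSearch:
    name: str
    period_min: float   # how often the scheduler runs it, in minutes
    runtime_min: float  # how long one run takes, in minutes


def average_concurrency(searches: list[SavedSearch]) -> float:
    """Little's law: each search adds runtime/period to the average number
    of searches running at the same time."""
    return sum(s.runtime_min / s.period_min for s in searches)


# Hypothetical inventory; replace with your real schedules and run times.
searches = (
    [SavedSearch(f"every7_{i}", 7, 2) for i in range(100)]
    + [SavedSearch(f"hourly_{i}", 60, 5) for i in range(300)]
    + [SavedSearch(f"every4h_{i}", 240, 10) for i in range(50)]
)
print(len(searches), round(average_concurrency(searches), 2))
```

This is only an average under the assumption that every run finishes before its next start; the peak can be much higher when many searches share the same start minute.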

Install the SOS app and look at the scheduler dashboard. You will see when scheduled searches start to be skipped, and you can find the worst offenders.

To resolve it, reduce your searches' run time and spread them out over time.
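The effect of spreading searches out can be sketched with a minute-by-minute simulation of one day. It assumes every run starts on time and is never skipped, which is a simplification of what the scheduler does, and the 60 hourly searches of 5 minutes each are hypothetical. The function name is illustrative only.

```python
from __future__ import annotations

DAY = 24 * 60  # minutes simulated


def concurrency_profile(jobs: list[tuple[int, int, int]]) -> list[int]:
    """jobs: (period_min, runtime_min, offset_min). Returns the number of
    searches running in each minute of the day, assuming no skips or delays."""
    running = [0] * DAY
    for period, runtime, offset in jobs:
        for start in range(offset, DAY, period):
            for t in range(start, min(start + runtime, DAY)):
                running[t] += 1
    return running


# Hypothetical: 60 hourly searches, 5 minutes each.
aligned = [(60, 5, 0) for _ in range(60)]          # all start at minute 0
staggered = [(60, 5, i % 60) for i in range(60)]   # one start per minute
for label, jobs in (("aligned", aligned), ("staggered", staggered)):
    prof = concurrency_profile(jobs)
    print(label, "peak", max(prof), "avg", round(sum(prof) / DAY, 2))
```

Both layouts do about the same total work, but the aligned one needs 60 slots at the top of every hour while the staggered one never needs more than 5. In Splunk the start minute of a scheduled search comes from its cron_schedule in savedsearches.conf, so staggering would mean, for example, `5 * * * *` for one hourly search and `20 * * * *` for another rather than `0 * * * *` for all of them.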

drussell88 · 02-13-2013

I have one search head and one indexer available to me. It seems like the machines can handle the load, so I was thinking it is a configuration issue. I have been looking into DEBUG logging settings and virus scan exclusions.

yannK · 02-13-2013

If the core issue is slow searches, you need to consider scaling your deployment. See whether load-balancing your data across more indexers improves overall search speed.

drussell88 · 02-13-2013

There is a period of time where there is a large number of skipped searches, but it seems to affect all of them.

drussell88 · 02-13-2013

I do have the SOS app. The average number of running searches is below 5 over a 24-hour period, but the lag time creeps way up. There are periods of time where there is a large number of skipped searches, but it seems like it is all of them.
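One way to check whether the lag climbs steadily, rather than spiking only at busy hours, is to compute execution lag (actual dispatch time minus scheduled time) per hour and fit a trend line. This sketch assumes you can export both timestamps for each run, for example from the scheduler's logs if your version records them; the export step, the function names, and the sample data are hypothetical. A steady positive slope would match the "creeps up until restart" pattern described here, while lag that jumps only at certain hours would point more toward schedule collisions.

```python
from __future__ import annotations

from collections import defaultdict


def hourly_mean_lag(runs: list[tuple[float, float]]) -> dict[int, float]:
    """runs: (scheduled_epoch, dispatched_epoch) per job run.
    Returns mean lag in seconds keyed by hours since the first scheduled run."""
    if not runs:
        return {}
    t0 = min(s for s, _ in runs)
    buckets: dict[int, list[float]] = defaultdict(list)
    for scheduled, dispatched in runs:
        buckets[int((scheduled - t0) // 3600)].append(dispatched - scheduled)
    return {h: sum(v) / len(v) for h, v in sorted(buckets.items())}


def slope_per_hour(series: dict[int, float]) -> float:
    """Ordinary least-squares slope of lag vs. hour (extra seconds of lag per hour)."""
    xs, ys = list(series), list(series.values())
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den if den else 0.0


# Hypothetical data: one run every 5 minutes for 24 h, lag growing 0.5 s per minute.
runs = [(t, t + 0.5 * (t / 60)) for t in range(0, 24 * 3600, 300)]
lags = hourly_mean_lag(runs)
print(round(slope_per_hour(lags), 1))
```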
