Issue with ITSI service_health_monitor saved search

Nisha18789
Builder

Something odd started happening in our Splunk environment with the native ITSI saved search service_health_monitor.

Every scheduled run of this search is now being skipped, with the reason: "The maximum number of concurrent running jobs for this historical scheduled search on this cluster has been reached".

I checked the Jobs page and found that a previous run of the search was stuck below 100% progress, so the next scheduled run could not start. I deleted the stuck job so that the next run could go ahead, but that run showed the same behaviour, i.e. it also got stuck partway through.

The job inspector shows that most of the time was spent in startup.handoff. Below is the end of savedsearch.log: Splunk appears to get stuck after the noop processor is opened (BEGIN OPEN: Processor=noop), and the next entry is only logged about seven minutes later.

Please share any insights that could help with further investigation.

09-07-2020 17:41:05.680 INFO  LocalCollector - Final required fields list = Message,_raw,_subsecond,_time,alert_level,alert_severity,app,index,indexed_is_service_max_severity_event,is_service_in_maintenance,itsi_kpi_id,itsi_service_id,kpi,kpiid,prestats_reserved_*,psrsvd_*,scoretype,service,serviceid,source,urgency
09-07-2020 17:41:05.680 INFO  UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:05.680 INFO  UserManager - Setting user context: splunk-system-user
09-07-2020 17:41:05.680 INFO  UserManager - Done setting user context: NULL -> splunk-system-user
09-07-2020 17:41:05.680 INFO  UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:06.105 INFO  UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:06.105 INFO  UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:06.105 INFO  UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:06.105 INFO  UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:06.105 INFO  UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:06.171 INFO  ChunkedExternProcessor - Exiting custom search command after getinfo since we are in preview mode:gethealth
09-07-2020 17:41:06.177 INFO  SearchOrchestrator - Starting the status control thread.
09-07-2020 17:41:06.177 INFO  SearchOrchestrator - Starting phase=1
09-07-2020 17:41:06.177 INFO  UserManager - Setting user context: splunk-system-user
09-07-2020 17:41:06.177 INFO  UserManager - Setting user context: splunk-system-user
09-07-2020 17:41:06.177 INFO  UserManager - Done setting user context: NULL -> splunk-system-user
09-07-2020 17:41:06.177 INFO  UserManager - Done setting user context: NULL -> splunk-system-user
09-07-2020 17:41:06.177 INFO  ReducePhaseExecutor - Stating phase_1
09-07-2020 17:41:06.177 INFO  SearchStatusEnforcer - Enforcing disk quota = 26214400000
09-07-2020 17:41:06.177 INFO  PreviewExecutor - Preview Enforcing initialization done
09-07-2020 17:41:06.177 INFO  DispatchExecutor - BEGIN OPEN: Processor=stats
09-07-2020 17:41:06.209 INFO  ResultsCollationProcessor - Writing remote_event_providers.csv to disk
09-07-2020 17:41:06.209 INFO  DispatchExecutor - END OPEN: Processor=stats
09-07-2020 17:41:06.209 INFO  DispatchExecutor - BEGIN OPEN: Processor=gethealth
09-07-2020 17:41:06.217 INFO  DispatchExecutor - END OPEN: Processor=gethealth
09-07-2020 17:41:06.217 INFO  DispatchExecutor - BEGIN OPEN: Processor=noop
09-07-2020 17:48:07.948 INFO  ReducePhaseExecutor - ReducePhaseExecutor=1 action=PREVIEW
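
A quick way to confirm where a search stalls is to look for the largest time gap between consecutive log lines. The sketch below is only an illustration: it assumes every line starts with a timestamp in the MM-DD-YYYY HH:MM:SS.mmm layout seen above, and `largest_gap` is a hypothetical helper, not a Splunk tool.

```python
from datetime import datetime

# Assumed timestamp layout, taken from the log excerpt above.
TS_FORMAT = "%m-%d-%Y %H:%M:%S.%f"
TS_LEN = len("09-07-2020 17:41:05.680")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the biggest pause,
    or None if fewer than two lines carry a parseable timestamp."""
    prev_ts, prev_line = None, None
    best = None
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            continue  # skip lines without a leading timestamp
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


# Last three lines of the excerpt above.
sample = [
    "09-07-2020 17:41:06.217 INFO  DispatchExecutor - END OPEN: Processor=gethealth",
    "09-07-2020 17:41:06.217 INFO  DispatchExecutor - BEGIN OPEN: Processor=noop",
    "09-07-2020 17:48:07.948 INFO  ReducePhaseExecutor - ReducePhaseExecutor=1 action=PREVIEW",
]

result = largest_gap(sample)
if result is not None:
    gap, before, after = result
    print(f"{gap:.3f}s between:\n  {before}\n  {after}")
```

On these three lines it reports a gap of about 421.7 seconds, i.e. roughly seven minutes between BEGIN OPEN: Processor=noop and the next entry.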

 

 

pagillar
Explorer

@Nisha18789 did you find a solution to this issue? We have the same problem and are wondering whether we can change the cron schedule to run every 10 minutes instead of every minute.

Nisha18789
Builder

Hi @pagillar, yes. We found that the issue was caused by some services having Unicode characters in their service names, which was causing the service_health_monitor search to be skipped. We identified those services and renamed them, which fixed the issue.

I am not sure how easy it is to spot such names in a given environment, but here is how we identified them:

Get the list of all service names using a REST API query or a lookup, then use any online Unicode lookup tool to find the names that contain such characters.
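
If you would rather check the names locally than paste them into an online tool, something like the following Python sketch could work. It assumes that "unicode character" here means any non-ASCII code point, and that the service names have already been exported into a list; `find_non_ascii` and the sample names are made up for illustration and are not part of ITSI.

```python
import unicodedata


def find_non_ascii(names):
    """Map each name containing non-ASCII characters to a list of
    (position, character, code point, Unicode name) tuples."""
    flagged = {}
    for name in names:
        hits = [
            (i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
            for i, ch in enumerate(name)
            if ord(ch) > 127  # assumption: anything outside ASCII is suspect
        ]
        if hits:
            flagged[name] = hits
    return flagged


# Hypothetical service names; replace with the list exported from your environment.
service_names = ["Payments API", "Checkout\u00a0Service", "Café Orders"]

for name, hits in find_non_ascii(service_names).items():
    print(repr(name))
    for pos, ch, codepoint, uname in hits:
        print(f"  position {pos}: {codepoint} {uname}")
```

Printing the name with `repr` makes invisible characters such as a no-break space show up as escape sequences, which are easy to miss when reading the name in a UI.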

Hope this helps!

pagillar
Explorer

@Nisha18789  Thanks
