Something strange started happening in our Splunk environment with the ITSI native saved search: service_health_monitor
This search started getting skipped 100% of the time with the reason: The maximum number of concurrent running jobs for this historical scheduled search on this cluster has been reached
So I checked the Jobs section and found that the search was stuck running at x% < 100, which meant the next scheduled run could not start. I deleted that job so the search could run on the next schedule, but the next run showed the same behaviour, i.e., stuck halfway.
Inspecting the job shows that most of the time was spent on startup.handoff. Below is what I can see at the end of savedsearch.log: after the noop processor (BEGIN OPEN: Processor=noop), Splunk seems to be stuck.
Please provide any insights that could help in investigating further.
09-07-2020 17:41:05.680 INFO LocalCollector - Final required fields list = Message,_raw,_subsecond,_time,alert_level,alert_severity,app,index,indexed_is_service_max_severity_event,is_service_in_maintenance,itsi_kpi_id,itsi_service_id,kpi,kpiid,prestats_reserved_*,psrsvd_*,scoretype,service,serviceid,source,urgency
09-07-2020 17:41:05.680 INFO UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:05.680 INFO UserManager - Setting user context: splunk-system-user
09-07-2020 17:41:05.680 INFO UserManager - Done setting user context: NULL -> splunk-system-user
09-07-2020 17:41:05.680 INFO UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:06.105 INFO UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:06.105 INFO UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:06.105 INFO UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:06.105 INFO UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:06.105 INFO UserManager - Unwound user context: splunk-system-user -> NULL
09-07-2020 17:41:06.171 INFO ChunkedExternProcessor - Exiting custom search command after getinfo since we are in preview mode:gethealth
09-07-2020 17:41:06.177 INFO SearchOrchestrator - Starting the status control thread.
09-07-2020 17:41:06.177 INFO SearchOrchestrator - Starting phase=1
09-07-2020 17:41:06.177 INFO UserManager - Setting user context: splunk-system-user
09-07-2020 17:41:06.177 INFO UserManager - Setting user context: splunk-system-user
09-07-2020 17:41:06.177 INFO UserManager - Done setting user context: NULL -> splunk-system-user
09-07-2020 17:41:06.177 INFO UserManager - Done setting user context: NULL -> splunk-system-user
09-07-2020 17:41:06.177 INFO ReducePhaseExecutor - Stating phase_1
09-07-2020 17:41:06.177 INFO SearchStatusEnforcer - Enforcing disk quota = 26214400000
09-07-2020 17:41:06.177 INFO PreviewExecutor - Preview Enforcing initialization done
09-07-2020 17:41:06.177 INFO DispatchExecutor - BEGIN OPEN: Processor=stats
09-07-2020 17:41:06.209 INFO ResultsCollationProcessor - Writing remote_event_providers.csv to disk
09-07-2020 17:41:06.209 INFO DispatchExecutor - END OPEN: Processor=stats
09-07-2020 17:41:06.209 INFO DispatchExecutor - BEGIN OPEN: Processor=gethealth
09-07-2020 17:41:06.217 INFO DispatchExecutor - END OPEN: Processor=gethealth
09-07-2020 17:41:06.217 INFO DispatchExecutor - BEGIN OPEN: Processor=noop
09-07-2020 17:48:07.948 INFO ReducePhaseExecutor - ReducePhaseExecutor=1 action=PREVIEW
@Nisha18789 did you find a solution to this issue? We have the same issue and are wondering if we can change the cron schedule to run every 10 minutes instead of every minute.
Hi @pagillar, yes, we identified that the issue was caused by some services having Unicode characters in the service name, which was making the service health monitor search get skipped. We identified those services and renamed them, which fixed the issue.
I am not sure how easy it is to identify that in your environment, but here is how we identified it:
Get the list of all service names using a REST API query or a lookup, then use any online Unicode lookup tool to find the offending ones.
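Instead of an online lookup tool, the check can also be scripted. Below is a minimal sketch in Python that flags service names containing non-ASCII (Unicode) characters; the example service names and the idea of feeding it names exported from the ITSI service lookup or REST endpoint are assumptions, not part of the original solution.

```python
# Hypothetical sketch: given a list of service names (e.g. exported from an
# ITSI service lookup or a REST query), report any name that contains
# non-ASCII characters, along with the position and code point of each one.

def find_non_ascii_names(names):
    """Return (name, offenders) pairs, where offenders is a list of
    (index, character, code point) tuples for each non-ASCII character."""
    flagged = []
    for name in names:
        offenders = [(i, ch, hex(ord(ch)))
                     for i, ch in enumerate(name) if ord(ch) > 127]
        if offenders:
            flagged.append((name, offenders))
    return flagged

if __name__ == "__main__":
    # Example input; "payments\u200b_service" hides a zero-width space.
    services = ["web_frontend", "payments\u200b_service", "café_db"]
    for name, offenders in find_non_ascii_names(services):
        print(f"{name!r} contains non-ASCII characters: {offenders}")
```

Checking code points rather than eyeballing the names catches invisible characters such as zero-width spaces, which an online lookup tool would also reveal but which are easy to miss by hand.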
Hope this helps!
@Nisha18789 Thanks