How does Splunk assign thread_id for scheduled searches and alerts in scheduler.log?

AntonyPriwin
Explorer

Hi All,

I need some information about thread_id in scheduler.log and how it is assigned.

Sample Events 1:

02-03-2016 08:40:01.341 +0000 INFO SavedSplunker - savedsearch_id="admin;search;SS_1minute_23", user="admin", app="search", savedsearch_name="SS_1minute_23", status=success, digest_mode=1, scheduled_time=1454488800, dispatch_time=1454488801, run_time=0.178, result_count=1, alert_actions="", sid="scheduler_adminsearch_RMD5fe320a1798d45e4e_at_1454488800_189", suppressed=0, thread_id="AlertNotifierWorker-1"

02-03-2016 08:40:01.340 +0000 INFO SavedSplunker - savedsearch_id="admin;search;SS_test_threadid_3", user="admin", app="search", savedsearch_name="SS_test_threadid_3", status=success, digest_mode=1, scheduled_time=1454488800, dispatch_time=1454488801, run_time=0.156, result_count=1, alert_actions="", sid="scheduler_adminsearch_RMD506c18cc48c92c389_at_1454488800_190", suppressed=0, thread_id="AlertNotifierWorker-0"

Sample Events 2:

02-03-2016 08:59:01.219 +0000 INFO SavedSplunker - savedsearch_id="admin;search;SS_1minute_23", user="admin", app="search", savedsearch_name="SS_1minute_23", status=success, digest_mode=1, scheduled_time=1454489940, dispatch_time=1454489941, run_time=0.089, result_count=1, alert_actions="", sid="scheduler_adminsearch_RMD5fe320a1798d45e4e_at_1454489940_239", suppressed=0, thread_id="AlertNotifierWorker-0"

02-03-2016 08:59:01.211 +0000 INFO SavedSplunker - savedsearch_id="admin;search;SS_23", user="admin", app="search", savedsearch_name="SS_23", status=success, digest_mode=1, scheduled_time=1454489940, dispatch_time=1454489941, run_time=0.087, result_count=1, alert_actions="", sid="scheduler_adminsearch_RMD510bfa07112c26c31_at_1454489940_238", suppressed=0, thread_id="AlertNotifierWorker-0"

There are 14 thread_ids, named AlertNotifierWorker-0, AlertNotifierWorker-1, AlertNotifierWorker-2 … AlertNotifierWorker-13.

We have seen cases where several saved searches/alerts have the same scheduled_time and dispatch_time, and the thread_id is incremented for each one
(e.g. AlertNotifierWorker-0, AlertNotifierWorker-1, AlertNotifierWorker-2 … AlertNotifierWorker-13).

However, in other cases, all saved searches/alerts with the same scheduled_time get a single thread_id (e.g. AlertNotifierWorker-0).

When does Splunk assign the same thread_id to scheduled searches/alerts, and when does it assign different ones?


jrodman
Splunk Employee

The thread_id in scheduler.log is simply the name of the thread inside the main splunkd process that handles any alerting resulting from the search.

The threads belong to a thread pool of essentially identical worker threads that do the work of sending emails, running scripts, etc. The pool may grow in an environment where the alert work resulting from scheduled searches overlaps more. Thread IDs in the pool simply increment from 0 as the threads are created, and threads are assigned on a first-come, first-served basis when work needs to be done.
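As a rough illustration of why the same thread_id can repeat while at other times the IDs climb, here is a toy simulation of such a pool. It is not Splunk's implementation: the function name `assign_workers`, the (start, duration) job format, and the rule of picking the lowest-numbered idle worker are all assumptions made for the sketch.

```python
def assign_workers(jobs):
    """Toy model of a growing worker pool (an assumption, not Splunk's code).

    jobs: list of (start, duration) tuples, ordered by start time.
    Returns the worker name that would handle each job.
    """
    busy_until = []  # busy_until[i] = time at which worker i becomes idle
    names = []
    for start, duration in jobs:
        # Reuse the first worker that is already idle when the job arrives.
        for i, free_at in enumerate(busy_until):
            if free_at <= start:
                break
        else:
            # Every existing worker is busy: create a new one with the next ID.
            busy_until.append(0.0)
            i = len(busy_until) - 1
        busy_until[i] = start + duration
        names.append(f"AlertNotifierWorker-{i}")
    return names


# Alert work that does not overlap keeps landing on worker 0.
print(assign_workers([(0.0, 0.1), (0.5, 0.1)]))
# Alert work that overlaps forces the pool to grow.
print(assign_workers([(0.0, 1.0), (0.0, 1.0), (0.5, 1.0)]))
```

Under this model, two alerts with the same scheduled_time share AlertNotifierWorker-0 whenever the first one finishes its alert work before the second one needs a thread, and get different IDs only when that work actually overlaps.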

Essentially, this value has no meaning for users. It was probably added to the product as an aid to troubleshooting and debugging defects.

AntonyPriwin
Explorer

@jrodman Thanks for your response.
Our requirement is to identify the number of scheduled searches/alerts running concurrently at a given point in time.
We had planned to count them by AlertNotifierWorker thread, but it seems that will not help.
Please let us know how to approach this.

jrodman
Splunk Employee

If anything, this represents the work at the end of an alert: deciding whether it's time to fire, or actually emitting the alert actions. It doesn't correlate with the running searches.

The concurrent workload of scheduled searches (both those you might consider alerts and otherwise) should be available in an accessible form within the Splunk Distributed Management Console. For live data it uses the server/status/resource-usage/splunk-processes endpoint (accessed via |rest), and for historical concurrency information it uses the introspection data.

Specifically, it digs into the data in the _introspection index, from resource_usage.log (sourcetype=splunk_resource_usage), for component=PerProcess where data.search_props.type=scheduled.

Theoretically you could build a picture from scheduler.log, but you'd have to compute overlaps from the dispatch_time and run_time of each alert, which is pretty ungainly.
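For what it's worth, the overlap computation could be sketched roughly as follows. This is only an approximation under stated assumptions: each search is treated as running from dispatch_time to dispatch_time + run_time, dispatch_time has one-second resolution, and the helper names (`parse_event`, `peak_concurrency`) are made up for the example. The sample lines are the events from the question, trimmed to the relevant fields.

```python
import re

# Matches key=value pairs where the value is either quoted or runs to the next comma.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')


def parse_event(line):
    """Extract key=value fields from one scheduler.log line into a dict."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def peak_concurrency(lines):
    """Return the largest number of searches whose run intervals overlap."""
    points = []
    for line in lines:
        fields = parse_event(line)
        if "dispatch_time" not in fields or "run_time" not in fields:
            continue  # skipped or deferred searches carry no run interval
        start = float(fields["dispatch_time"])
        end = start + float(fields["run_time"])
        points.append((start, 1))   # a search starts
        points.append((end, -1))    # a search finishes
    # At equal timestamps, -1 sorts before +1, so back-to-back searches
    # are not counted as overlapping.
    points.sort()
    running = peak = 0
    for _, delta in points:
        running += delta
        peak = max(peak, running)
    return peak


SAMPLE = [
    'savedsearch_name="SS_1minute_23", status=success, dispatch_time=1454488801, run_time=0.178',
    'savedsearch_name="SS_test_threadid_3", status=success, dispatch_time=1454488801, run_time=0.156',
]
print(peak_concurrency(SAMPLE))  # prints 2
```

Because dispatch_time is only accurate to the second, short searches dispatched within the same second will always appear to overlap, which is one reason the introspection data is the better source.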

If you wouldn't mind turning this follow-up into a specific question -- how can we review the concurrent search load of our scheduled searches? -- I think it's a far more common goal, and I don't see a clear question asked along these lines.

Keep in mind, of course, that the apportionment algorithm of the scheduler means that the concurrency of scheduled searches might drop in times of high contention with searches launched by ad-hoc user activity or by dashboard loads. (The Splunk search quota and apportionment machinery essentially treats searches stored in dashboards, or invoked when a dashboard loads, as equivalent to user-typed searches.)
