Splunk Enterprise

How to fix this error: The search job "SID" was canceled remotely or expired?

sylim_splunk
Splunk Employee

We have a requirement to pull security logs for specific past time ranges, e.g. December 2022 to April 2023. Splunk cannot complete a search without the job expiring, even for a 1-hour window in December.
This means we fail our published 12-month retention period for these logs. Please provide options to identify, correct, or improve this search problem.

The search job 'SID' was canceled remotely or expired. 

Sometimes the UI shows "Unknown SID" instead. We are running version 8.2.9.

sylim_splunk
Splunk Employee

While a search is running, the browser is supposed to check in with the Splunk server by sending keep-alive/poll GET requests every second. In this case the polling stopped at some point and resumed later; by then the job had already timed out, so the Splunk server returned HTTP 404.
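
To make the failure mode concrete, here is a minimal Python sketch of the timeout logic described above. It assumes a simplified model, not Splunk's actual implementation: the job is auto-canceled once more than job_default_auto_cancel seconds pass after the latest keep-alive with no new one arriving. The function name auto_cancel_time is hypothetical.

```python
from typing import Optional, Sequence


def auto_cancel_time(poll_times: Sequence[float], auto_cancel: float) -> Optional[float]:
    """Return the time at which the job would be auto-canceled, or None.

    Simplified model (an assumption, not Splunk's code): the job is canceled
    once more than `auto_cancel` seconds pass after the latest keep-alive
    without another keep-alive arriving.
    """
    times = sorted(poll_times)
    for prev, curr in zip(times, times[1:]):
        if curr - prev > auto_cancel:
            # The deadline (prev + auto_cancel) passed before the next poll.
            return prev + auto_cancel
    # No gap between the observed polls exceeded the threshold.
    return None


# Keep-alives every second from 0 s to 10 s, then one late poll at 70 s.
polls = list(range(11)) + [70]
print(auto_cancel_time(polls, 30))  # 40 (8.2.9 default)
print(auto_cancel_time(polls, 62))  # None (9.0+ default)
```

With a 30-second threshold the 60-second silence kills the job at 40 s, so the late poll at 70 s gets a 404; a 62-second threshold tolerates the same gap.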

Here are the logs:

1. The auto-cancel is logged in search_messages.log:

07-20-2023 13:37:45.264 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689874558.349256_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled
07-20-2023 13:46:45.343 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689874865.349389_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled
07-20-2023 14:00:45.323 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled
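
If you have exported search_messages.log, a short Python sketch like the one below can list which SIDs were auto-canceled and when. The regular expression is an assumption based only on the three lines above, so other message formats may not match; auto_canceled_sids is a hypothetical helper name.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Assumed line shape, inferred from the examples above:
# "MM-DD-YYYY HH:MM:SS.mmm +ZZZZ ERROR SearchMessages - ... sid="..." ... message=Search auto-canceled"
LINE_RE = re.compile(
    r'^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})'
    r'.*?\bsid="(?P<sid>[^"]+)"'
    r'.*message=Search auto-canceled'
)


def auto_canceled_sids(lines: List[str]) -> List[Tuple[datetime, str]]:
    """Return (timestamp, sid) for every auto-cancel message in `lines`."""
    results = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            # %f accepts the 3-digit milliseconds; %z parses offsets like -0400.
            ts = datetime.strptime(m.group("ts"), "%m-%d-%Y %H:%M:%S.%f %z")
            results.append((ts, m.group("sid")))
    return results
```

Lines that do not carry the "Search auto-canceled" message are skipped, so the function can be fed the whole file.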

2. The splunkd_ui_access.log entries for sid="1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" show that the keep-alives stopped and resumed about 1 minute later.
   In the graph below there is a 1-minute gap between the last successful request (HTTP 200) and the HTTP 404 error.

   At times the browser took longer than 30 seconds (job_default_auto_cancel = 30 in 8.2.9) to send a keep-alive poll to the Splunk server, so the job was auto-canceled before the next poll arrived.

   index=_internal source=*splunkd_ui_access.log "1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" | timechart span=1s count by status

[Screenshot: timechart of splunkd_ui_access.log requests for this SID, count by HTTP status]

The access log entries below give more detail: every second the browser sent a keep-alive and received HTTP 200, but the last request received HTTP 404 because the job had already timed out.

[Screenshot: splunkd_ui_access.log entries for this SID, showing HTTP 200 keep-alives followed by HTTP 404]
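
The same gap can be measured programmatically. The sketch below assumes you have already extracted (epoch seconds, HTTP status) pairs for one SID's keep-alive requests, for example by exporting the events returned by the search above; that extraction step and the helper name gap_before_404 are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def gap_before_404(
    records: Iterable[Tuple[float, int]],
) -> Optional[Tuple[float, float, float]]:
    """Return (last_200, first_404, gap_seconds) for one SID, or None.

    `records` are (epoch_seconds, http_status) pairs for the SID's
    keep-alive requests, assumed to be extracted from splunkd_ui_access.log.
    """
    last_ok: Optional[float] = None
    for ts, status in sorted(records):
        if status == 200:
            last_ok = ts
        elif status == 404 and last_ok is not None:
            # First 404 after at least one success: measure the silent gap.
            return last_ok, ts, ts - last_ok
    return None


records = [(t, 200) for t in range(100, 111)] + [(170, 404)]
print(gap_before_404(records))  # (110, 170, 60)
```

A 60-second result, as in this thread, is well beyond the 30-second default in 8.2.9, which is consistent with the auto-cancel.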

3. Recommendations:

i) In web.conf (versions prior to 9.0 only):
[settings]
job_default_auto_cancel = 62

The default is 30 in 8.2.9 and was increased to 62 in 9.0 and later.

ii) In limits.conf:
[search]
min_settings_period = 60
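
To check which auto-cancel value a given web.conf yields, you could parse it with Python's configparser, since Splunk .conf files are INI-like. This is a sketch under assumptions: the defaults are taken from this thread (30 in 8.2.9, 62 in 9.0+), we assume every pre-9.0 release uses 30, and it reads a single file rather than Splunk's merged default/local view. effective_auto_cancel is a hypothetical name.

```python
import configparser


def effective_auto_cancel(web_conf_text: str, version: tuple) -> int:
    """Return job_default_auto_cancel from a web.conf [settings] stanza,
    falling back to the version default when it is not set.

    Defaults as stated in this thread: 30 in 8.2.9, 62 in 9.0+.
    Assumption: all pre-9.0 releases use 30. Only one file is read;
    Splunk's default/local layering is ignored.
    """
    default = 62 if version >= (9, 0) else 30
    # interpolation=None so '%' in other values is not treated specially.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(web_conf_text)
    return parser.getint("settings", "job_default_auto_cancel", fallback=default)


observed_gap = 60  # seconds, the 1-minute gap seen in splunkd_ui_access.log
for text in ("", "[settings]\njob_default_auto_cancel = 62\n"):
    value = effective_auto_cancel(text, (8, 2, 9))
    print(value, "OK" if value > observed_gap else "too short")
```

On 8.2.9 an unset value resolves to 30 ("too short" for a 60-second gap), while the recommended 62 clears it.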
