Splunk Enterprise

How to fix this error: The search job "SID" was canceled remotely or expired?

sylim_splunk
Splunk Employee

We have a requirement to pull security logs for specific past time ranges, for example December 2022 to April 2023. Splunk cannot complete a search without the job expiring, even for a 1-hour window in December.
This breaks our published 12-month retention period for these logs. Please provide options for how to identify, correct, or improve this search problem.

The search job 'SID' was canceled remotely or expired.

Sometimes the GUI shows "Unknown SID" instead. The version currently in use is 8.2.9.


sylim_splunk
Splunk Employee

The browser is supposed to check in by sending keep-alive/poll GET requests to the Splunk server every second while a search is running. In this case the polling stopped at some point and resumed later. By then the job had already timed out and been auto-canceled, so the Splunk server responded with HTTP 404.

Here are the logs:

1. Auto-cancel events are logged in search_messages.log:

07-20-2023 13:37:45.264 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689874558.349256_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled
07-20-2023 13:46:45.343 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689874865.349389_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled
07-20-2023 14:00:45.323 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled
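
To list the affected SIDs from a copy of search_messages.log outside of Splunk, something like the following could work. This is only a sketch: the function name `auto_canceled` is hypothetical, and the regular expression assumes the exact line layout shown in the excerpt above, which may differ in other versions.

```python
import re
from datetime import datetime

# Assumption: lines look exactly like the search_messages.log excerpt above.
LINE_RE = re.compile(
    r'^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) '
    r'ERROR SearchMessages - .*?sid="(?P<sid>[^"]+)".*message=Search auto-canceled'
)

def auto_canceled(lines):
    """Return (timestamp, sid) pairs for every 'Search auto-canceled' line."""
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group("ts"), "%m-%d-%Y %H:%M:%S.%f %z")
            results.append((when, match.group("sid")))
    return results

# Two lines copied from the excerpt above, plus one made-up non-matching line.
sample = [
    '07-20-2023 13:37:45.264 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689874558.349256_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled',
    'this line is not an auto-cancel message',
    '07-20-2023 14:00:45.323 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled',
]

found = auto_canceled(sample)
for when, sid in found:
    print(when.isoformat(), sid)
```

Each SID found this way can then be plugged into the splunkd_ui_access.log search in step 2.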

 

 

2. The splunkd_ui_access.log entries for sid="1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" show that the keep-alive polling stopped and resumed about 1 minute later.
   There is a 1-minute gap between the last successful 200 response and the 404 error in the graph below.

   At times the browser took longer than 30 seconds (job_default_auto_cancel = 30 in 8.2.9) to send the keep-alive poll to the Splunk server.

   index=_internal source=*splunkd_ui_access.log "1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" | timechart span=1s count by status


Screenshot 2023-07-20 at 5.50.48 PM.png

The access log below shows more detail. The browser sent a keep-alive every second and received HTTP 200 each time, but the last request received HTTP 404 because the job had already timed out.

Screenshot 2023-07-20 at 5.57.11 PM.png
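
The same gap check can be done on exported request timestamps instead of reading it off the graph. The sketch below is illustrative only: `find_poll_gaps` is a hypothetical helper, the timestamps are made up to mimic the pattern described above (polls every second, then a 1-minute pause), and the threshold simply mirrors job_default_auto_cancel.

```python
from datetime import datetime, timedelta

def find_poll_gaps(times, auto_cancel_seconds=30):
    """Return (previous, next, gap_seconds) for every pause between
    consecutive polls that is longer than the auto-cancel threshold."""
    ordered = sorted(times)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = (cur - prev).total_seconds()
        if delta > auto_cancel_seconds:
            gaps.append((prev, cur, delta))
    return gaps

# Made-up timestamps: four polls one second apart, then a 60-second pause.
start = datetime(2023, 7, 20, 13, 59, 0)
polls = [start + timedelta(seconds=s) for s in (0, 1, 2, 3, 63, 64)]

gaps = find_poll_gaps(polls)                         # threshold of 30, as in 8.2.9
gaps_with_62 = find_poll_gaps(polls, auto_cancel_seconds=62)

for prev, cur, delta in gaps:
    print(f"no poll between {prev.time()} and {cur.time()} ({delta:.0f}s)")
```

With the threshold at 30 the 60-second pause is flagged; with 62 it is not, which is the reasoning behind the recommendation that follows.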

3. Recommendation:

i) In web.conf, for versions prior to 9.0 only:
[settings]
job_default_auto_cancel = 62

The default is 30 in version 8.2.9 and was raised to 62 in version 9.0 and later.

ii) In limits.conf:
[search]
min_settings_period = 60
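
After editing the files, a quick script can confirm that the two values are what you expect. This is a rough sketch: `read_setting` is a hypothetical helper, the conf text is inlined rather than read from disk, and Python's configparser only handles simple INI-style content. It looks at one file's text and does not reproduce how Splunk merges settings from multiple configuration layers.

```python
import configparser

def read_setting(conf_text, stanza, key):
    """Return the value of key in [stanza], or None if it is not set."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if parser.has_option(stanza, key):
        return parser.get(stanza, key)
    return None

# Inline copies of the recommended stanzas (in practice, read the file contents).
web_conf = """
[settings]
job_default_auto_cancel = 62
"""

limits_conf = """
[search]
min_settings_period = 60
"""

auto_cancel = int(read_setting(web_conf, "settings", "job_default_auto_cancel"))
settings_period = int(read_setting(limits_conf, "search", "min_settings_period"))
missing = read_setting(web_conf, "search", "min_settings_period")

print("job_default_auto_cancel =", auto_cancel)
print("min_settings_period =", settings_period)
```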

 

 

 

