Splunk Enterprise

How to fix this error: The search job "SID" was canceled remotely or expired?

sylim_splunk
Splunk Employee

We have a requirement to pull security logs for specific past time ranges, e.g. from December 2022 to April 2023. Splunk cannot complete a search without it expiring, even for a 1-hour window in December.
This fails our published 12-month retention period for these logs. Please provide options for how to identify, correct, or improve this search problem.

The search job 'SID' was canceled remotely or expired.

Sometimes the GUI shows "Unknown SID". The version currently in use is 8.2.9.

1 Solution

sylim_splunk
Splunk Employee

The browser is supposed to check in by sending keep-alive/poll GET requests to the Splunk server every second while a search is running. Here, however, the polling stopped at some point and resumed later, and the Splunk server returned a 404 error because the search job had already timed out.

Here are the logs:

1. The auto-cancel is logged in search_messages.log:

07-20-2023 13:37:45.264 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689874558.349256_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled
07-20-2023 13:46:45.343 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689874865.349389_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled
07-20-2023 14:00:45.323 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled

2. The splunkd_ui_access.log entries for sid="1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" show that the keep-alive requests stopped and resumed about 1 minute later.
   In the graph below, there is a 1-minute gap between the last successful 200 response and the 404 error.

   At times the browser took longer than 30 seconds (job_default_auto_cancel = 30 in 8.2.9) to send the keep-alive poll to the Splunk server.

   index=_internal source=*splunkd_ui_access.log "1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" | timechart span=1s count by status

Screenshot 2023-07-20 at 5.50.48 PM.png

The access log below has more detail. Every second the browser sent a keep-alive request and received HTTP 200, but the last request received HTTP 404 because the search job had already timed out.

Screenshot 2023-07-20 at 5.57.11 PM.png
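
The same gap check can also be sketched offline in Python. This is only a minimal sketch, assuming the keep-alive request times for one SID have been exported as epoch seconds. The function name find_poll_gaps and the sample timestamps are hypothetical; only the 30-second threshold comes from the job_default_auto_cancel value for 8.2.9 mentioned above.

```python
from typing import List, Tuple

# Threshold taken from the post: job_default_auto_cancel = 30 in 8.2.9.
AUTO_CANCEL_SECONDS = 30


def find_poll_gaps(
    poll_times: List[float], threshold: float = AUTO_CANCEL_SECONDS
) -> List[Tuple[float, float, float]]:
    """Return (previous_poll, next_poll, gap_seconds) for each gap longer than threshold."""
    ordered = sorted(poll_times)
    gaps = []
    # Compare each poll time with the one that follows it.
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt - prev
        if gap > threshold:
            gaps.append((prev, nxt, gap))
    return gaps


# Hypothetical sample: polls every second, then a 60-second pause.
sample = [0.0, 1.0, 2.0, 3.0, 63.0]
print(find_poll_gaps(sample))  # prints [(3.0, 63.0, 60.0)]
```

Any gap reported here is longer than the auto-cancel window, so the search would likely have been canceled before the next poll arrived.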

3. Recommendation:

i) In web.conf (for versions prior to 9.0 only):
[settings]
job_default_auto_cancel = 62

- The default is 30 in version 8.2.9 and was increased to 62 in version 9.0+.

ii) In limits.conf:
[search]
min_settings_period = 60