We have a requirement to pull security logs for specific past time ranges, e.g. December 2022 to April 2023. Splunk cannot complete a search without it expiring, even for a 1-hour window in December.
This means we fail our published 12-month retention period for these logs. Please provide options to identify, correct, or improve this search problem.
The search fails with: The search job 'SID' was canceled remotely or expired.
Sometimes the GUI shows "Unknown SID" instead. We are running version 8.2.9.
While a search is running, the browser is supposed to check in with the Splunk server every second by sending keep-alive/poll GET requests. However, at some point the polling stops and only resumes later, by which time the job has already timed out, so the Splunk server returns HTTP 404.
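To make the timing concrete, here is a simplified Python model of that behaviour, assuming the server cancels a job as soon as more than job_default_auto_cancel seconds pass without a keep-alive. The function name and the poll timestamps are hypothetical, and the real splunkd logic may differ in details such as exactly when the check runs.

```python
from typing import Optional, Sequence


def auto_cancel_time(poll_times: Sequence[float], auto_cancel: float) -> Optional[float]:
    """Return when the job would be auto-canceled, or None if every gap is tolerated.

    Simplified model (assumption): the server cancels the job once more than
    `auto_cancel` seconds pass without a keep-alive poll.
    """
    times = sorted(poll_times)
    for prev, curr in zip(times, times[1:]):
        if curr - prev > auto_cancel:
            # No poll arrived within the allowed window, so the job expires here.
            return prev + auto_cancel
    return None


# Hypothetical polls: one per second for 10 s, a 60 s pause, then polling resumes.
polls = [float(t) for t in range(0, 11)] + [70.0, 71.0]
print(auto_cancel_time(polls, 30))  # 8.2.9 value
print(auto_cancel_time(polls, 62))  # 9.0+ value
```

With this hypothetical 60-second pause, the job is canceled at t=40.0 with a window of 30 but survives with 62, which is the effect the recommendation below relies on.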
Here are the logs:
1. Auto-cancel events logged in search_messages.log:
07-20-2023 13:37:45.264 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689874558.349256_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled
07-20-2023 13:46:45.343 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689874865.349389_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled
07-20-2023 14:00:45.323 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled
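To list the auto-canceled searches programmatically, a minimal Python sketch can match lines in the format shown above. The regular expression and function name are assumptions written against these three sample lines only; other search_messages.log entries may be formatted differently.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Pattern derived from the sample lines above (assumption, not a documented format).
LINE_RE = re.compile(
    r'^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) '
    r'ERROR SearchMessages - .*?sid="(?P<sid>[^"]+)".*message=Search auto-canceled'
)


def auto_canceled(lines: Iterable[str]) -> List[Tuple[datetime, str]]:
    """Return (timestamp, sid) for every 'Search auto-canceled' line."""
    found = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            # e.g. "07-20-2023 13:37:45.264 -0400" -> timezone-aware datetime
            ts = datetime.strptime(m.group("ts"), "%m-%d-%Y %H:%M:%S.%f %z")
            found.append((ts, m.group("sid")))
    return found
```

Feeding it the three lines above should yield one (timestamp, sid) pair per auto-canceled search, which can then be used to look up each sid in splunkd_ui_access.log.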
2. The splunkd_ui_access.log entries for sid="1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" show that the keep-alives stopped and resumed about 1 minute later.
In the graph below there is a 1-minute gap between the last HTTP 200 and the HTTP 404.
At times the browser took longer than 30 seconds (job_default_auto_cancel=30 in 8.2.9) to send a keep-alive poll to the Splunk server.
index=_internal source=*splunkd_ui_access.log "1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" |timechart span=1s count by status
Below are more details from the access log. Every second the browser sent a keep-alive and got HTTP 200, but the last one received HTTP 404 because the job had already timed out.
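If the relevant splunkd_ui_access.log entries are exported as (timestamp, status) pairs, a small Python sketch like the following can reproduce the per-second counts of the timechart search and measure the gap between the last 200 and the first 404. The record format and function names are hypothetical, and parsing the raw access-log lines is left out because their exact layout is not shown here.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical record shape: (epoch seconds, HTTP status) for one sid's requests.
Record = Tuple[float, int]


def per_second_counts(records: Iterable[Record]) -> Counter:
    """Rough analogue of `timechart span=1s count by status`."""
    return Counter((int(ts), status) for ts, status in records)


def gap_before_first_404(records: Iterable[Record]) -> Optional[float]:
    """Seconds between the last HTTP 200 and the first HTTP 404, if any."""
    last_ok: Optional[float] = None
    for ts, status in sorted(records):
        if status == 200:
            last_ok = ts
        elif status == 404:
            # The first 404 marks the poll that arrived after the job expired.
            return None if last_ok is None else ts - last_ok
    return None
```

A gap larger than job_default_auto_cancel (30 seconds in 8.2.9) is consistent with the auto-cancel seen in search_messages.log.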
3. Recommendation:
i) In web.conf (for versions prior to 9.0 only):
[settings]
job_default_auto_cancel = 62
- The value is 30 in 8.2.9 and was increased to 62 in 9.0 and later.
ii) In limits.conf:
[search]
min_settings_period = 60
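As a rough check of the web.conf change, the sketch below reads a [settings] stanza with Python's configparser and compares job_default_auto_cancel against an observed keep-alive gap. It assumes the snippet is simple key = value text and ignores Splunk's own layering of default and local .conf files; the helper names are hypothetical, and the fallback of 30 is the 8.2.9 value given above.

```python
import configparser

# 8.2.9 value stated above; used when the stanza does not set the key.
DEFAULT_AUTO_CANCEL_8_2_9 = 30


def effective_auto_cancel(web_conf_text: str) -> int:
    """Read job_default_auto_cancel from a web.conf snippet (simplified, no layering)."""
    parser = configparser.ConfigParser()
    parser.read_string(web_conf_text)
    return parser.getint(
        "settings", "job_default_auto_cancel", fallback=DEFAULT_AUTO_CANCEL_8_2_9
    )


def survives_gap(web_conf_text: str, gap_seconds: float) -> bool:
    """True if a keep-alive pause of gap_seconds stays within the auto-cancel window."""
    return gap_seconds <= effective_auto_cancel(web_conf_text)


recommended = "[settings]\njob_default_auto_cancel = 62\n"
print(survives_gap("", 60))           # no override: 30-second window
print(survives_gap(recommended, 60))  # with the recommended override
```

With the roughly 1-minute gap seen here, the check fails under the 8.2.9 value of 30 and passes once the stanza sets 62, although a pause longer than 62 seconds would still cancel the job.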