Hello,
Splunk 7.1.3, Linux x86_64.
One of my custom (SCPv1) commands fails when the number of events returned exceeds 20,000-30,000 (the threshold varies slightly between runs; there is no problem when the count is below 10,000). This is the suspicious snippet from the associated search.log:
09-08-2018 17:40:55.446 ERROR ScriptRunner - stderr from 'xxx': INFO Running /opt/splunk/etc/apps/Splunk_SA_Scientific_Python_linux_x86_64/bin/linux_x86_64/bin/python xxx
09-08-2018 17:40:56.247 INFO ReducePhaseExecutor - ReducePhaseExecutor=1 action=CANCEL
09-08-2018 17:40:56.247 INFO DispatchExecutor - User applied action=CANCEL while status=0
09-08-2018 17:40:56.247 ERROR SearchStatusEnforcer - sid:1536453655.14705 Search auto-canceled
09-08-2018 17:40:56.247 INFO SearchStatusEnforcer - State changed to FAILED due to: Search auto-canceled
09-08-2018 17:40:56.255 INFO ReducePhaseExecutor - Ending phase_1
09-08-2018 17:40:56.255 INFO UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.255 ERROR SearchOrchestrator - Phase_1 failed due to : DAG Execution Exception: Search has been cancelled
09-08-2018 17:40:56.255 INFO ReducePhaseExecutor - ReducePhaseExecutor=1 action=CANCEL
09-08-2018 17:40:56.255 INFO DispatchExecutor - User applied action=CANCEL while status=3
09-08-2018 17:40:56.255 INFO DispatchManager - DispatchManager::dispatchHasFinished(id='1536453655.14705', username='admin')
09-08-2018 17:40:56.256 INFO UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 INFO UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 INFO UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 INFO UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 INFO UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 INFO UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 INFO UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 WARN SearchResultWorkUnit - timed out, sending keepalive nConsecutiveKeepalive=0 currentSetStart=0.000000
09-08-2018 17:40:56.261 WARN LocalCollector - Local Collector Orchestrator terminating, writing to the collection manager failed.
09-08-2018 17:40:56.263 INFO UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.263 WARN ScriptRunner - Killing script, probably timed out, grace=0sec, script="xxx"
09-08-2018 17:40:56.265 INFO UserManager - Unwound user context: NULL -> NULL
Note: I've obfuscated the script name from the log above.
My questions:
— What conditions must arise to have a search auto-canceled?
— What's a DAG Execution Exception?
— What's a known workaround?
Thank you.
Hello,
I am a newbie and just installed Splunk today.
I am seeing the same issue:
ERROR SearchOrchestrator - Phase_1 failed due to : DAG Execution Exception: Search has been cancelled
01-31-2019 15:44:36.590 INFO ReducePhaseExecutor - ReducePhaseExecutor=1 action=CANCEL
I understand what you mean, but I have no clue what exact steps to take to fix it.
On the jobs endpoint, there's an auto_cancel option you can use to control this timeout; setting it to 0 should disable it. You can check the settings for the job in a file called runtime.csv in the dispatch directory.
If you are facing this issue with any of your saved searches, configure the saved search with "dispatch.auto_cancel = 0" so the job runs successfully without being auto-canceled.
Can you elaborate, please?
Hi,
If the search job hasn't been touched by any requests in x seconds we'll cancel the job.
One common use case is when users fire off a job in the search bar ... then modify the search and hit enter before the old search finishes. Nobody is going to poll the status of the old job and nobody cares about it .. so we auto-cancel it.
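"Touched" here means any REST request that references the job's sid, such as a status poll from the UI. A minimal sketch of what such a request targets (host, port, and the helper name are assumptions, not from this thread):

```python
# Sketch: build the URL of a search job's status endpoint. Any authenticated
# GET against this URL counts as activity and resets the auto-cancel timer.
def job_status_url(sid, host="localhost", port=8089):
    """Return the REST URL for the search job with the given sid."""
    return f"https://{host}:{port}/services/search/jobs/{sid}"

# e.g. poll it with: requests.get(job_status_url(sid), auth=(user, pw), verify=False)
print(job_status_url("1536453655.14705"))
```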
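As a sketch of the endpoint workaround, a job can be dispatched with auto_cancel=0 by POSTing to /services/search/jobs. The helper below only assembles the POST body so it can be checked offline; in practice you would send it with something like requests.post() against your search head (the function name and defaults are assumptions):

```python
# Sketch: assemble the POST body for /services/search/jobs with
# auto-cancel disabled (auto_cancel=0 means never auto-cancel).
def build_job_args(search, auto_cancel=0):
    """Build form parameters for dispatching a search job."""
    query = search.lstrip()
    if not query.startswith(("search ", "|")):
        query = "search " + query  # the endpoint expects a full SPL string
    return {"search": query, "auto_cancel": auto_cancel}

print(build_job_args("index=_internal | head 10"))
```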
Refer to the Splunk doc below for the pause and cancel options (runtime.csv is the filename).
Check "Phased execution settings" in limits.conf, linked below, for more details about the execution process.
http://docs.splunk.com/Documentation/Splunk/7.1.2/Admin/Limitsconf
Workaround for saved searches:
dispatch.auto_cancel =
* If specified, the job automatically cancels after this many seconds of
inactivity. (0 means never auto-cancel)
* Default is 0.
http://docs.splunk.com/Documentation/Splunk/7.1.2/Admin/Savedsearchesconf#dispatch_search_options
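Put together, the saved-search workaround amounts to a stanza along these lines in savedsearches.conf (the stanza name and search string here are hypothetical):

```
# $SPLUNK_HOME/etc/apps/<your_app>/local/savedsearches.conf
[my_long_running_search]            # hypothetical saved-search name
search = index=_internal | head 10
dispatch.auto_cancel = 0            # 0 = never auto-cancel this job
```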
Hope this helps!!
Cheers.
What do you mean by "If the search job hasn't been touched by any requests in x seconds we'll cancel the job"? What new requests would be touching a search job that has already been kicked off?
As you can see in the search.log snippet above, the command, regardless of whether it succeeds or fails, returns in less than a second and should consume no more than a few MB of RAM.