
Help with error from a custom command: ERROR SearchStatusEnforcer - sid:1536453655.14705 Search auto-canceled

jibanes
Path Finder

Hello,

Splunk 7.1.3, Linux x86_64.

One of my custom (SCPv1) commands errors out when the number of events returned exceeds roughly 20,000-30,000 (the threshold varies slightly between runs; there is no problem when the event count is under 10,000). This is the associated suspicious snippet from search.log:

09-08-2018 17:40:55.446 ERROR ScriptRunner - stderr from 'xxx':  INFO Running /opt/splunk/etc/apps/Splunk_SA_Scientific_Python_linux_x86_64/bin/linux_x86_64/bin/python xxx
09-08-2018 17:40:56.247 INFO  ReducePhaseExecutor - ReducePhaseExecutor=1 action=CANCEL
09-08-2018 17:40:56.247 INFO  DispatchExecutor - User applied action=CANCEL while status=0
09-08-2018 17:40:56.247 ERROR SearchStatusEnforcer - sid:1536453655.14705 Search auto-canceled
09-08-2018 17:40:56.247 INFO  SearchStatusEnforcer - State changed to FAILED due to: Search auto-canceled
09-08-2018 17:40:56.255 INFO  ReducePhaseExecutor - Ending phase_1
09-08-2018 17:40:56.255 INFO  UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.255 ERROR SearchOrchestrator - Phase_1 failed due to : DAG Execution Exception: Search has been cancelled
09-08-2018 17:40:56.255 INFO  ReducePhaseExecutor - ReducePhaseExecutor=1 action=CANCEL
09-08-2018 17:40:56.255 INFO  DispatchExecutor - User applied action=CANCEL while status=3
09-08-2018 17:40:56.255 INFO  DispatchManager - DispatchManager::dispatchHasFinished(id='1536453655.14705', username='admin')
09-08-2018 17:40:56.256 INFO  UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 INFO  UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 INFO  UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 INFO  UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 INFO  UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 INFO  UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 INFO  UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.261 WARN  SearchResultWorkUnit - timed out, sending keepalive nConsecutiveKeepalive=0 currentSetStart=0.000000
09-08-2018 17:40:56.261 WARN  LocalCollector - Local Collector Orchestrator terminating, writing to the collection manager failed.
09-08-2018 17:40:56.263 INFO  UserManager - Unwound user context: NULL -> NULL
09-08-2018 17:40:56.263 WARN  ScriptRunner - Killing script, probably timed out, grace=0sec, script="xxx"
09-08-2018 17:40:56.265 INFO  UserManager - Unwound user context: NULL -> NULL

Note: I've obfuscated the script name from the log above.

My questions:
— What conditions must arise to have a search auto-canceled?
— What's a DAG Execution Exception?
— What's a known workaround?

thank you.

chudak
New Member

Hello

I am a newbie and just installed Splunk today.
I am seeing the same issue:

ERROR SearchOrchestrator - Phase_1 failed due to : DAG Execution Exception: Search has been cancelled
01-31-2019 15:44:36.590 INFO ReducePhaseExecutor - ReducePhaseExecutor=1 action=CANCEL

I understand what you mean, but I have no clue what exact steps to take to fix it.

On the jobs endpoint, there's an auto_cancel option you can use to control this timeout. Setting it to 0 should disable it. You can check the settings for the job in a file called runtime.csv in the dispatch directory.

If you are facing this issue with any of your saved searches, configure the saved search with "dispatch.auto_cancel = 0" so the job runs to completion without being auto-canceled.

Can you elaborate pls?


mbadhusha_splun
Splunk Employee

Hi,

If the search job hasn't been touched by any requests in x seconds, we'll cancel the job.

One common use case is when users fire off a job in the search bar, then modify the search and hit enter before the old search finishes. Nobody is going to poll the status of the old job and nobody cares about it, so we auto-cancel it.

On the jobs endpoint, there's an auto_cancel option you can use to control this timeout. Setting it to 0 should disable it. You can check the settings for the job in a file called runtime.csv in the dispatch directory.
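
For anyone scripting this, here is a minimal sketch of dispatching a job with the inactivity auto-cancel disabled via the REST jobs endpoint. The base URL, credentials, and search string are placeholders, not from this thread; adjust them for your environment.

# Sketch: create a search job with auto_cancel=0 (never cancel on inactivity).
# BASE and AUTH are hypothetical; point them at your own search head.
import requests

BASE = "https://localhost:8089"      # Splunk management port
AUTH = ("admin", "changeme")         # placeholder credentials

resp = requests.post(
    f"{BASE}/services/search/jobs",
    auth=AUTH,
    data={
        "search": "search index=_internal | head 100000",  # example search
        "auto_cancel": 0,            # 0 = disable the inactivity auto-cancel
        "output_mode": "json",
    },
    verify=False,                    # management port typically uses a self-signed cert
)
resp.raise_for_status()
print("dispatched sid:", resp.json()["sid"])

If you use the Splunk Python SDK instead, the same auto_cancel value should be usable as a keyword argument to jobs.create(), since extra arguments are forwarded to this endpoint.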

If you are facing this issue with any of your saved searches, configure the saved search with "dispatch.auto_cancel = 0" so the job runs to completion without being auto-canceled.

Refer to the Splunk doc below for the pause and cancel options; runtime.csv is the file name (it lives in the job's dispatch directory under $SPLUNK_HOME/var/run/splunk/dispatch/<sid>/).

https://docs.splunk.com/Documentation/Splunk/7.1.3/Search/Dispatchdirectoryandsearchartifacts#File_d...

Check "Phased execution settings" in the below limits.conf to know more details about the execution process.

http://docs.splunk.com/Documentation/Splunk/7.1.2/Admin/Limitsconf

Workaround for saved searches:

dispatch.auto_cancel = <int>
* If specified, the job automatically cancels after this many seconds of
  inactivity. (0 means never auto-cancel)
* Default is 0.

http://docs.splunk.com/Documentation/Splunk/7.1.2/Admin/Savedsearchesconf#dispatch_search_options
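
For illustration, a hypothetical savedsearches.conf stanza with the workaround applied might look like this (the stanza name and search are placeholders):

[my_long_running_search]
search = index=main sourcetype=my_sourcetype | mycustomcommand
dispatch.auto_cancel = 0

After editing the .conf file, reload the app or restart Splunk so the change is picked up.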

Hope this helps!!

Cheers.

sjcoluccio67
Explorer

What do you mean by "If the search job hasn't been touched by any requests in x seconds we'll cancel the job"? What new requests would be touching a search job that has been kicked off?


jibanes
Path Finder

As you can see in the search.log snippet above, the command returns in less than a second regardless of whether it succeeds or fails, and should consume no more than a few MB of RAM.
