When I run a search that takes longer than a couple of seconds to complete, I suddenly see the following error messages:
The search job “1522944681.58347” was canceled remotely or expired.
These messages appear approximately one second after the search starts running. If I check the $SPLUNK_HOME/var/run/splunk/dispatch directory, I can see the search artifact being created and then reaped right after the search is canceled. Scheduled saved searches do not seem to be affected. This is a standalone Splunk instance running version 6.5.2. What is causing this?
Search artifacts are reaped automatically once they expire, and how long an artifact is allowed to live is controlled by several settings in the limits.conf and savedsearches.conf files. While working with the customer, we could see that the files in the search artifact were still actively being updated at the moment the artifact was reaped. An artifact that is still in use should not be considered expired, which suggested that its timestamps looked older to Splunk than they actually were. We asked whether the dispatch directory was located on an NFS mount, and it was. To test for clock skew, we created a file in the dispatch directory, checked its timestamp, and then ran the date command on the Splunk instance. The test file's timestamp was three minutes earlier than the host's current time, which confirmed that clock skew between the NFS server and the Splunk host was causing the searches to fail.
To prevent this issue, synchronize the clocks on both the NFS server and the Splunk instance using NTP.
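The manual check described in the answer (create a file in the dispatch directory, then compare its timestamp with the output of date) can be scripted. The following is a minimal Python sketch, assuming Python 3 is available on the Splunk host and that the directory passed in sits on the same NFS mount as the dispatch directory. The function name, the probe-file prefix, and the 2-second warning threshold are hypothetical choices for illustration; they are not Splunk settings.

```python
import os
import sys
import tempfile
import time

# Hypothetical alert threshold in seconds (not a Splunk setting).
SKEW_WARN_SECONDS = 2.0


def measure_fs_clock_skew(directory: str) -> float:
    """Estimate (filesystem timestamp - local clock) in seconds.

    Creates a small probe file in `directory`, reads back the modification
    time assigned to it, and compares that with the local clock. On an NFS
    mount the timestamp is typically assigned by the server, so a large
    negative result suggests the server's clock is behind this host.
    The probe file is always removed.
    """
    before = time.time()
    fd, path = tempfile.mkstemp(dir=directory, prefix="skew_probe_")
    try:
        try:
            os.write(fd, b"probe")
            os.fsync(fd)  # flush the write so the filesystem records an mtime
        finally:
            os.close(fd)
        after = time.time()
        mtime = os.stat(path).st_mtime
    finally:
        os.remove(path)  # clean up the probe file even if stat fails
    # Compare against the midpoint of the local window in which the write happened.
    return mtime - (before + after) / 2


if __name__ == "__main__":
    # Example: python skew_probe.py "$SPLUNK_HOME/var/run/splunk/dispatch"
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    skew = measure_fs_clock_skew(target)
    print(f"filesystem clock minus local clock: {skew:+.1f} s")
    if abs(skew) > SKEW_WARN_SECONDS:
        print("WARNING: possible clock skew; check NTP on both hosts")
```

In the scenario above, a result of roughly -180 seconds would correspond to the three-minute skew the customer saw. The probe file is deleted immediately, but if you would rather not write into a live dispatch directory, another directory on the same NFS export should show the same skew, since the timestamp comes from the same server.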