I have an SPL query; when it first runs, results appear, but once the query finishes, the error shown below is displayed:
| tstats `summariesonly`
count(All_Traffic.dest_ip) as destination_ip_count,
count(All_Traffic.src_ip) as source_ip_count,
count(All_Traffic.dest_port) as destination_port_count,
count(All_Traffic.src_port) as source_port_count
from datamodel=Network_Traffic.All_Traffic
by
All_Traffic.src_ip,
All_Traffic.src_port,
All_Traffic.dest_ip,
All_Traffic.protocol,
All_Traffic.src_zone,
All_Traffic.protocol_version,
All_Traffic.action,
_time
| lookup 3rd_party_network_connections_vendor_ip.csv index_ip as All_Traffic.src_ip OUTPUT value_ip
| where isnotnull(value_ip) AND All_Traffic.src_port !="53" AND (All_Traffic.action="blocked" OR All_Traffic.action="denied" OR All_Traffic.action="failed") AND source_ip_count > 40 AND destination_ip_count > 40
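A side note on the query itself, separate from the file error below: `where` takes an eval expression, and in eval an unquoted `.` is the string-concatenation operator, so field names containing dots must be wrapped in single quotes; otherwise the comparison silently operates on concatenated (usually null) fields. A corrected version of the final clauses might look like this (same lookup and thresholds; illustrative only):

```
| lookup 3rd_party_network_connections_vendor_ip.csv index_ip as All_Traffic.src_ip OUTPUT value_ip
| where isnotnull(value_ip)
    AND 'All_Traffic.src_port'!="53"
    AND ('All_Traffic.action'="blocked" OR 'All_Traffic.action'="denied" OR 'All_Traffic.action'="failed")
    AND source_ip_count > 40 AND destination_ip_count > 40
```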
-----------------------
The error
StatsFileWriterLz4 file open failed file=C:\Splunk\var\run\splunk\srtemp\910252184_17768_at_1638875294.1\statstmp_merged_5.sb.lz4
-------------
Could you validate whether my SPL query is correct?
Thanks
We are seeing the same thing, with a long-running job (~20 min) that is pulling a few million records.
Note that the file splunkd is trying to open is in /var/run/ but not in /var/run/dispatch/<sid>, so this is some internal secret cache that splunkd is using.
The relevant excerpt from the (frankly enormous) search.log is below.
tl;dr: it says the file is being used by another process.
A couple of seconds later it tries to remove the directory, which fails because the directory is not empty.
And then a few seconds after that, search.log records a splunkd shutdown, which makes no sense, because the customer did not report that Splunk shut down.
.....
nothing too interesting and then
.....
01-06-2022 12:33:09.439 INFO UserManager [7352 SearchResultExecutorThread] - Unwound user context: admin -> NULL
01-06-2022 12:33:09.439 INFO UserManager [7700 SearchResultExecutorThread] - Unwound user context: admin -> NULL
01-06-2022 12:33:09.457 INFO ResultsCollationProcessor [4780 phase_1] - Writing remote_event_providers.csv to disk
01-06-2022 12:33:09.459 INFO DispatchExecutor [4780 phase_1] - END OPEN: Processor=stats
01-06-2022 12:33:09.459 INFO DispatchExecutor [4780 phase_1] - BEGIN OPEN: Processor=timechart
01-06-2022 12:33:09.673 ERROR StatsFileIO [4780 phase_1] - open failed for file=C:\Program Files\Splunk\var\run\splunk\srtemp\2075148042_2268_at_1641492677.2\statstmp_merged_0.sb.lz4, error: The process cannot access the file because it is being used by another process.
01-06-2022 12:33:09.674 INFO ReducePhaseExecutor [4780 phase_1] - Not downloading remote search.log files. Reason: No remote event providers.
01-06-2022 12:33:09.674 INFO ReducePhaseExecutor [4780 phase_1] - Not downloading remote search_telemetry.json files. Reason: No remote event providers.
01-06-2022 12:33:09.674 INFO ReducePhaseExecutor [4780 phase_1] - Ending phase_1
01-06-2022 12:33:09.674 INFO UserManager [4780 phase_1] - Unwound user context: admin -> NULL
01-06-2022 12:33:09.674 ERROR SearchOrchestrator [8764 searchOrchestrator] - Phase_1 failed due to : StatsFileWriterLz4 file open failed file=C:\Program Files\Splunk\var\run\splunk\srtemp\2075148042_2268_at_1641492677.2\statstmp_merged_0.sb.lz4
01-06-2022 12:33:09.674 INFO ReducePhaseExecutor [5628 StatusEnforcerThread] - ReducePhaseExecutor=1 action=CANCEL
01-06-2022 12:33:09.674 INFO DispatchExecutor [5628 StatusEnforcerThread] - User applied action=CANCEL while status=0
01-06-2022 12:33:09.675 ERROR SearchStatusEnforcer [5628 StatusEnforcerThread] - sid:1641492672.2086 StatsFileWriterLz4 file open failed file=C:\Program Files\Splunk\var\run\splunk\srtemp\2075148042_2268_at_1641492677.2\statstmp_merged_0.sb.lz4
01-06-2022 12:33:09.675 INFO SearchStatusEnforcer [5628 StatusEnforcerThread] - State changed to FAILED due to: StatsFileWriterLz4 file open failed file=C:\Program Files\Splunk\var\run\splunk\srtemp\2075148042_2268_at_1641492677.2\statstmp_merged_0.sb.lz4
01-06-2022 12:33:09.703 INFO UserManager [5628 StatusEnforcerThread] - Unwound user context: admin -> NULL
01-06-2022 12:33:09.710 INFO DispatchStorageManager [8764 searchOrchestrator] - Remote storage disabled for search artifacts.
01-06-2022 12:33:09.711 INFO DispatchManager [8764 searchOrchestrator] - DispatchManager::dispatchHasFinished(id='1641492672.2086', username='admin')
01-06-2022 12:33:09.711 INFO UserManager [8764 searchOrchestrator] - Unwound user context: admin -> NULL
01-06-2022 12:33:09.711 INFO SearchOperator:kv [6548 RunDispatch] - Extractor stats: name=_autoKvStats, probes=80, total_time_ms=1200, max_time_ms=15
01-06-2022 12:33:10.248 INFO ISearchOperator [6548 RunDispatch] - 0x30cebfd120 PREAD_HISTOGRAM: usec_1_8=4644 usec_8_64=0 usec_64_512=0 usec_512_4096=61 usec_4096_32768=0 usec_32768_262144=0 usec_262144_INF=0
01-06-2022 12:33:11.579 WARN StatsProcessorV2 [6548 RunDispatch] - Failed to remove temporary directory: C:\Program Files\Splunk\var\run\splunk\srtemp\2075148042_2268_at_1641492677.2: 1 errors occurred. Description for first 1: [{operation:"failed to remove directory", error:"The directory is not empty.", file:"C:\Program Files\Splunk\var\run\splunk\srtemp\2075148042_2268_at_1641492677.2"}]
01-06-2022 12:33:16.835 INFO SearchStatusEnforcer [6548 RunDispatch] - SearchStatusEnforcer is already terminated
01-06-2022 12:33:16.837 INFO UserManager [6548 RunDispatch] - Unwound user context: admin -> NULL
01-06-2022 12:33:16.837 INFO LookupDataProvider [6548 RunDispatch] - Clearing out lookup shared provider map
01-06-2022 12:33:16.846 INFO ShutdownHandler [7912 Shutdown] - Shutting down splunkd
01-06-2022 12:33:16.846 INFO ShutdownHandler [7912 Shutdown] - shutting down level "ShutdownLevel_Begin"
01-06-2022 12:33:16.846 INFO ShutdownHandler [7912 Shutdown] - shutting down level "ShutdownLevel_NoahHealthReport"
01-06-2022 12:33:16.846 INFO ShutdownHandler [7912 Shutdown] - shutting down level "ShutdownLevel_FileIntegrityChecker"
01-06-2022 12:33:16.846 INFO ShutdownHandler [7912 Shutdown] - shutting down level "ShutdownLevel_JustBeforeKVStore"
01-06-2022 12:33:16.846 INFO ShutdownHandler [7912 Shutdown] - shutting down level "ShutdownLevel_KVStore"
01-06-2022 12:33:16.846 INFO ShutdownHandler [7912 Shutdown] - shutting down level "ShutdownLevel_DFM"
01-06-2022 12:33:16.846 INFO ShutdownHandler [7912 Shutdown] - shutting down level "ShutdownLevel_Thruput"
.....
And then it goes back to tens of thousands of messages about various calcfields not being extracted because they're not needed.
.....
I'm seeing the same thing. I am on version 8.2.1. I have read that upgrading to 8.2.4 fixed it. Is this true? Have we figured out the cause?
We are on 8.2.7 and just saw this error, so it is not fixed in 8.2.4.
It was a very long-running job with a very large number of events.
Hi, do you mean that in 8.2.4, if the search query takes a long time to finish, it will still return this kind of StatsFileWriter error?
My point was that someone asked whether it was fixed in 8.2.4, and it is not.
I ran the search as admin and it should not have failed. There are no other errors on this search head at that time, so I believe this is a bug.
Ah I see, thanks.
Check the search log for messages related to the "file open failed" error, then correct the underlying file system problem (permissions, etc.).
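Beyond search.log, splunkd's own log is indexed in _internal, so a search along these lines (illustrative; extracted fields beyond log_level and component can vary by version) can surface every occurrence of the failure across a deployment:

```
index=_internal sourcetype=splunkd log_level=ERROR
    (StatsFileWriterLz4 OR StatsFileIO OR "file open failed")
| table _time host component _raw
```

On Windows, "being used by another process" frequently points at antivirus or backup software holding a handle on files under SPLUNK_HOME\var; Splunk's guidance recommends excluding Splunk directories from real-time scanning, which is worth checking before chasing permissions.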