Hi.
We noticed that a few of our RHEL8 servers with splunkforwarder installed log the line pasted below up to thousands of times, causing the splunkd.log files to grow excessively and fill the /opt directory.
Sometimes it is logged every few seconds, while at other times it is logged hundreds of times per second. So far only a handful of servers are experiencing the problem, and we have many others running the same version and OS.
09-17-2023 20:33:50.029 +0000 ERROR BTreeCP [2386469 TcpOutEloop] - failed: failed to mkdir /opt/splunkforwarder/var/lib/splunk/fishbucket/splunk_private_db/snapshot.tmp: File exists
Doing a restart of the splunkforwarder service mitigates the problem temporarily, but the error occurs again within a few days.
When the error messages come in, the directory already exists and contains files:
# ls /opt/splunkforwarder/var/lib/splunk/fishbucket/splunk_private_db/snapshot.tmp/
btree_index.dat btree_records.dat
We are not sure what causes the issue or how to reproduce it.
This ERROR occurs when a large number of files are being monitored and `parallelIngestionPipelines` is set to a high value. Multiple threads try to update the fishbucket at the same time: the first thread creates the temporary directory `snapshot.tmp`, and while it is still updating the fishbucket, the other threads log the ERROR above.
@kasperl - This could be a Splunk issue. I would recommend creating a Support ticket with Splunk.
I hope this helps!!!
This is fixed in 9.1.4 and 9.2.1.
Hello @kasperl, can you check the ownership and ACL of this directory? Is it owned by root rather than by the user that the splunkd process runs as?
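If it helps, here is a rough sketch of one way to check the ownership question. It walks a directory and lists every path whose owner UID differs from the one you expect. The function name `paths_not_owned_by` is made up for illustration, and which UID splunkd runs as on your hosts is something you would need to look up yourself. It only checks owner UIDs, not ACLs.

```python
import os

def paths_not_owned_by(root, expected_uid):
    """Return paths at or under root whose owner UID differs from expected_uid.

    Hypothetical helper for illustration; it is not part of Splunk.
    """
    mismatched = []
    # Check the top-level directory itself (lstat does not follow symlinks).
    if os.lstat(root).st_uid != expected_uid:
        mismatched.append(root)
    # Then check every directory and file underneath it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return mismatched
```

For example, calling `paths_not_owned_by("/opt/splunkforwarder/var/lib/splunk/fishbucket", uid)` with the UID of the account that runs splunkd should return an empty list if nothing under the fishbucket is owned by a different user.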
It's a race condition between two threads trying to create the snapshot. The error is harmless: all it indicates is that the snapshot already exists (because another thread has already created it).
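To illustrate the kind of race described above, here is a minimal sketch in Python. It is not Splunk's code and does not touch the fishbucket. It only shows that when several threads call `mkdir` on the same path at once, one succeeds and the rest get "File exists". The function name `race_mkdir` and the thread count are assumptions chosen for the demo.

```python
import os
import tempfile
import threading

def race_mkdir(path, n_threads=4):
    """Have n_threads call os.mkdir(path) at once; return (created, already_existed)."""
    results = []
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()  # release all threads at the same moment
        try:
            os.mkdir(path)
            results.append("created")
        except FileExistsError:
            # Same condition as the "File exists" text in the log line.
            results.append("exists")

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.count("created"), results.count("exists")

# Run the demo in a throwaway directory, not under /opt/splunkforwarder.
with tempfile.TemporaryDirectory() as base:
    created, existed = race_mkdir(os.path.join(base, "snapshot.tmp"))
    print(f"created={created} already_existed={existed}")
```

Because `mkdir` is atomic, exactly one thread creates the directory and every other thread sees the "File exists" condition, which matches the pattern of one snapshot being written while the other threads log the error.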