We are experiencing consistent log duplication and data loss when the Splunk Universal Forwarder (UF) running as a Helm deployment inside our EKS cluster is restarted or redeployed.
Environment Details:
Platform: AWS EKS (Kubernetes)
UF Deployment: Helm chart
Splunk UF Version: 9.1.2
Indexers: Splunk Enterprise 9.1.1 (self-managed)
Source Logs: Kubernetes container logs (/var/log/containers, etc.)
Symptoms:
After the UF pod is restarted or redeployed:
Previously ingested logs are duplicated.
Some (but not all) of the logs generated during the restart window are missing in Splunk.
The fishbucket is recreated at each restart:
Confirmed by logging into the UF pod post-restart and checking:
/opt/splunkforwarder/var/lib/splunk/fishbucket/
Our Hypothesis:
We suspect this behavior is caused by the Splunk UF losing its ingestion state (fishbucket) on pod restart, due to the lack of a PersistentVolumeClaim (PVC) mounted at /opt/splunkforwarder/var/lib/splunk.
This would explain both:
Re-ingestion of previously-read files (-> duplicates)
Failure to re-ingest certain logs that are no longer available or tracked (-> data loss)
However, we are not yet certain whether the missing logs are caused by the non-persistent fishbucket, container log rotation, or a combination of both.
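For reference, the kind of mount we believe is missing looks roughly like the following (resource names, namespace, and storage class are illustrative placeholders, not our actual Helm values):

# Illustrative only - names, namespace and storageClassName are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: splunk-uf-state
  namespace: logging
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3
  resources:
    requests:
      storage: 1Gi
---
# Fragment of the UF pod spec showing where the volume would be mounted
# so the fishbucket survives pod restarts.
spec:
  containers:
    - name: splunk-uf
      volumeMounts:
        - name: uf-state
          mountPath: /opt/splunkforwarder/var/lib/splunk
  volumes:
    - name: uf-state
      persistentVolumeClaim:
        claimName: splunk-uf-state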
What We Need from Splunk Support:
How can we conclusively verify whether the missing logs are caused by fishbucket loss, file rotation, inode mismatch, or other ingestion tracking issues?
What is the recommended and supported approach for maintaining ingestion state in a Kubernetes/Helm-based Splunk UF deployment?
Is mounting a PersistentVolumeClaim (PVC) to /opt/splunkforwarder/var/lib/splunk sufficient and reliable for preserving fishbucket across pod restarts?
Are there additional best practices to prevent both log loss and duplication, especially in dynamic environments like Kubernetes?
Hi @Ravi1
I agree that the loss of the fishbucket state (due to ephemeral storage) is the cause of both log duplication and data loss after Splunk Universal Forwarder pod restarts in Kubernetes. When the fishbucket is lost, the UF cannot track which files and offsets have already been ingested, leading to re-reading old data (duplicates) and missing logs that rotated or were deleted during downtime.
If logs are rotated (e.g. to myapp.log.1) and Splunk is not configured to monitor the rotated file path, this can result in data loss as well as the more obvious duplication caused by losing the file tracking in the fishbucket.
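Purely as an illustration (the ConfigMap name, index and sourcetype are made up, and how the file ends up under /opt/splunkforwarder/etc depends on your chart), an inputs.conf wildcard that also covers rotated file names could look like this:

# Hypothetical ConfigMap holding the UF inputs.conf; your Helm chart decides
# how this gets mounted into the forwarder's config directory.
apiVersion: v1
kind: ConfigMap
metadata:
  name: splunk-uf-inputs
  namespace: logging
data:
  inputs.conf: |
    # Wildcard covers both live and rotated container log files,
    # e.g. myapp.log and myapp.log.1 - verify against how your runtime rotates logs.
    [monitor:///var/log/containers/*.log*]
    index = k8s_logs
    sourcetype = kube:container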
As far as I am aware, running a UF within K8s is not generally encouraged; instead, the Splunk Validated Architecture (SVA) for sending logs to Splunk from K8s is the Splunk OpenTelemetry Collector for Kubernetes, which allows sending logs (amongst other things) to Splunk Enterprise / Splunk Cloud.
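If you go that route, a minimal values.yaml sketch for the splunk-otel-collector chart might look like the following - the cluster name, endpoint, token and index are placeholders, and you should verify the value names against the chart version you deploy:

# Placeholder values for the Splunk OpenTelemetry Collector for Kubernetes chart.
# Install with something like:
#   helm repo add splunk-otel-collector-chart https://signalfx.github.io/splunk-otel-collector-chart
#   helm install splunk-otel splunk-otel-collector-chart/splunk-otel-collector -f values.yaml
clusterName: my-eks-cluster
splunkPlatform:
  endpoint: https://hec.example.com:8088/services/collector
  token: "REPLACE-WITH-HEC-TOKEN"
  index: k8s_logs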
If you do want to use the UF approach (which may or may not be supported), you could look at adding a PVC, as is done with the full Splunk Enterprise deployment under splunk-operator; check out the Storage Guidelines and StorageClass docs for splunk-operator.
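Purely as a sketch (not a Splunk-supported manifest - image tag, storage class and sizes are placeholders), a StatefulSet with a volumeClaimTemplate is one way to keep the fishbucket on a per-pod PVC:

# Illustrative StatefulSet fragment; adjust names, image and storage to your environment.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: splunk-uf
spec:
  serviceName: splunk-uf
  replicas: 1
  selector:
    matchLabels:
      app: splunk-uf
  template:
    metadata:
      labels:
        app: splunk-uf
    spec:
      containers:
        - name: splunk-uf
          image: splunk/universalforwarder:9.1.2
          volumeMounts:
            - name: uf-state
              mountPath: /opt/splunkforwarder/var/lib/splunk
  volumeClaimTemplates:
    - metadata:
        name: uf-state
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3
        resources:
          requests:
            storage: 1Gi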