Hello All,
I am ingesting compressed (.gz) log files into Splunk by placing them in the $SPLUNK_HOME/var/spool/splunk folder, where Splunk's default batch input picks them up and ingests them automatically.
When I put a file in this location, Splunk calculates and stores its CRC value so that it can recognize the same file in the future.
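As a rough illustration of the idea (not Splunk's actual implementation), the Python sketch below computes a CRC over the leading bytes of a file's content. The 256-byte window, the function name head_crc, and the sample data are all assumptions made for this example.

```python
import zlib

# Assumed window size, for illustration only; not taken from the logs above.
HEAD_BYTES = 256


def head_crc(data: bytes, length: int = HEAD_BYTES) -> int:
    """Return the CRC32 of the first `length` bytes of `data`."""
    return zlib.crc32(data[:length])


# Hypothetical file contents: the second is the first with new data appended,
# the third has different leading bytes.
original = b"event line\n" * 100
appended = original + b"new event line\n" * 10
different = b"other " + original

# Appending at the end leaves the leading bytes, and so this CRC, unchanged.
same_head = head_crc(original) == head_crc(appended)
# Changing the leading bytes changes this CRC.
changed_head = head_crc(original) != head_crc(different)

print(same_head, changed_head)
```

Because the appended file keeps the same leading bytes, a check of this kind would treat it as a file that has already been seen, which would be consistent with the "Will begin reading at offset" message in splunkd.log.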
But when I put a file with the same name and newer content appended at the end, splunkd.log shows entries like these:
03-10-2020 21:10:03.588 +0530 INFO WatchedFile - **Will begin reading at offset=63969** for file='/opt/splunk8/splunk/var/spool/splunk/transaction-events-bfe8ae9a4041c5eaeea1663c583cbd54-72000-79200_0.gz'.
03-10-2020 21:10:13.589 +0530 INFO TailReader - Archive file='/opt/splunk8/splunk/var/spool/splunk/transaction-events-bfe8ae9a4041c5eaeea1663c583cbd54-72000-79200_0.gz' has stopped changing, will read it now.
03-10-2020 21:10:13.589 +0530 INFO ArchiveProcessor - Handling file=/opt/splunk8/splunk/var/spool/splunk/transaction-events-bfe8ae9a4041c5eaeea1663c583cbd54-72000-79200_0.gz
03-10-2020 21:10:13.590 +0530 INFO ArchiveProcessor - reading path=/opt/splunk8/splunk/var/spool/splunk/transaction-events-bfe8ae9a4041c5eaeea1663c583cbd54-72000-79200_0.gz (seek=63969 len=77924)
So, according to the logs, Splunk should ingest only the newer content of that file.
But when I search in Splunk, I see that the whole file has been ingested again, not just the newer content.
Does anyone have any idea about this?
Open a support case for sure. This does not smell right at all.