Help with understanding Seekptr and duplicate entries?

Smashley
Explorer

I've got a handful of files that seem to be ingested multiple times, but I can't quite figure out why. The file is a Tomcat log named in the format hostname-stderr-dd-mm-yyyy.log, and it does not roll. Around once a day, but sometimes every other day or twice a day, the file is re-ingested, with a splunkd.log entry like this:

02-23-2022 08:43:33.602 -0500 INFO  WatchedFile [10484 tailreader0] - Checksum for seekptr didn't match, will re-read entire file='C:\Tomcat.....

I've set crcSalt=<SOURCE> and played with initCrcLength to no avail. Everything I've found in Answers that references these splunkd entries says to change the crcSalt or initCrcLength settings, so I'm just trying to make sure I understand what exactly seekptr refers to here.

Please correct me if I'm mistaken, but I think the 'seekptr' is the 'seekAddress' (making its checksum the 'seekCRC') referenced in the doc page linked below. My assumption is that the seekAddress is found but the CRC has somehow changed, so Splunk assumes the file is different. The problem is that, after looking at the file before and after this happens, I see no reason why this CRC would have changed. No amount of toying with crcSalt or initCrcLength should make a difference either, as it isn't the 'init' bit that's changing.
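To make that mental model concrete, here is a minimal Python sketch of the check as I understand it from the doc page. It is an illustration only, not Splunk's actual implementation: the names (TailState, record_state, decide), the use of CRC32, the 256-byte window, and the choice to fingerprint the bytes ending at the seek address are all my assumptions.

```python
import os
import zlib
from dataclasses import dataclass

WINDOW = 256  # assumed fingerprint window size, for illustration only


@dataclass
class TailState:
    init_len: int      # how many leading bytes were fingerprinted
    init_crc: int      # fingerprint of the start of the file
    seek_address: int  # number of bytes already read
    seek_crc: int      # fingerprint of the bytes just before seek_address


def crc_of(path, start, length):
    """CRC32 of `length` bytes of `path` starting at byte offset `start`."""
    with open(path, "rb") as f:
        f.seek(start)
        return zlib.crc32(f.read(length))


def record_state(path, bytes_read):
    """Snapshot what a tailing reader might remember after reading `bytes_read` bytes."""
    init_len = min(WINDOW, os.path.getsize(path))
    start = max(0, bytes_read - WINDOW)
    return TailState(
        init_len=init_len,
        init_crc=crc_of(path, 0, init_len),
        seek_address=bytes_read,
        seek_crc=crc_of(path, start, bytes_read - start),
    )


def decide(path, state):
    """Decide how to treat `path` given the previously recorded state."""
    # Step 1: the initial fingerprint identifies the file
    # (this is the part that settings like crcSalt/initCrcLength would influence).
    if crc_of(path, 0, state.init_len) != state.init_crc:
        return "treat as new file"
    # Step 2: the data around the seek address must be unchanged.
    if os.path.getsize(path) < state.seek_address:
        return "re-read entire file"  # file is shorter than what was already read
    start = max(0, state.seek_address - WINDOW)
    if crc_of(path, start, state.seek_address - start) != state.seek_crc:
        return "re-read entire file"  # already-read bytes changed
    return "resume"
```

Under this model, a pure append leaves both fingerprints intact and the reader resumes, while any change to the bytes just before the seek address (for example the application rewriting or padding the tail of the file) fails the second check no matter how the initial fingerprint is configured.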

I've got a dashboard set up showing the same events repeated with the same timestamp but different ingest times correlating with the above splunkd.log entries.

My only theory is that Splunk somehow indexed the file mid-write by the application, if that is even possible. Other log files for this same application and location don't seem to do this, and I've not been able to find any known bugs specific to Tomcat stderr files (though it's certainly possible our people are doing something weird with the log config).

Relevant inputs.conf stanza:

[monitor://C:\Tomcat-*\logs\*stderr*.log]
index=app_logs
sourcetype=stderr
ignoreOlderThan=1d
crcSalt=<SOURCE>

I've also manually put CHECK_METHOD=endpoint_md5 in props.conf in case the check method for stderr somehow got changed from the default somewhere along the way. I've also confirmed that this isn't happening when the file's modified timestamp is updated.

Next time I have some free time, I plan to grab another copy of the file before and after, and figure out a way to pull the seekptr and its associated CRC out of the debug logs so I can compare them myself.
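For the before/after comparison, something like the following Python sketch could do the job. It is a hypothetical helper (the function name and parameters are mine, not part of any Splunk tooling) that reports the first byte offset at which two copies of the file differ, optionally only checking up to a given offset such as the seek address.

```python
def first_difference(before_path, after_path, limit=None, chunk_size=65536):
    """Return the first byte offset where the two files differ, or None.

    If `limit` is given, only the first `limit` bytes are compared.
    When one file is a strict prefix of the other (and no limit stops the
    comparison first), the offset where the shorter file ends is returned.
    """
    offset = 0
    with open(before_path, "rb") as a, open(after_path, "rb") as b:
        while limit is None or offset < limit:
            n = chunk_size if limit is None else min(chunk_size, limit - offset)
            chunk_a, chunk_b = a.read(n), b.read(n)
            if chunk_a != chunk_b:
                # Locate the exact differing byte within this chunk.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one chunk is shorter.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended with no difference
            offset += len(chunk_a)
    return None  # no difference within the first `limit` bytes
```

Calling it with `limit` set to the size of the "before" copy answers the question that matters here: if the result is None, the file was only appended to; if it is an offset, the application changed bytes that had already been read.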


ref:
https://docs.splunk.com/Documentation/Splunk/8.2.4/Data/Howlogfilerotationishandled


gjlewis
Explorer

Hi @Smashley, did you ever manage to resolve this issue? I'm experiencing very similar behaviour with an XML KANA log: Splunk re-ingests the whole file every time a new entry is written to it. The internal log reports "seek crc didn't match" and "Checksum for seekptr didn't match, will re-read entire file".
Thanks
