
Why does Splunk (re-)index this rolled file? How to troubleshoot?

twinspop
Influencer

Inputs stanza from btool:

[monitor:///apps/Logs/*/www/Reporting/CRTLog.log*]
_rcvbuf = 1572864
disabled = 0
host = apphost1
index = reporting_main
sourcetype = reporting_crtlog

The log rotation they use keeps 10 rolled copies, named with suffixes .1 through .10. For example, when the original rolls, it is renamed CRTLog.log.1 and a new CRTLog.log file is created. Standard stuff.

I have confirmed, without a doubt, the rolled files maintain consistent content. I wrote a script to grab checksums of the first 1KB of each file every few seconds. They always check out -- .1's checksum matches what the original showed before rolling.
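For anyone who wants to repeat this kind of check, here is a minimal sketch of a polling checksum helper. It is not the exact script used above; the function name head_sums and the 5-second interval are placeholders, and it assumes GNU coreutils (head -c, md5sum).

```shell
#!/usr/bin/env bash
# head_sums N FILE...
# Print "<file>: <md5 of the first N bytes>" for each regular file given.
# (Hypothetical helper name; N=1024 mirrors the "first 1KB" check.)
head_sums() {
    local nbytes="$1"
    shift
    local f
    for f in "$@"; do
        [ -f "$f" ] || continue
        printf '%s: %s\n' "$f" "$(head -c "$nbytes" -- "$f" | md5sum | cut -d' ' -f1)"
    done
}

# Example polling loop (interval is an assumption), run from the log directory:
#   while true; do date +%T; head_sums 1024 CRTLog.log*; sleep 5; done
```

Comparing consecutive snapshots shows whether .1 carries the same leading bytes that CRTLog.log had just before the roll.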

However, Splunk is sometimes (not all the time) treating the 1st rolled file as a new file:

 WatchedFile - Will begin reading at offset=0 for file='/apps/Logs/apphost1/www/Reporting/CRTLog.log.1'

Probably 30% of the time it re-reads the rolled file. Only .1, never any of the others.
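To put a number on that rather than eyeballing it, something like the following could tally the offset=0 messages per file. This is a rough sketch: count_rereads is a made-up helper name, and it assumes the message text matches the WatchedFile line quoted above and that you point it at the forwarder's splunkd.log.

```shell
#!/usr/bin/env bash
# count_rereads LOGFILE
# Tally "Will begin reading at offset=0" messages per monitored file,
# most frequent first. (Hypothetical helper; message format taken from
# the WatchedFile line quoted in this post.)
count_rereads() {
    grep -F 'Will begin reading at offset=0 for file=' "$1" \
        | sed -n "s/.*file='\([^']*\)'.*/\1/p" \
        | sort | uniq -c | sort -rn
}

# Example (path assumes a default install layout):
#   count_rereads "$SPLUNK_HOME/var/log/splunk/splunkd.log"
```

Dividing the count for CRTLog.log.1 by the number of rolls in the same period gives the re-read rate.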

Any tips to further troubleshoot this?

(Ticket's open, but after 3 days I kinda need an answer.)

EDIT: Sample checksum comparison:

I use this one-liner to grab a list:

for f in *; do echo -n "$f: "; head -n 50 "$f" | md5sum; done

CRTLog.log: 0fb375c11ad382eec3cc482fb1332c81  -
CRTLog.log.1: 40f3878392f5ca816bfc4948b263d0e2  -
CRTLog.log.10: ffc1a6dec71a64f69a2f4c42b53d68cb  -
CRTLog.log.2: a3b7d786d8aa7260cc5e46635e764c8f  -
<snip>

Then wait for a roll to fire and grab the new list:

CRTLog.log: ad978fdb89b04169e95ba96c15887042  -
CRTLog.log.1: 0fb375c11ad382eec3cc482fb1332c81  -
CRTLog.log.10: 82d1b645c89e4e34b4e0a89712d30f3e  -
CRTLog.log.2: 40f3878392f5ca816bfc4948b263d0e2  -
CRTLog.log.3: a3b7d786d8aa7260cc5e46635e764c8f  -
<snip>

So the first 50 lines (about 16 KB worth of data) match before and after the roll to .1. Splunk still re-read the file in this case.
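The before/after comparison can also be automated. The sketch below assumes the two listings were saved to files in the "name: checksum" format shown above; check_roll is a hypothetical helper name, and it simply checks that each rolled file's checksum equals what its predecessor (.N-1, or the live file for .1) had before the roll.

```shell
#!/usr/bin/env bash
# check_roll BEFORE AFTER
# BEFORE and AFTER are snapshot files with lines like "CRTLog.log.1: <md5>  -".
# For each numbered file in AFTER whose predecessor appears in BEFORE,
# print "<file> OK" or "<file> MISMATCH". (Hypothetical helper.)
check_roll() {
    awk -F': *' '
        NR == FNR {                       # first file: remember name -> hash
            h = $2; sub(/ .*/, "", h)
            before[$1] = h
            next
        }
        {
            name = $1
            hash = $2; sub(/ .*/, "", hash)
            n = split(name, parts, ".")
            suffix = parts[n]
            if (suffix ~ /^[0-9]+$/) {
                # X.log.1 came from X.log; X.log.N came from X.log.(N-1)
                base = substr(name, 1, length(name) - length(suffix) - 1)
                prev = (suffix + 0 == 1) ? base : base "." (suffix - 1)
                if (prev in before)
                    print name, (before[prev] == hash ? "OK" : "MISMATCH")
            }
        }' "$1" "$2"
}
```

Files whose predecessor is missing from the first snapshot (for example, snipped lines) are skipped rather than reported.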

1 Solution

hrawat_splunk
Splunk Employee

This issue is fixed in the following releases:
7.1 (SPL-149198)
7.0.4 (SPL-153453)
6.6.7 (SPL-146190)
