
How do I troubleshoot why Splunk is re-indexing log file data with garbled fields?

fatemabwudel
Path Finder

Hi,

I have a Splunk Universal Forwarder running on my BRO-IDS sensor machine, monitoring a log directory where Bro rotates the files every hour; the rotated files are kept in another directory in gz format.

The rotation happens at the end of every hour and the log file names never change (i.e., conn.log, http.log, etc. remain conn.log, http.log in the directory that Splunk is monitoring, but contain new data for that hour). I was happily indexing some of my Bro log files, assuming that Splunk would recognize whenever a log file gets rotated and start reading the new file (with the same name) from the beginning, based on the checksum it calculates for that file.
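For context, the monitor stanza on the forwarder looks roughly like this (the path is the one that shows up in the splunkd.log excerpt below; the index and sourcetype values here are placeholders, not necessarily what we use):

[monitor:///mnt/ssd/bro/spool/manager]
# Watch the directory Bro writes its current-hour logs into (conn.log, http.log, ...).
disabled = false
index = bro
sourcetype = bro_logs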

When I searched the events on my search head corresponding to the Bro logs, every entry had been indexed about 12 times, with odd-looking timestamps (note the Unix timestamp at the start of each event), like:

14689504660 C8129F2JOKFPessVv4 119.255.39.66 1117 xxx.xx.xxx.xxx 14330 tcp - 3.015041 0 0 S0 F T 0 S 2 104 0 0 (empty)
14689504660 C8129F2JOKFPessVv4 119.255.39.66 1117 xxx.xx.xxx.xxx 14330 tcp - 3.015041 0 0 S0 F T 0 S 2 104 0 0 (empty)
14689504660 C8129F2JOKFPessVv4 119.255.39.66 1117 xxx.xx.xxx.xxx 14330 tcp - 3.015041 0 0 S0 F T 0 S 2 104 0 0 (empty)
14689504660 C8129F2JOKFPessVv4 119.255.39.66 1117 xxx.xx.xxx.xxx 14330 tcp - 3.015041 0 0 S0 F T 0 S 2 104 0 0 (empty)

... and so on, up to 12 duplicate events.

This exceeded our per-day license usage (of course, if Splunk is re-indexing the same ~12 GB file 12 times, it is going to exceed the license). I therefore turned off monitoring of my Bro logs to troubleshoot the issue, but couldn't figure out why Splunk is doing this.

I also looked at splunkd.log on the forwarder, which looked fairly normal (the log files got read every hour):

$ less splunkd.log | grep '/mnt/ssd/bro/spool/manager/conn.log' | grep '07-19-2016' | more
07-19-2016 00:16:42.532 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 01:14:21.794 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 01:14:21.794 -0400 WARN  TailReader - Enqueing a very large file=/mnt/ssd/bro/spool/manager/conn.log in the batch reader, with bytes_to_read=359045778, reading of other large files could be delayed
07-19-2016 02:39:12.090 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 02:39:12.115 -0400 WARN  TailReader - Enqueing a very large file=/mnt/ssd/bro/spool/manager/conn.log in the batch reader, with bytes_to_read=479827021, reading of other large files could be delayed
07-19-2016 03:00:15.088 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 04:00:15.306 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 05:00:24.703 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 06:00:15.284 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 06:56:11.571 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 07:02:49.600 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 08:54:35.995 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 08:54:36.191 -0400 INFO  TailReader - Will retry path="/mnt/ssd/bro/spool/manager/conn.log" after deferring for 10000ms, initCRC changed after being queued (before=0xcb0d087403c5153a, after=0x33c7d5dc84662354).  File growth rate must be higher than indexing or forwarding rate.
07-19-2016 08:54:46.193 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 08:54:46.213 -0400 WARN  TailReader - Enqueing a very large file=/mnt/ssd/bro/spool/manager/conn.log in the batch reader, with bytes_to_read=1013010522, reading of other large files could be delayed
07-19-2016 09:10:36.385 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 09:10:36.406 -0400 WARN  TailReader - Enqueing a very large file=/mnt/ssd/bro/spool/manager/conn.log in the batch reader, with bytes_to_read=215379268, reading of other large files could be delayed
07-19-2016 10:00:18.237 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.
07-19-2016 11:00:14.982 -0400 INFO  WatchedFile - Will begin reading at offset=0 for file='/mnt/ssd/bro/spool/manager/conn.log'.

I'm not sure what is causing Splunk to re-index the same file (approximately 12 times).
Any help would be greatly appreciated.

Thanks.

tweaktubbie
Communicator

Perhaps some kind of hash/checksum is being written to the log file? To start troubleshooting, try a 'tail -f logfile.log' from the command line on your server and see whether duplicate events show up there. I once faced a log file that got updated every few seconds with some hash or checksum at the end (written by the application, not by Splunk itself). If that is the case here, there should be something different about the duplicates; at least one should have some extra characters at the end. In a non-production environment you could also consider trying followTail = 1 in your inputs.conf and see whether the behaviour is the same.
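A minimal sketch of that test setting, assuming the same monitor stanza as in the splunkd.log excerpt above (treat it as test-only, since followTail makes Splunk skip existing file content and read only newly appended data):

[monitor:///mnt/ssd/bro/spool/manager]
# Test only: ignore existing content and tail new data from the end of the file.
followTail = 1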

fatemabwudel
Path Finder

I don't know whether this was actually the cause of the problem, but I tried the following and it has been working so far (although I still see some duplicates in the indexed data whenever I restart any of the indexers in the cluster):

I had a CRC check on the source in place before, thinking that since the file names never change it wouldn't hurt to have it enabled. But I disabled it once I realized that the file name actually does change (the hour's timestamp gets appended) just before the file gets rotated and moved to the other directory. If Splunk re-checks the file's checksum at that moment, the same file produces a different checksum because of the changed file name, which can cause the whole file to be re-indexed from the beginning.
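In inputs.conf terms, and assuming the source-based CRC check was crcSalt = <SOURCE> (my reading of "CRC check on source"; adjust to whatever your stanza actually had), the change looks roughly like this:

Before:
[monitor:///mnt/ssd/bro/spool/manager]
# Mixes the full source path into the CRC, so a renamed file looks brand new.
crcSalt = <SOURCE>

After:
[monitor:///mnt/ssd/bro/spool/manager]
# crcSalt removed: the CRC is now based only on the beginning of the file's content,
# so the rename at rotation time no longer triggers a full re-index.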

After doing that I am not seeing any duplicates so far for those monitored files, but I'm keeping an eye on it because those files grow quickly, and if the re-indexing happens again we might hit our daily license quota again.

-Fatema.
