Getting Data In

Would monitoring files with logrotate and delayed compression cause reindexing?

hettervik
Builder

If I'm monitoring files that are being rotated with an added timestamp, and the rotated files are being compressed after a couple of days, could this cause reindexing of log events?

I know that Splunk supports reading compressed files, and that as long as you don't add crcSalt=<SOURCE>, rotating logs with a timestamp will not cause reindexing. However, the docs state that adding data to a compressed file does in fact cause reindexing (link). This confuses me. If Splunk decompresses files to read the checksum (to check whether the log file has already been indexed), why would adding data to a compressed file cause reindexing? And if Splunk doesn't read checksums that way for compressed files, how can we be sure that normally rotated log files with delayed compression won't cause reindexing as well?

Hope someone can explain this to me. 🙂

harsmarvania57
Ultra Champion

Hi @hettervik,

Splunk handles log rotation by using a checksum of the file. In your scenario, when the log file is rotated and a timestamp is appended to its name, Splunk will not reindex the whole log file unless you use crcSalt = <SOURCE>.
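To make the checksum idea concrete, here is a rough Python sketch. It is not Splunk's actual code: it only assumes that a file is identified by a CRC over its first 256 bytes (the documented default of initCrcLength), and the file names and the helper `head_crc` are made up for illustration. It shows that renaming leaves such a checksum unchanged, while compressing changes the raw bytes on disk, so the result then depends on whether the raw or the decompressed content is checksummed. The sketch does not settle which of the two Splunk does.

```python
import gzip
import os
import tempfile
import zlib

HEAD_BYTES = 256  # assumption: mirrors the documented default of initCrcLength


def head_crc(path, length=HEAD_BYTES):
    """CRC32 over the first `length` raw bytes of a file (illustrative only)."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read(length))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical live log file.
        current = os.path.join(tmp, "app.log")
        with open(current, "wb") as f:
            f.write(b"2019-01-01 12:00:00 INFO service started\n" * 20)
        before = head_crc(current)

        # Rotation by rename: same bytes, new name, same checksum.
        rotated = os.path.join(tmp, "app.log-20190101")
        os.rename(current, rotated)
        after_rename = head_crc(rotated)

        # Delayed compression: the raw bytes on disk are now gzip data.
        compressed = rotated + ".gz"
        with open(rotated, "rb") as src, gzip.open(compressed, "wb") as dst:
            dst.write(src.read())
        raw_gz = head_crc(compressed)

        # Checksum of the decompressed content instead of the raw bytes.
        with gzip.open(compressed, "rb") as f:
            decompressed = zlib.crc32(f.read(HEAD_BYTES))

        return before, after_rename, raw_gz, decompressed


if __name__ == "__main__":
    before, after_rename, raw_gz, decompressed = demo()
    print("rename keeps checksum:      ", before == after_rename)
    print("raw .gz bytes keep checksum:", before == raw_gz)
    print("decompressed keeps checksum:", before == decompressed)
```

Adding crcSalt = <SOURCE> would correspond to mixing the file path into the checksum, which is why a rename would then look like a new file.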

For the compressed rotated log files, I suggest using the whitelist parameter with a regular expression in the monitor stanza in inputs.conf, so that only the current and rotated files are monitored and the compressed files are not. Those rotated files have already been checked and indexed (if required) by Splunk.
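A quick way to sanity-check such a regular expression before putting it into the whitelist setting is to run it against a few sample paths. The sketch below uses Python's re module rather than Splunk itself, and the paths and the pattern are hypothetical (they assume a file named app.log rotated with an 8-digit date suffix). Splunk's regex engine is PCRE, so only simple patterns like this one can be expected to behave the same way.

```python
import re

# Hypothetical whitelist pattern: the live file and date-stamped rotations,
# but nothing with a further suffix such as .gz.
WHITELIST = re.compile(r"app\.log(-\d{8})?$")


def monitored(paths):
    """Return the paths the whitelist pattern would accept."""
    return [p for p in paths if WHITELIST.search(p)]


# Hypothetical file names for illustration.
sample = [
    "/var/log/myapp/app.log",
    "/var/log/myapp/app.log-20190101",
    "/var/log/myapp/app.log-20181230.gz",
]

if __name__ == "__main__":
    for path in monitored(sample):
        print(path)
```

With these sample names, the live file and the uncompressed rotation are accepted and the .gz file is left out.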

hettervik
Builder

Hi. I know that Splunk will not re-index rotated log files because of the checksum. I'm also aware that I can blacklist the compressed files, but then what's the point of keeping them? The whole idea of keeping, say, a week of rotated files is that if Splunk or the network goes down, I have a whole week to notice and get it back up before losing data. If I blacklist the compressed files, I won't have a week of on-disk log files for Splunk to read anymore.

What I'm wondering is exactly how Splunk calculates and checks checksums of compressed files, and in which scenarios compression of monitored log files could cause re-indexing.

harsmarvania57
Ultra Champion

Generally it should not re-index a compressed file. I tested this in my lab environment and the compressed file was not re-indexed. However, looking at another thread, https://answers.splunk.com/answers/223263/why-is-a-gz-file-created-by-log-rotation-indexed-a.html, it looks like Splunk might re-index the file because of a race condition. In general I suggest blacklisting compressed files and, whenever required, manually uncompressing and indexing them.
