Reingestion of changes

kbaden
Explorer

So I've been unable to understand how Splunk works with log ingestion from Folder Monitor when it comes to a document that has already been ingested but has been changed since.

A basic example is a security log. Splunk identifies it, ingests and indexes it etc.
A new entry is added to that security log.

What happens at that point?

Does it reingest the log, duplicating the old data?
Does it not reingest since it has already done so once?
Is it ridiculously smart and just reingests the new data?

Thanks for the help!

Kane

1 Solution

acharlieh
Influencer

Typically, log files are written to in an appending manner. So by default, Splunk keeps track of details about each file it has monitored (including observed size, modtime, bytes read, and checksums) in a data structure known as "the fishbucket." In your scenario, Splunk has indexed the file previously and a new entry is then added to the end, so when Splunk reads the file again, it sends only the new entry to be indexed.
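To make the "only the new entry" part concrete, here is a minimal Python sketch of the idea of remembering how many bytes have been read. It is not Splunk's code; `TailReader`, `read_new`, and `demo` are hypothetical names made up for illustration.

```python
import os
import tempfile


class TailReader:
    """Remembers how many bytes of one file have already been read."""

    def __init__(self, path):
        self.path = path
        self.bytes_read = 0  # saved position, analogous to "bytes read" above

    def read_new(self):
        # Skip everything seen before and return only the appended lines.
        with open(self.path, "rb") as f:
            f.seek(self.bytes_read)
            data = f.read()
        self.bytes_read += len(data)
        return data.decode().splitlines()


def demo():
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "security.log")
        with open(log, "w") as f:
            f.write("event 1\nevent 2\n")
        reader = TailReader(log)
        first = reader.read_new()   # initial read picks up both events
        with open(log, "a") as f:
            f.write("event 3\n")    # a new entry is appended
        second = reader.read_new()  # only the appended entry comes back
        third = reader.read_new()   # nothing changed, so nothing is re-read
        return first, second, third
```

Calling `demo()` exercises the three cases from the question: the first read, a read after an append, and a read with no change.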

Entries in the fishbucket are keyed by a checksum of the beginning of the file (so that when log files are rolled, you do not wind up with duplication just because a log file now has a different file name). You can have issues with duplicate indexing if your rolling also compresses the files and you haven't set up Splunk to ignore the compressed log files in your monitor stanza (since the checksum of the compressed bytes won't usually match the checksum of the uncompressed bytes).
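A rough Python sketch of that keying idea, under stated assumptions: the 256-byte prefix length (`HEAD_BYTES`), the CRC32 checksum, and the plain dict standing in for the fishbucket are all choices made for illustration, not a description of Splunk's internals.

```python
import gzip
import os
import tempfile
import zlib

HEAD_BYTES = 256  # assumed prefix length, chosen for this sketch


def head_key(path):
    """Checksum of the first HEAD_BYTES bytes; the file name plays no part."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read(HEAD_BYTES))


def read_new(path, fishbucket):
    """Return unread bytes, tracking progress by head checksum, not by name."""
    key = head_key(path)
    offset = fishbucket.get(key, 0)
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    fishbucket[key] = offset + len(data)
    return data


def demo():
    fishbucket = {}
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "app.log")
        with open(log, "wb") as f:
            f.write("".join(f"line {i}\n" for i in range(100)).encode())
        first = read_new(log, fishbucket)

        # Roll the log: same bytes, new name, so the key still matches.
        rolled = os.path.join(d, "app.log.1")
        os.rename(log, rolled)
        after_roll = read_new(rolled, fishbucket)

        # Compress the rolled log: the leading bytes change, so the key does too.
        zipped = rolled + ".gz"
        with open(rolled, "rb") as src, open(zipped, "wb") as dst:
            dst.write(gzip.compress(src.read()))
        same_key = head_key(zipped) == head_key(rolled)
    return first, after_roll, same_key
```

In `demo()`, the renamed file yields no new data, while the compressed copy gets a different key and would look like a brand-new file. That is the duplicate-indexing case, and why the compressed files need to be excluded from the monitor. A real implementation also has to cope with files shorter than the prefix, which this sketch ignores.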

But there are a lot of settings in inputs.conf and props.conf to control this behavior. In fact, in cases where the file you're monitoring is not a log file and you actually want to reindex the whole file whenever it changes, you could set Splunk (through props.conf) to check only the modtime or the checksum of the entire file and resend everything.
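As far as I recall, the props.conf setting for this is `CHECK_METHOD` (with values such as `entire_md5` and `modtime`), but check the props.conf spec for your version. The whole-file behavior itself can be sketched in Python as below. The SHA-256 digest and the names `check_whole_file` and `demo` are assumptions for illustration only.

```python
import hashlib
import os
import tempfile


def check_whole_file(path, seen):
    """Return the full contents if the file changed since last check, else None."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()  # checksum of the entire file
    if seen.get(path) == digest:
        return None        # unchanged: send nothing
    seen[path] = digest
    return data            # changed: resend everything, not just a tail


def demo():
    seen = {}
    with tempfile.TemporaryDirectory() as d:
        cfg = os.path.join(d, "settings.txt")
        with open(cfg, "wb") as f:
            f.write(b"a=1\n")
        first = check_whole_file(cfg, seen)
        unchanged = check_whole_file(cfg, seen)
        with open(cfg, "wb") as f:
            f.write(b"a=2\n")  # edited in place, not appended
        changed = check_whole_file(cfg, seen)
    return first, unchanged, changed
```

Unlike the append-tracking default, an in-place edit here causes the complete file to be sent again, which is what you want for something like a config or inventory file.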

kbaden
Explorer

Amazing.

Appreciate your help mate.

-Kane
