Continuously monitor a single file while replacing the file with an appended version of the same file.

oliverj
Communicator

I am trying to monitor several individual files for changes.
For example, I will watch "FILE1.log".
If that file is appended from 10 lines to 15 lines, Splunk will pick up 5 more lines (good).
If someone does a save-as to "FILE1.log" with an appended version that is 21 lines long, Splunk will pick up 6 more lines (good).
If someone drags and drops a new "FILE1.log" that has the same contents as the original 21-line "FILE1.log" into the FILE1 monitored location, the Splunk indexer now shows 42 results.

Apparently, even though the file contents and name are the same, the process of replacing the file with a new one is triggering a complete re-index.

Here is my use case, in case there is a better way of going about this:

I am tasked with ingesting text-based syslog files from several external computer systems.
These files are provided to me on a CD each week.
Each logfile name (e.g. PC1.log) is the same each week; the file is just appended.
So on month 1, PC1.log will have results from Jan 1-31, 2015.
On month 2, PC1.log will have results from Jan 1 - Feb 28, 2015.
And so on.

To ingest these logs, I:
Set up an individual file monitor for each new system.
Drag the contents of the CD into the folder containing the monitored files.
Select "Overwrite" when Windows informs me that there are files in the destination directory with the same name.

1 Solution

dwaddle
SplunkTrust

So, the name of the game here is "atomicity". What I mean is that if Splunk can, for a moment, see the file "in between states" during your overlay, then it can basically make a determination along the lines of: "So, I know this file by CRC, but the last time I saw it, it was bigger than it is now. It must have been rotated, and this is a new 'instance' of that file that I must read from the beginning."
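To make that reasoning concrete, here is a rough toy model in Python of the decision described above. It is not Splunk's actual implementation: the `decide` function, the `seen` dictionary, and the 256-byte `HEAD_BYTES` fingerprint length are all assumptions made for this sketch.

```python
import zlib

HEAD_BYTES = 256  # assumed fingerprint length, chosen for this sketch only


def head_crc(data: bytes) -> int:
    """Fingerprint a file by the CRC of its first HEAD_BYTES bytes."""
    return zlib.crc32(data[:HEAD_BYTES])


def decide(seen: dict, data: bytes) -> str:
    """Toy decision: 'seen' maps a head CRC to the size last observed."""
    crc = head_crc(data)
    offset = seen.get(crc)
    if offset is None:
        action = "new file: read from start"
    elif len(data) < offset:
        # Same beginning, but smaller than last time: looks like a rotation.
        action = "known file shrank: re-read from start"
    elif len(data) == offset:
        action = "nothing new"
    else:
        action = "known file grew: read from offset"
    seen[crc] = len(data)  # remember the size we just observed
    return action
```

In this model, a half-finished copy of a known file lands in the "shrank" branch, which is the case that leads to everything being read again.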

On most operating systems, a file COPY is nowhere near atomic. If you were fast enough to see it happen, you would see the destination file growing a little bit every fraction of a second. But a file move (at least on the same drive / filesystem) is usually atomic (at least on Unix-based OSs).

What I would suggest is copying the files to a temporary directory on the same drive, and then moving them into your monitored directory. This almost always works on Unixes, but I'm not sure how well it does on Windows, TBH.
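A minimal Python sketch of that copy-then-move idea follows. The function name `stage_and_move` and the `.staging-` directory prefix are made up for illustration. It assumes the parent of the monitored directory is on the same filesystem and is not itself being monitored.

```python
import os
import shutil
import tempfile


def stage_and_move(src: str, monitored_dir: str) -> str:
    """Copy src into a staging directory, then rename it into monitored_dir."""
    monitored_dir = os.path.abspath(monitored_dir)
    parent = os.path.dirname(monitored_dir)
    # Staging directory sits beside the monitored one (assumed same filesystem).
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        staged = os.path.join(staging, os.path.basename(src))
        shutil.copy2(src, staged)  # the slow, non-atomic copy happens here
        dest = os.path.join(monitored_dir, os.path.basename(src))
        os.replace(staged, dest)  # one rename; overwrites an existing dest
    finally:
        shutil.rmtree(staging, ignore_errors=True)
    return dest
```

`os.replace` is an atomic rename on POSIX systems when source and destination share a filesystem. On Windows it may fail if another process has the destination open, so treat this as something to test rather than a guarantee.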

oliverj
Communicator

This is being done on Windows now. For my tests, they were just 2 folders next to each other.
Good to know it might work on Unix.

I will check whether deleting the original file and then moving the new one in will work.

oliverj
Communicator

On Windows:
Delete the monitored file (20 events).
Move a new file with the same name and appended contents into the monitored file's old path (contains 25 events: 20 old, 5 new).
Splunk successfully ingests the 5 new events.
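
For anyone wanting to script the same steps, here is a small Python sketch. The function name `delete_then_move` is hypothetical. It assumes the new file has already been copied off the CD onto the same drive as the monitored path, so the final step is a plain move rather than a cross-drive copy.

```python
import os
import shutil


def delete_then_move(new_file: str, monitored_path: str) -> None:
    """Delete the monitored file, then move the replacement into its path."""
    if os.path.exists(monitored_path):
        os.remove(monitored_path)  # step 1: delete the monitored file
    # Step 2: move the new file into place. On the same drive this is a
    # rename; across drives shutil.move falls back to copy-then-delete.
    shutil.move(new_file, monitored_path)
```

Whether Splunk then picks up only the new events may depend on timing and version, so it is worth checking the event counts after the first run.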

Thanks for putting me on the right track, dwaddle.

If you can think of a better way of accomplishing my original task, I would love to hear it. But this will do just fine.
