Getting Data In

Forwarder questions on moving files

msarro
Builder

Hey everyone. I'm considering deploying lightweight forwarders on some of our production servers to collect data from them. Here is how the software we are trying to monitor works:

All items are written to a file for a 15-minute period, and each line written is a new event we want to track in Splunk (written in real time). At the end of that 15-minute period, the 3rd-party application closes the file and moves it to a new directory. Files are kept in this directory for 2 weeks and then purged.

I would like to set up the forwarder to tail the file as it is initially written so I can get events in real time. I would also like to monitor the directory the files are moved to, in case there is ever a need to reindex. Is this possible? What happens if the file is closed and moved while Splunk is in the middle of reading it? Will Splunk be able to finish reading it from the new location? Will Splunk re-index every event, given that I have crcSalt set up? It is quite likely that the forwarder will fall behind on reading the file, given the volume of lines being written to it.

Any help would be appreciated; this is data we have to index as close to real-time as possible.

dwaddle
SplunkTrust

Without crcSalt, this should work just fine. Computing a CRC over the head of a file is what lets Splunk recognize a file whose name or location has changed but whose content has not. Splunk keeps a database of previously seen CRC values, along with the last read position (seek pointer) in each file. So if you monitor both locations, Splunk should notice when a file is moved to the new directory and pick up where it left off.
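
To make the idea concrete, here is a toy Python model of a CRC-keyed position database. It is not Splunk's code: the class name `TailTracker` is hypothetical, and the 256-byte head size is an assumption meant to mirror Splunk's default `initCrcLength`. The file's identity comes only from a CRC of its first bytes, so the same content seen under a new path resumes from the stored offset.

```python
import zlib

HEAD_BYTES = 256  # assumption: mirrors Splunk's default initCrcLength


def head_crc(data: bytes) -> int:
    """CRC-32 over the first HEAD_BYTES bytes of the file content."""
    return zlib.crc32(data[:HEAD_BYTES])


class TailTracker:
    """Toy CRC -> seek-pointer database (hypothetical, not Splunk's implementation)."""

    def __init__(self) -> None:
        self.seen: dict[int, int] = {}  # head CRC -> bytes already read

    def read_new(self, path: str, data: bytes) -> bytes:
        """Return only the bytes not yet read; path is ignored on purpose."""
        key = head_crc(data)  # identity depends on content, not on path
        offset = self.seen.get(key, 0)
        self.seen[key] = len(data)
        return data[offset:]


# A file is tailed while being written, then moved to an archive directory.
content = b"".join(b"event %d\n" % i for i in range(100))  # 890 bytes total
tracker = TailTracker()
first = tracker.read_new("/var/app/current/app.log", content[:500])
rest = tracker.read_new("/var/app/archive/app.log", content)
print(len(first), len(rest))  # prints: 500 390
```

Nothing is read twice after the move, because the archive path produces the same head CRC as the original path.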

Beware, though: if the file moves across filesystems (*nix) or drives (Windows), the move is really a copy followed by a delete, and is therefore non-atomic. For best results your moves need to be atomic; otherwise Splunk may encounter a partially copied file at the destination, with unpredictable results.
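
If you control the step that moves the files, one way to confirm a move will be atomic is to check that source and destination are on the same device before renaming. Below is a minimal Python sketch assuming POSIX semantics; `atomic_move` and `same_filesystem` are hypothetical helper names. `os.replace` performs a single rename, which POSIX requires to be atomic within one filesystem.

```python
import os


def same_filesystem(src: str, dst_dir: str) -> bool:
    """True if src and dst_dir are on the same device, so a plain rename works."""
    return os.stat(src).st_dev == os.stat(dst_dir).st_dev


def atomic_move(src: str, dst_dir: str) -> str:
    """Move src into dst_dir, refusing to fall back to a copy + delete."""
    if not same_filesystem(src, dst_dir):
        raise OSError(
            f"{src} and {dst_dir} are on different filesystems; "
            "a move would be a non-atomic copy + delete"
        )
    dst = os.path.join(dst_dir, os.path.basename(src))
    os.replace(src, dst)  # one rename() call, atomic on the same filesystem
    return dst
```

The design choice is to fail loudly rather than silently copying, since a copy is exactly the case that leaves a half-written file for the monitor to find.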

That said, crcSalt = <SOURCE> would probably break this whole approach, because the CRC value then includes the path and name of the file, which change when you move it.
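
A toy Python illustration of why the salt breaks move tracking: if the path is folded into the CRC, a moved file gets a new identity and looks brand new. `file_key` is a hypothetical helper and the 256-byte head is an assumption; this only models the effect, not how Splunk actually combines the salt with the CRC.

```python
import zlib

HEAD_BYTES = 256  # assumption: mirrors Splunk's default initCrcLength


def file_key(path: str, data: bytes, salt_with_source: bool) -> int:
    """Toy identity key: head CRC, optionally salted with the file's path."""
    crc = zlib.crc32(data[:HEAD_BYTES])
    if salt_with_source:
        # models crcSalt = <SOURCE>: the full path becomes part of the key
        crc = zlib.crc32(path.encode(), crc)
    return crc


data = b"event line\n" * 50
before, after = "/logs/live/app.log", "/logs/done/app.log"

# Unsalted: same key at both paths, so the move is recognized.
print(file_key(before, data, False) == file_key(after, data, False))  # True
# Salted: the key changes with the path, so the moved file looks new.
print(file_key(before, data, True) == file_key(after, data, True))    # False
```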

dwaddle
SplunkTrust

crcSalt is only valid in inputs.conf, but there it is set stanza by stanza, so it applies only to the inputs whose stanzas include it.

msarro
Builder

Also, I have crcSalt= set in inputs.conf. Can it be set somewhere else so that it isn't applied globally?

dwaddle
SplunkTrust

No, crcSalt= is completely valid, but it essentially breaks file rotation support. Quoting from the docs: "Be cautious about using this attribute with rolling log files; it could lead to the log file being re-indexed after it has rolled." As I understand your use case, that is a serious problem for it.

msarro
Builder

I'm currently using "crcSalt=" (literally; should it be a sourcetype?) because a number of our sources contain a lot of similar data at both the beginning and the end of the file, and without it we lose a number of data points. Should I change it to crcSalt=Source type name?
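
A toy Python demonstration of why similar headers cause trouble: if two different files share the same first bytes, a head-only CRC cannot tell them apart, so the second file looks like one that has already been read. The 256-byte head is an assumption (mirroring Splunk's default `initCrcLength`), and the header text is made up for illustration.

```python
import zlib

HEAD_BYTES = 256  # assumption: mirrors Splunk's default initCrcLength


def head_crc(data: bytes) -> int:
    """CRC-32 over the first HEAD_BYTES bytes, as in a head-only check."""
    return zlib.crc32(data[:HEAD_BYTES])


# Hypothetical boilerplate header (310 bytes) shared by every file.
header = b"# report generated by app v1.0\n" * 10
file_a = header + b"value=1\n"
file_b = header + b"value=2\nvalue=3\n"

print(file_a == file_b)                      # False: different content
print(head_crc(file_a) == head_crc(file_b))  # True: same head, same CRC
```

This collision is the situation a salt works around, at the cost of the move and rotation tracking discussed earlier in the thread.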
