I've got a setup that looks something like the following:
SUF (Remote Server) -> SUF (Intermediate Forwarder) -> Splunk (Indexer)
So a remote server (RHEL6) that we want to collect logs from has the Universal Forwarder installed. This forwards its logs to another server, acting as an intermediate forwarder, which then forwards the logs across a WAN back to a Splunk indexer.
My question is this: if that WAN connection were to drop for a number of hours/days/weeks, when would we start losing logs from the remote server? My understanding is that logs would probably not be lost as long as the log files were still available on the remote server, because Splunk will just stop sending them until the connection is restored, at which point it picks up where it left off. The scenario I could see causing logs to be lost is if the logs were rotated and compressed during an outage. Splunk would then not be able to resume shipping from where it had stopped in the logs (compressed logs are blacklisted in the SUF configuration).
Is there anyone who would be able to confirm that the above assumptions (completely untested by myself and unverified by documentation) are correct?
The way to avoid having to deal with this is to stop blacklisting the compressed logs. Splunk will not reindex data that it has already indexed, based on the CRC settings. You can read more about it at the link below.
I'm sceptical of this answer. The opening line of the document you referenced states:
"Splunk does not identify compressed files produced by logrotate as being the same as the uncompressed originals. This can lead to a duplication of data if those files are then monitored by Splunk."
It sounds like, if compressed files were configured to be indexed, the SUF isn't smart enough to identify that these files have already been sent and would therefore send them through again.
In any case, I'm really interested in just having someone confirm whether what I have written is correct.
Also, assuming what I have written is correct, I could probably live with turning off compression. Just as long as I understand how Splunk will handle an interruption in comms between the SUF and indexer (given the topology described above) and how long I have to fix it before logs will be lost.
Thanks again for your response.
You are right: if the files are compressed and the first 256 bytes are changed, then Splunk will treat the result as a new file. If not, you will fall under Number 3.
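To make the point about the first 256 bytes concrete, here is a rough Python sketch. It is not Splunk's actual implementation; it only assumes, as described above, that a checksum over the head of the file is used to recognise it. The `head_crc` helper and the sample log content are made up for illustration.

```python
import gzip
import zlib

HEADER_BYTES = 256  # the figure quoted in this answer


def head_crc(data: bytes, length: int = HEADER_BYTES) -> int:
    """CRC32 of the first `length` bytes (a stand-in for a file-head checksum)."""
    return zlib.crc32(data[:length])


# Hypothetical log content, for illustration only.
original = b"".join(
    b"2014-01-01 00:00:%02d host app: event %d\n" % (i % 60, i)
    for i in range(200)
)

rotated = original                    # renamed by logrotate: same bytes on disk
compressed = gzip.compress(original)  # gzipped by logrotate: different bytes on disk

same_after_rename = head_crc(rotated) == head_crc(original)
same_after_gzip = head_crc(compressed) == head_crc(original)
print(same_after_rename, same_after_gzip)
```

A renamed file keeps the same leading bytes, so the checksum still matches. The gzipped copy starts with a gzip header instead of log text, so a checksum taken over its raw bytes will not match the original.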
How long you have to fix the issue depends on how much memory you have told Splunk it can use. I believe the default is 256MB for a UF. Assuming your logs are 20MB a day and you are only monitoring 1 file, you will have 12 days or so.
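The estimate above is just buffer size divided by daily log volume. A quick sketch of that arithmetic, using the 256MB and 20MB/day figures from this answer (both are the answer's assumptions, not measured values; `days_of_buffer` is a made-up helper name):

```python
def days_of_buffer(buffer_mb: float, daily_log_mb: float) -> float:
    """How many days of logs fit in the buffer at a steady daily volume."""
    return buffer_mb / daily_log_mb


days = days_of_buffer(buffer_mb=256, daily_log_mb=20)
print(f"{days:.1f} days")  # 12.8 days
```

Plug in your own buffer size and log volume to get a figure for your environment. Bursty logging will shorten the window compared with this steady-rate estimate.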
Another option, assuming that the UF can be converted to a Heavy Forwarder/Indexer and that it does not forward more than 500MB of data per day, is to use the index and forward option. You can set it up with a free license if this is the case, and it will not affect your license. This will ensure that you do not lose any data from the intermediate forwarder, ever.
You are correct in your assumptions. As long as the log files remain on the origin server, even if they are rotated or renamed, Splunk will remember the place it was and pick up again when the connection is re-established. If the file is compressed or deleted, then it will not. So, if the break is for a few minutes, you should be fine. If it's a few weeks, you will probably lose files (unless you have a very generous log retention policy, or extremely low log activity).
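As a rough illustration of "remember the place it was", here is a small Python sketch of resuming from a saved byte offset after a file has been rotated (renamed) but not compressed. This is only a toy model of the idea, not how the forwarder is actually implemented. The file names and the `read_new_lines` helper are hypothetical.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (lines written after `offset`, new offset) for the file at `path`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode().splitlines(), offset + len(data)


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "app.log")
    with open(log, "w") as f:
        f.write("event 1\nevent 2\n")

    # Before the outage: everything so far is sent and the offset is saved.
    sent, checkpoint = read_new_lines(log, 0)

    # During the outage: more events arrive, then the file is rotated.
    with open(log, "a") as f:
        f.write("event 3\nevent 4\n")
    rotated = os.path.join(tmp, "app.log.1")
    os.rename(log, rotated)

    # After reconnect: resume in the rotated file from the saved offset.
    backlog, checkpoint = read_new_lines(rotated, checkpoint)

print(backlog)  # ['event 3', 'event 4']
```

If the rotated file had been gzipped or deleted before the reconnect, the saved offset would no longer point at anything readable, which is the loss scenario described above.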