Deployment Architecture

Data loss if HF goes down

New Member

Our setup:

The HF receives syslog (directly from firewalls, IPS, etc.) and logs from UFs (Windows and Linux machines), then forwards everything to a cluster of indexers, which in turn feed two search heads.

If the HF goes offline for a few hours or a day because of a hardware or network issue, what happens to the syslog data the firewalls send to the HF during that time? Will it be lost? Will the data from the UFs also be lost?

What is recommended to avoid data loss during such an outage? Do we need to modify the existing deployment?

Thanks in Advance

1 Solution

SplunkTrust

The syslog information is usually UDP, which means, "I'm throwing this data over the fence, and I hope there is someone there to catch it." That data will be lost if not caught by something.

The UF data will get cached for a while (how long depends on the data volume) until the UF can reach the HF again. File-monitored inputs in particular have no problem waiting: the UF simply resumes reading from where it left off once communication is restored.

The HF will do the same as the UF, but because it typically handles much more data, its in-memory queue of data to forward fills up much sooner.

If you want to retain the syslog data, stand up a syslog server (like rsyslog or syslog-ng) that writes to disk, and forward that data on with a forwarder. The downtime when a syslog server restarts is minimal compared to a forwarder restart, and being simpler, it is more likely to keep running all the time. There are lots of articles and answers about doing this.
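As a minimal sketch of that syslog-server setup (the port, paths, and template name here are assumptions, not anything from this thread), an rsyslog drop-in that writes each sender's traffic to its own file might look like:

```
# /etc/rsyslog.d/10-remote.conf -- sketch only; port and paths are assumptions
module(load="imudp")                            # accept UDP syslog
input(type="imudp" port="514" ruleset="remote") # bind the input to our ruleset

# one file per sending host, so a forwarder can monitor the directory tree
template(name="PerHostFile" type="string"
         string="/var/log/remote/%HOSTNAME%/syslog.log")

ruleset(name="remote") {
    action(type="omfile" dynaFile="PerHostFile")
}
```

Because the events land on disk first, a forwarder restart only delays pickup of those files instead of dropping the UDP packets outright.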

The UF and HF are probably fine in most instances, but if you want a high-availability architecture, put a load balancer in front of additional HFs so the data can still be forwarded if one goes down. That way you can have your data and eat it too.

I would also set up alerts that check whether data is still coming through, and notify someone if it stops so the problem can be fixed where possible.
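As one way to do that (a sketch only; the catch-all index scope and the 60-minute threshold are assumptions you would tune), a scheduled search like this could alert on hosts that have gone quiet:

```
| tstats latest(_time) AS lastSeen WHERE index=* BY host
| eval minutesSinceLastEvent = round((now() - lastSeen) / 60)
| where minutesSinceLastEvent > 60
```

Saved as an alert that triggers when results are returned, this flags any host whose most recent event is older than the threshold.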

There is no silver bullet for making sure you won't lose data, but you can reduce how much is lost. We tell our users that we are not an archiving service (though I know there are many places where Splunk is that service, like for PCI, etc.), so if there is data loss, too bad, so sad. I know that isn't the best answer, but face it, loss happens.


SplunkTrust

Hello there,
If no queues are configured, the default in-memory queue is 500KB.
I assume you are indexing more than that through your heavy forwarder, so I think both the syslog data and the data from the UFs would be lost during the outage.
A couple of recommendations:
a. Use a dedicated syslog server.
b. Don't use the heavy forwarder as an aggregation point (funnel) if possible; let the UFs send their data directly to the indexers.
c. Read here about persistent queues:
https://docs.splunk.com/Splexicon:Persistentqueue
https://docs.splunk.com/Documentation/SplunkCloud/6.6.0/Data/Usepersistentqueues
Hope it helps.
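As a sketch of (b) and (c) above (the group name, indexer addresses, port, and sizes are placeholders, not anything from this thread), the relevant stanzas might look like:

```
# outputs.conf on each UF -- send directly to the indexers, auto load-balanced
[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997

# inputs.conf on the HF -- persistent queue for a network input, so events
# received while the in-memory queue is full are spooled to disk
[tcp://:1514]
queueSize = 1MB
persistentQueueSize = 10GB
```

Note that a persistent queue protects against the HF's own output being blocked (for example, unreachable indexers); it cannot help while the HF itself is offline, which is why (a) and (b) matter too.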



Motivator

Yes, the syslog data gets lost during that time. The UF data can be recovered by re-indexing, since the source files are still present on the origin hosts.

To avoid losing the syslog data when the HF is down, here are the steps:

1) Use rsyslog or syslog-ng to write the syslog data to disk on the HF.
2) Using inputs.conf, monitor those files on the HF and forward the data to the indexers.
3) Keep the syslog files for a retention period and archive them using logrotate or another mechanism.
4) If you have multiple HFs, try load balancing across them.
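For steps 2 and 3, a minimal sketch (the paths, sourcetype, and 30-day retention are assumptions, not anything from this thread) might be:

```
# inputs.conf on the HF -- monitor the files the syslog daemon writes (step 2)
[monitor:///var/log/remote/*/syslog.log]
sourcetype = syslog
host_segment = 4

# /etc/logrotate.d/remote-syslog -- rotate and retain the raw files (step 3)
/var/log/remote/*/syslog.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
}
```

With `delaycompress`, the most recently rotated file stays uncompressed, which gives the forwarder time to finish reading it before it is gzipped.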
