Getting Data In

How to retain UF-based data when the link between an on-prem heavy forwarder and Splunk Cloud goes down?

Builder
What's the alternative when the link between the on-prem HF and Splunk Cloud goes down? How can we prevent data loss in the interim?

For syslog, we already use a syslog server, so there is no issue on that part. However, what can we do for data from non-syslog sources such as UFs?

There is an option for persistent queues, but it's not available for these input types:
  • Monitor
  • Batch
  • File system change monitor
  • splunktcp (input from Splunk forwarders)

SplunkTrust

How long are the outages we're talking about? Is this a flaky network connection from your HFs to Splunk Cloud? Is it a few minutes, or are we talking hours? How many events would need to be queued to keep this from failing (HFs deal in event count)?

If it's not too long, I think you are on the right path with queues, but the queue needs to be at the other end of the parsing pipeline, on the output side. maxQueueSize in outputs.conf on the HF can be set high enough to hold a large number of events. This is, of course, resource intensive, but why else have HFs as the funnel if they aren't there to be used?
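To get a feel for how big that queue would have to be, here is a rough back-of-the-envelope sketch in Python. It assumes a steady ingest rate through the HF, and the 50 GB/day figure is a made-up placeholder, not a number from this thread. As far as I know the output queue is held in memory, so compare the result against the RAM you can realistically give the HF.

```python
GIB = 1024 ** 3


def queued_bytes(ingest_gb_per_day: float, outage_hours: float) -> float:
    """Approximate bytes that pile up on the HF while the uplink is down.

    Assumes data arrives at a constant rate over the day (a simplification).
    """
    bytes_per_day = ingest_gb_per_day * GIB
    return bytes_per_day * outage_hours / 24


def fits_in_queue(ingest_gb_per_day: float, outage_hours: float,
                  max_queue_bytes: float) -> bool:
    """True if the backlog for this outage fits in the given queue size."""
    return queued_bytes(ingest_gb_per_day, outage_hours) <= max_queue_bytes


if __name__ == "__main__":
    daily_gb = 50  # hypothetical daily volume through one HF
    for hours in (0.25, 8, 24):
        need = queued_bytes(daily_gb, hours)
        print(f"{hours:>5} h outage -> ~{need / GIB:.1f} GiB to buffer")
```

If the number for your longest expected outage is far beyond what the HF can hold, the queue alone will not cover it and you are relying on the forwarders behind it to wait.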

Now, theoretically, if your HF output queues all fill, then its parsing queues fill too, and the pressure backs up onto your forwarders as the HF refuses to accept more data.

All this said, what is happening with your connection to Splunk Cloud that this is a big concern? I'd be checking into fixing that (if possible).


Builder

This is just part of the risk planning for cloud migration. 

There are HFs acting as intermediate forwarders that collect data from the on-prem forwarders and send it to Splunk Cloud.

So essentially, after the queues are filled the HF will stop accepting data. Does that mean the UFs also won't send any data? I assume that once the link between the HF and Cloud is back up, the UFs will resume sending from the last read location and the HF will start accepting data again. I believe this won't result in any data loss?

Update: to answer your first question, we are looking to support a downtime period of at least 8 to 24 hours. In that case, what sort of solution should we look at?
