Getting Data In

Splunk Store and Forward HA

ephemeric
Contributor

Hello all,

Forgive my hasty question, it's late and my articulation has dwindled along with my brain capacity...

We need a solution that allows us to not lose any events whatsoever on a collector, AKA heavy forwarder.

We can't do persistent queues across all inputs, as splunktcp-ssl inputs do not allow this. We want all inputs to go straight into Splunk, no syslog relays etc.
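
(For reference, a persistent queue is enabled per input stanza in inputs.conf, roughly like this, with illustrative sizes; the splunktcp/splunktcp-ssl stanzas simply don't accept persistentQueueSize, which is exactly our gap:)

    # inputs.conf on the HF -- sizes are illustrative
    [udp://514]
    queueSize = 1MB               # in-memory queue
    persistentQueueSize = 500MB   # spills to disk when the memory queue fills;
                                  # not supported on splunktcp-ssl inputs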

The index-and-forward function won't work for us, as the locally indexed copy counts toward licence usage. How would one check which events were not acknowledged anyway? I assume something would have to be hacked into place to check which events were actually received?
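
(This is the outputs.conf setting I mean; it only applies on heavy forwarders, and the group name and hostnames are examples:)

    # outputs.conf -- index locally AND forward; the local copy burns licence
    [tcpout]
    defaultGroup = primary_indexers
    indexAndForward = true

    [tcpout:primary_indexers]
    server = idx1:9997,idx2:9997,idx3:9997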

I thought of writing something that monitors the uplink and then stops splunkd, copies over another outputs.conf (with no forward-servers configured), and starts Splunk again so it indexes locally; then it repeats the dance in reverse when the uplink is back.

I have noticed that if all forward-servers are removed from outputs.conf, either at start or via the CLI one at a time, then Splunk automatically starts to index locally on the fly.
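
(For the record, I was toggling the receivers with the standard CLI; hostnames are examples:)

    splunk remove forward-server idx1.example.com:9997   # drop a receiver
    splunk add forward-server idx1.example.com:9997      # add it back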

This is ideal as it happens on the fly with, I presume, no event loss. It seems the closest solution I could find, except that adding the forward-servers back one at a time caused our data to be cloned in triplicate (presumably each CLI add landed in its own target group, and Splunk clones across groups while load-balancing within a group). Ouch!

We don't want to do cloning, hell no, we assume one uplink in each scenario.

We have three receivers (indexers) on the remote side but only one uplink.

I'm lost: how can we get Splunk to index locally ONLY when the uplink is unavailable (and hence the event is not ack'ed), and then merge those buckets/events out of band at a later stage?

It would be perfect if we could put all not-ack'ed events into an index on the localhost after a timeout, and then, when back online, forward those same events, get them ack'ed, and clean out the local index.

This part I know how to do: force a roll of the last hot bucket, then scrub the IDs, scp the warm buckets upstream to an indexer, merge, and restart.
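
(From memory the roll can be driven via the REST API rather than a restart; the index name is an example and the endpoint path is as I recall it:)

    # roll the hot bucket of the 'main' index to warm
    splunk _internal call /data/indexes/main/roll-hot-buckets -method POST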

Thank you.

jrodman
Splunk Employee

If the heavy forwarder has access to a semi-persistent data source (log files), then it does this out of the box.

If the data source is something else (like a UDP input), then I encourage you to render it as logfiles, e.g. via rsyslogd.
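
A minimal sketch of that approach, assuming rsyslog's UDP listener and an illustrative file path:

    # /etc/rsyslog.conf -- write incoming UDP syslog to disk
    $ModLoad imudp
    $UDPServerRun 514
    *.* /var/log/remote/all.log

    # inputs.conf -- then tail that file; monitor inputs checkpoint their read position
    [monitor:///var/log/remote/all.log]
    sourcetype = syslog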

jrodman
Splunk Employee
  • WinEvent live channels should be made to behave similarly to Monitor / Tailing, but without walking the code I have doubts that it correctly advances its event checkpoints in the same manner as tailing. For .evt files, Tailing tracks whether they have been read, so they should behave properly.

jrodman
Splunk Employee
  • splunktcp-ssl isn't really an original input, but a forwarding mechanism. Splunk forwarding, as discussed in your forwarding question, has the ack mechanism to ensure the datastream is handed off cleanly (see the sketch after this list).
  • persistent-queues are not really a way to provide data redundancy, but instead a way to provide queue buffering elasticity.
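
A sketch of enabling the ack mechanism on the sending side (group name and hosts are examples):

    # outputs.conf -- indexer acknowledgement
    [tcpout:remote_indexers]
    server = idx1:9997,idx2:9997,idx3:9997
    useACK = true

Note that with useACK on, the forwarder keeps a wait queue of unacknowledged data sized at 3x the output queue, so memory use goes up.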

ephemeric
Contributor

We could do that, but what about the inputs that do not support persistent queues, like splunktcp-ssl? Because of our high-security client sites, we don't have any access to the forwarders on hosts to change anything (Windows event log buffers etc.). We get to install once and then hope for the best.

ephemeric
Contributor

We have those other inputs too. Don't want to mess with persistent queues.

bmacias84
Champion

Yeah, I don't see a problem with that solution. It will work for everything except streaming data such as perfmon, SNMP, UDP data, etc. Data resilience requires additional hardware (physical or virtual).

ephemeric
Contributor

We need something that can tolerate hours, maybe days of downtime.

bmacias84
Champion

Use indexer acknowledgement between the forwarder and indexer, then increase the client output queue (it will use more RAM). Once the queue is full, the forwarder will stop processing new events, then pick up where it left off once the queue has drained.
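
Something like this on the forwarder (group name and sizes are illustrative):

    # outputs.conf -- ack plus a larger output queue
    [tcpout:remote_indexers]
    server = idx1:9997
    useACK = true
    maxQueueSize = 512MB   # wait queue for unacked data grows to 3x this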

ephemeric
Contributor

Thank you for the suggestion. Cool idea, but we cannot purchase and install more hardware at the client site, it's government...

bmacias84
Champion

Why not use multiple intermediate HFs with load balancing, coupled with indexer acknowledgement? The source forwarder, intermediate, and indexer would all have to acknowledge events before they are removed from the queues. You would then increase your input and output queue sizes to an acceptable size before the HF stops processing new events. You could also increase your source forwarders' output queues. Items will stay in these queues until the indexers are functioning and acknowledging events. I can elaborate if needed.
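
Roughly like this on the source forwarder (hostnames are illustrative):

    # outputs.conf -- load-balance across intermediate HFs
    [tcpout:intermediate_hfs]
    server = hf1:9997,hf2:9997
    autoLBFrequency = 30   # seconds between switching targets
    useACK = true

The intermediate HFs would carry the same useACK setting toward the indexers, so an event only leaves a queue once the next hop has confirmed it.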

ephemeric
Contributor

Another idea I had was to forward to the instance itself when the uplink was down, to a separate out-of-band index so to speak, and then merge this later.

I couldn't get Splunk to forward to itself?
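
(What I was trying to approximate, if I read the docs right, is the selective-indexing toggle rather than a network loop back to itself; this still burns licence for the local copy, and I'd still need to script the switchover when the uplink drops:)

    # outputs.conf on the HF -- index only flagged inputs locally
    [indexAndForward]
    index = true
    selectiveIndexing = true

    # inputs.conf -- flag an input for local indexing
    [udp://514]
    _INDEX_AND_FORWARD_ROUTING = local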
