Solved: Universal Forwarder doesn't write events to persistent queue with graceful service shutdown

gheodan · ‎01-04-2018

I'm using distributed Universal Forwarders at remote locations to collect events from remote sites. To prevent data loss, I configured a persistent queue on disk for specific inputs.

inputs.conf
[udp://514]
connection_host = ip
index = remotelogs
queueSize = 1MB
persistentQueueSize = 10MB
sourcetype = syslog

Everything works perfectly except in the following case. While the UF is disconnected from the Splunk server, the events it receives are stored in memory. Even when the UF is stopped gracefully with $SPLUNK_HOME/bin/splunk stop, the events in memory are not saved to the persistent queue on disk.

Does anyone know whether this is a known issue or a bug? I didn't find any references to it.

Evaluated versions: 7.0.1 for both Server and UF.

gheodan · ‎01-17-2018

Hi

This is partially true. Please see below the answer I received from support.

These are the 4 main scenarios I would imagine in a simple forwarder-receiver topology:

A. The forwarder crashes while it is unable to forward data to the receiver (whether because of an unreachable receiver, network issues, an incorrect or missing outputs.conf, or similar): in-memory data will not be moved into the persistent queue, even if the persistent queue still has enough space to accommodate the in-memory queue data.
B. The forwarder is gracefully shut down while it is unable to forward data to the receiver (whether because of an unreachable receiver, network issues, an incorrect or missing outputs.conf, or similar): in-memory data will not be moved into the persistent queue, even if the persistent queue still has enough space to accommodate the in-memory queue data.
C. The forwarder crashes but has been able to forward data to the receiver so far: persistent queue data will be preserved on disk; however, in-memory data is very likely to be lost.
D. The forwarder is gracefully shut down but has been able to forward data to the receiver so far: both persistent queue and in-memory data will be forwarded (and indexed) before the forwarder is fully shut down.

Best regards,
Daniel

nickhills · ‎01-17-2018

To be fair, it’s exactly true, because that’s what my answer described 🙂

The only scenario in which data survives a restart is if the forwarder is restarted ungracefully (a crash, or a restart forced by the OS) while it already has data in the pqueue.

In what scenario are you relying on pqueues? There is almost certainly a better way to preserve your event data through restarts.

If my comment helps, please give it a thumbs up!

wkupersa · ‎02-17-2020

What better mechanism is there to persist data when Splunk can't reach the indexers? Splunk continues to read log files. It commits the data to a memory queue, but not to a pqueue, because the memory queue hasn't filled up. So when the endpoint with the UF is shut down, those events are simply lost. It should either stop reading inputs when it can't reach the indexer or commit to disk as it halts.

nickhills · ‎02-17-2020

The obvious one is: don’t use network inputs. Use syslog with a UF.

wkupersa · ‎02-17-2020

The use case is laptops. People disconnect from the corporate network. The UF keeps reading events but doesn't persist many of them. When the user later shuts down their laptop, the events are lost.

nickhills · ‎01-17-2018

A persistent queue is only written to once the in-memory queue has filled up.

So if your forwarder is keeping up with the rate of events, nothing gets written to disk (because the memory queue is not full).
Once the in-memory queue is full, Splunk starts writing to disk until the p-queue is full (and then it drops events).

Now, if you shut down an indexer, it will delay the shutdown until the memory queue and p-queue have been drained. Nothing should persist on disk during a reboot.

However, if your forwarder is struggling to offload its events, your memory queue is full, you have data in the p-queue, and your forwarder crashes, you will have lost the contents of the memory queue, but data held in the p-queue will be persisted and offloaded to your indexer when it restarts.
https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Data/Usepersistentqueues

If you have forwarders (or indexers) which get periodically backlogged, persistent queues can help buffer events so they don't get dropped during busy periods. They are not really meant for HA/DR/"server room oopsies".
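
The fill order described above can be illustrated with a toy model. This is only a sketch of the rules as stated in this thread, not Splunk's implementation: the class ToyInputQueue, its method names, and the event-count capacities are all hypothetical (real queues are sized in bytes through queueSize and persistentQueueSize).

```python
from collections import deque


class ToyInputQueue:
    """Toy model: an in-memory queue that spills to an on-disk queue when full.

    Hypothetical names and event-count capacities, for illustration only.
    """

    def __init__(self, mem_capacity, disk_capacity):
        self.mem_capacity = mem_capacity
        self.disk_capacity = disk_capacity
        self.memory = deque()  # lost on stop or crash while output is blocked
        self.disk = deque()    # survives a stop or crash
        self.dropped = 0

    def receive(self, event):
        """Accept an event while the output is blocked (receiver unreachable)."""
        if len(self.memory) < self.mem_capacity:
            # Memory queue not yet full: nothing is written to disk.
            self.memory.append(event)
        elif len(self.disk) < self.disk_capacity:
            # Memory queue full: spill to the persistent queue.
            self.disk.append(event)
        else:
            # Both queues full: the event is dropped.
            self.dropped += 1

    def stop_while_blocked(self):
        """Model a stop or crash while the receiver is unreachable.

        Returns (number of in-memory events lost, events still on disk).
        """
        lost = len(self.memory)
        self.memory.clear()
        return lost, list(self.disk)


if __name__ == "__main__":
    q = ToyInputQueue(mem_capacity=3, disk_capacity=2)
    for i in range(1, 7):
        q.receive(f"e{i}")
    lost, survivors = q.stop_while_blocked()
    print("lost from memory:", lost)
    print("still on disk:", survivors)
    print("dropped:", q.dropped)
```

In this model, a small memory queue means events reach the disk queue sooner, but whatever sits in memory at stop time is still lost, which matches scenarios A and B in the support answer.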

ddrillic · ‎01-05-2018

I had a similar question from the indexer side of things: does an indexer write its queues to disk when we shut it down?

gheodan · ‎01-17-2018

Hi community.

I received the final answer from the support team.

I have discussed the topic with one of our Senior Sustaining Engineering colleagues and we realised that the documentation doesn't seem to be totally accurate here. Whenever it talks about crash, it should also mention "splunk stop". These are the 4 main scenarios I would imagine in a simple forwarder-receiver topology:

A. The forwarder crashes while it is unable to forward data to the receiver (whether because of an unreachable receiver, network issues, an incorrect or missing outputs.conf, or similar): in-memory data will not be moved into the persistent queue, even if the persistent queue still has enough space to accommodate the in-memory queue data.
B. The forwarder is gracefully shut down while it is unable to forward data to the receiver (whether because of an unreachable receiver, network issues, an incorrect or missing outputs.conf, or similar): in-memory data will not be moved into the persistent queue, even if the persistent queue still has enough space to accommodate the in-memory queue data.
C. The forwarder crashes but has been able to forward data to the receiver so far: persistent queue data will be preserved on disk; however, in-memory data is very likely to be lost.
D. The forwarder is gracefully shut down but has been able to forward data to the receiver so far: both persistent queue and in-memory data will be forwarded (and indexed) before the forwarder is fully shut down.

I will inform the documentation team about this missing detail.

Best regards,
Daniel
