How does indexer acknowledgement work with indexer clustering replication to guarantee that no data is lost?

Glenn
Builder

I need to understand in detail how indexer acknowledgement works together with cluster replication, specifically the point at which the chain of acknowledgement terminates and the forwarder is able to release the data from memory. The goal is a guarantee that no data (event) will be lost. At which point does the indexer consider the data to be indexed (and send an acknowledgement back to the forwarder)?

A) When the first indexer persists it to disk?
OR
B) When the Cluster Master has finished replicating the data throughout the cluster?

The scenario is this, with indexer acknowledgement (useACK=true) set in every outputs.conf down the chain:
Via forwarding/outputs process: Universal Forwarder -> Heavy Forwarder -> Indexer
Via replication process: Indexer -> replication peer indexers
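
For reference, the forwarding chain above might be configured roughly as follows. This is only a sketch: the hostnames, port, and target group names are hypothetical placeholders, and only the `useACK` setting comes from the scenario itself.

```ini
# outputs.conf on the Universal Forwarder
# (group name and hostname are hypothetical)
[tcpout]
defaultGroup = heavy_forwarders

[tcpout:heavy_forwarders]
server = hf1.example.com:9997
useACK = true

# outputs.conf on the Heavy Forwarder
# (group name and hostnames are hypothetical)
[tcpout]
defaultGroup = cluster_peers

[tcpout:cluster_peers]
server = idx1.example.com:9997, idx2.example.com:9997
useACK = true
```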

If the event has been persisted by the first Indexer (and thus an acknowledgement has gone back to the forwarder, which then forgets the event), but this Indexer hard crashes (e.g., unrecoverable disk corruption) before the event is replicated to a peer, do you now have a missing event?

If, once the indexer has acknowledged, data integrity depends on Splunk clustering (not indexer replication) to ensure that the crash described above does not lead to data loss, is cluster replication of every single written event guaranteed?

I've read the following and it does not cover this case:
http://docs.splunk.com/Documentation/Splunk/6.2.2/Indexer/Useforwarderstogetyourdata#How_indexer_ack...
http://docs.splunk.com/Documentation/Splunk/6.2.2/Forwarding/Protectagainstlossofin-flightdata#When_...
http://docs.splunk.com/Documentation/Splunk/6.2.2/Indexer/Aboutclusters

1 Solution

Steve_G_
Splunk Employee

I believe this section of the docs does answer your question:

http://docs.splunk.com/Documentation/Splunk/6.2.2/Indexer/Useforwarderstogetyourdata#How_indexer_ack...

It states:


If all goes well, the receiving peer:

  1. receives the block of data, parses and indexes it, and writes the data (raw data and index data) to the file system.

  2. streams copies of the raw data to each of its target peers.

  3. sends an acknowledgment back to the forwarder.

The acknowledgment assures the forwarder that the data was successfully written to the cluster. Upon receiving the acknowledgment, the forwarder releases the block from memory.


In other words, the ack does not get sent back to the forwarder until the source peer (i.e., the one that receives the data from the forwarder) has replicated the data to its target peers. So, if the source indexer crashes before it replicates the data to the other peers, the forwarder will not get an acknowledgement.
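
To make the ordering concrete, here is a toy Python model of the sequence quoted above. It is an illustration only, not Splunk's implementation: the `Peer`, `Forwarder`, and `receive_block` names and the `crash_after_write` flag are all invented for this sketch. It shows the ack being returned only after the source peer has written the block and streamed it to every target peer, and the forwarder keeping the block in memory when no ack arrives.

```python
from dataclasses import dataclass, field


class PeerDown(Exception):
    """Raised when a simulated peer cannot finish handling a block."""


@dataclass
class Peer:
    name: str
    alive: bool = True
    crash_after_write: bool = False  # hypothetical switch to simulate a hard crash
    stored: list = field(default_factory=list)

    def write(self, block: str) -> None:
        if not self.alive:
            raise PeerDown(self.name)
        self.stored.append(block)
        if self.crash_after_write:
            # The peer dies after persisting locally, before it can replicate.
            self.alive = False
            raise PeerDown(self.name)


def receive_block(source: Peer, targets: list, block: str) -> bool:
    """Return True (the ack) only if steps 1 and 2 both succeed."""
    try:
        source.write(block)       # step 1: index and write to the file system
        for target in targets:
            target.write(block)   # step 2: stream a copy to each target peer
    except PeerDown:
        return False              # no ack is sent
    return True                   # step 3: acknowledge to the forwarder


@dataclass
class Forwarder:
    in_memory: list = field(default_factory=list)

    def send(self, source: Peer, targets: list, block: str) -> bool:
        # Hold the block until an ack arrives.
        if block not in self.in_memory:
            self.in_memory.append(block)
        acked = receive_block(source, targets, block)
        if acked:
            self.in_memory.remove(block)  # release only on ack
        return acked
```

In this model, sending to a source peer created with `crash_after_write=True` returns `False` and leaves the block in `Forwarder.in_memory`, so it can be sent again to a healthy peer.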

Glenn
Builder

You are quite right, that part of the doc does appear to be fairly clear about my case. Thanks for your answer.

bontesl
Explorer

Does this still hold true for version 6.6.4 and later?

Dan
Splunk Employee

What is sufficient to pass through #2? Will the Cluster Master wait for the replication factor to be met before sending an ACK to the forwarder? What about the search factor?

Consider a multi-site replication environment and the scenario where one site is down. The replication factor won't be met during the outage, and thus the forwarder will not receive an ACK and will stop forwarding data. So in a DR scenario, you won't get real-time data. Is that correct?
