Data cloning - is it an exact clone?

MoniGreeth
Engager

Hi All,

The Splunk documentation says the following:

"Data cloning
To perform data cloning, specify multiple target groups, each in its own stanza. In data cloning, the forwarder sends copies of all its events to the receivers in two or more target groups. Data cloning usually results in similar, but not necessarily exact, copies of data on the receiving indexers."

As you can see in the last sentence of the quote, the documentation says the copies are not necessarily exact, which is bugging me. Will the UF forward an exact copy of the events to the configured indexers, or will there be some difference?

For example, say we have two indexers, IDX1 and IDX2, configured for cloning on the UF. If we get 1000 events, will IDX1 and IDX2 have 1000 events each, or might there be some difference?
Could anyone share some information to clarify this?

Thanks
Moni

yannK
Splunk Employee

With cloning, the same stream of events is sent to both indexers, so they "should" be identical.

However, because the events are parsed on the indexers, each indexer may have different rules in props.conf, or a different timezone, and may parse the events differently (for example, different timestamp detection, extra parsing rules, or different line breaking).
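
As a rough sketch of what "the same rules" could look like, the fragment below pins the timezone, timestamp extraction, and line breaking for one sourcetype. The stanza name my_sourcetype and all values are placeholders I made up for illustration, not settings from this thread. The idea is that the same stanza would be deployed on every receiving indexer so that both parse the cloned stream the same way.

```ini
# props.conf, deployed identically on each receiving indexer.
# "my_sourcetype" and every value below are illustrative assumptions.
[my_sourcetype]
# Set the timezone explicitly instead of relying on each indexer's local setting
TZ = UTC
# Pin timestamp extraction so both indexers read the same timestamp
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
# Pin line breaking so both indexers split the stream into the same events
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
```

If the two indexers differ in any of these settings for the same sourcetype, the same raw stream can end up with different timestamps or a different number of events on each side.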

If you want identically formatted data, use the cluster replication feature. The copy is made at the "bucket" level after indexing, so the copies will be identical.

A remark on cloning: if you have outputs.conf settings like blockOnCloning=false and dropClonedEventsOnQueueFull, then if one of the indexers is not reachable, the forwarder can decide not to wait for it and send to the remaining one (for the high-availability use case, when you still want the data even when a node is dead). In that case the two indexers will not hold the same events.
http://docs.splunk.com/Documentation/Splunk/6.1.3/Admin/Outputsconf
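
A minimal sketch of how those two settings might appear in outputs.conf on the forwarder is shown below. The group names, addresses, and the value 5 are placeholders chosen for illustration. Please check the outputs.conf spec for your version for the exact semantics and defaults before relying on it.

```ini
# outputs.conf on the forwarder.
# Group names, server addresses, and values are illustrative assumptions.
[tcpout]
# Listing two groups here clones every event to both of them
defaultGroup = clone_group1, clone_group2
# Do not block the forwarder when the cloned groups cannot take events
blockOnCloning = false
# Seconds to wait on a blocked group before dropping its events
dropClonedEventsOnQueueFull = 5

[tcpout:clone_group1]
server = 10.0.0.1:9997

[tcpout:clone_group2]
server = 10.0.0.2:9997
```

With settings like these, availability is favored over completeness: a group that stays unreachable can miss events that the other group received.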

the_wolverine
Champion

Sample configuration?

outputs.conf:
[tcpout:tcpout_group1]
server=10.1.1.197:9997,10.1.1.198:9997
autoLB=true

[tcpout:tcpout_group2]
server=myhost1.splunk.com:9997,myhost2.splunk.com:9997
autoLB=true

inputs.conf:
[monitor:///var/log/messages]
index=test
sourcetype=mytest
_TCP_ROUTING = tcpout_group1,tcpout_group2

sanjeevdixit
Explorer

Hi,
Did you get any answer to this question?
