Solved: Why are there many duplicate events in the indexer cluster?

xsstest · 08-22-2017

I have a single-site cluster that contains 5 indexers, 4 search heads, a master node, and a deployer. There are also some universal forwarders with load balancing.

All events in the indexer cluster come from universal forwarders. The data flows as follows (the most common cluster architecture):

Server/Host (UF installed here) -----TCP-----> indexer cluster
Server/Host (syslog) -----> Universal Forwarder -----TCP-----> indexer cluster
Server/Host (UF monitors a file) -----TCP-----> indexer cluster

So here is my question:

Why does a search return duplicate events? Is it because I'm using TCP? https://answers.splunk.com/answers/537368/why-is-there-event-duplication-via-tcp-port.html?
I have disabled useACK in outputs.conf on the UF.
What are the common causes of duplicate events? Please tell me so I can rule them out one by one. Thank you.

Forgive me for my English

vasanthmss · 08-22-2017

Hi xsstest,

Here are some steps to debug:

Are all 5 indexers in the same network / data center? Check the network connectivity between each indexer and the indexer master.
As a first step, manually check the source file behind the duplicate results. If the file itself contains duplicate data, you have to update your props.conf or handle the duplicates in your search.
Run a simple search to understand the duplicate pattern, something like:
index=<your index> sourcetype=<sourcetype> source="<source>" host="<host>" | eval bucket=_bkt | eval indextime=_indextime | table _time, indextime, bucket, splunk_server, _raw | convert ctime(indextime) | stats count list(*) as * by _raw | where count>1 | fields * _raw
Run the above search and check count, indextime, and splunk_server. count is the number of times the event was indexed, indextime is when the event was indexed in Splunk, and bucket is the bucket in which the event is stored.
If an event shows two different indextime values, it was indexed twice. Review your input and forwarding configurations.
If an event shows multiple splunk_server values but the same bucket, your indexers are probably unable to connect to the master, so replicated bucket copies are being made searchable. In that case, check your network and indexer cluster configurations.
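
As a rough sketch of the same check, you could count distinct values per duplicated event instead of reading the lists by eye. The output names distinct_indextimes, distinct_servers, and distinct_buckets are only illustrative labels, and the index and sourcetype placeholders are the same as in the search above:

```spl
index=<your index> sourcetype=<sourcetype>
| eval bucket=_bkt, indextime=_indextime
| stats count dc(indextime) as distinct_indextimes dc(splunk_server) as distinct_servers dc(bucket) as distinct_buckets by _raw
| where count>1
| fields * _raw
```

A row with distinct_indextimes greater than 1 suggests the event was indexed more than once. A row with distinct_servers greater than 1 but distinct_buckets equal to 1 matches the replicated-bucket case described above.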

Posting more information will help us assist further.

V

CMEOGNAD · 07-20-2022

Hi,

I have a similar problem. First, here is my config for forwarding the internal Splunk indexes:

[tcpout]
forwardedindex.0.whitelist = _.*
forwardedindex.filter.disable = false
defaultGroup = TEST_IDX-CLUSTER

[tcpout:TEST_IDX-CLUSTER]
forceTimebasedAutoLB = true
autoLBFrequency = 30
server = ID01.SPLUNK-TEST.local:9997,ID02.SPLUNK-TEST.local:9997,ID03.SPLUNK-TEST.local:9997

Then I debugged with this search:

index="_internal"
| eval bucket=_bkt
| eval indextime=_indextime
| table _time, indextime, bucket splunk_server _raw
| convert ctime(indextime)
| stats count list(*) as * by _raw
| where count>1
| fields * _raw
| sort - indextime

Output:
bucket = every copy of an event is in a different bucket
count = 2 or sometimes 3
indextime = all entries are identical
splunk_server = 01,02,03 or 01,01 or 02,03 or 03,03 (many different combinations)

Does anyone have an idea?

Regards - Markus

xsstest · 08-23-2017

What are the common causes of duplicate events? How can I investigate them one by one?

xsstest · 08-22-2017

count : 416
count : 500
etc. The search returned a total of 3518 results.

Some events are repeated twice, and some events are repeated hundreds of times.

Can I use dedup _time _raw to exclude duplicate events when I search?

xsstest · 11-14-2017

@vasanthmss

I found the cause: it was an rsyslog configuration error in which the same InputFileFacility was used.

vasanthmss · 08-23-2017

Yes, you can filter out the duplicates with dedup at search time. Have you checked whether the source file contains duplicates? If it does, you can use dedup; otherwise, try to figure out why the events were duplicated. Any luck with the splunk_server and bucket analysis?
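
A minimal sketch of that search-time filter, assuming the duplicates have identical raw text and the same timestamp (the index and sourcetype placeholders are the same ones used earlier in the thread):

```spl
index=<your index> sourcetype=<sourcetype>
| dedup _time _raw
```

This only hides the duplicates in the search results. The duplicate events stay in the index, so it is still worth finding out where they come from.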

V
