Why are there many duplicate events in the indexer cluster?

xsstest
Communicator

I have a single-site indexer cluster with 5 indexers, 4 search heads, a master node, and a deployer. There are also several universal forwarders configured with load balancing.

All events in the indexer cluster come from universal forwarders. The data flows as follows (the most common cluster architecture):

Server/Host (UF installed here) ------TCP------> indexer cluster
Server/Host (syslog) ------> universal forwarder ------TCP------> indexer cluster
Server/Host (UF monitors a file) ------TCP------> indexer cluster

So my questions are:

  1. Why do my searches return duplicate events? Is it because I'm using TCP? (See https://answers.splunk.com/answers/537368/why-is-there-event-duplication-via-tcp-port.html)

  2. I have already disabled useACK in outputs.conf on the UF.

  3. What are the common causes of duplicate events? Please list them so I can rule them out one by one. Thank you.
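On item 2: when indexer acknowledgment (useACK) is enabled, a forwarder that does not receive an acknowledgment resends the data, and in some cases this produces duplicate events. The toy model below is a minimal sketch of that at-least-once behavior only; deliver, lost_acks, and resend_on_missing_ack are hypothetical names, and it does not reflect how Splunk actually implements forwarding.

```python
# Toy model (NOT Splunk's implementation): a sender that waits for an
# acknowledgment and resends when the ack is lost delivers "at least once",
# so the receiver can end up storing the same event twice.


def deliver(events, lost_acks, resend_on_missing_ack):
    """Return the list of events stored by the receiver.

    events: events to send, in order.
    lost_acks: indexes of events whose acknowledgment is lost in transit.
    resend_on_missing_ack: True models an ack-based protocol (like useACK = true);
        False models fire-and-forget sending.
    """
    stored = []
    for i, event in enumerate(events):
        stored.append(event)  # the receiver writes the event when it first arrives
        ack_received = i not in lost_acks
        if resend_on_missing_ack and not ack_received:
            # The sender never saw the ack, so it sends the same event again
            # and the receiver writes a second copy.
            stored.append(event)
    return stored


events = ["e1", "e2", "e3"]
print(deliver(events, lost_acks={1}, resend_on_missing_ack=True))
# ['e1', 'e2', 'e2', 'e3']
print(deliver(events, lost_acks={1}, resend_on_missing_ack=False))
# ['e1', 'e2', 'e3']
```

With resending disabled, the lost acknowledgment does not create a second copy, so with useACK already off this path is probably not the cause here.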

Please forgive my English.

vasanthmss
Motivator

Hi xsstest,

Here are some debugging steps:

  1. Are all 5 indexers in the same network / data center? Check the network connectivity between each indexer and the cluster master.
  2. As an initial step, manually check the source file that shows the duplicate results. If the file itself contains duplicate data, you have to update your props.conf or handle the duplicates in your search.
  3. Run a simple search to understand the duplicate pattern, something like:

       index=<your index> sourcetype=<sourcetype> source="<source>" host="<host>"
       | eval bucket=_bkt
       | eval indextime=_indextime
       | table _time, indextime, bucket splunk_server _raw
       | convert ctime(indextime)
       | stats count list(*) as * by _raw
       | where count>1
       | fields * _raw

  4. Run the search above and check count, indextime, bucket, and splunk_server: count is the number of times the event appears in the results, indextime is when the event was indexed in Splunk, bucket is the bucket the event is stored in, and splunk_server is the indexer that returned the event.
  5. If you see two different index times for the same event, the event was indexed twice, which points to your input or forwarding configuration.
  6. If you see multiple splunk_server values with the same bucket, an indexer may not be able to connect to the master, so replicated bucket copies are also being made searchable. In that case, check your network and your indexer cluster configuration.
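Steps 3 to 6 amount to grouping results by _raw and comparing indextime, bucket, and splunk_server across the copies. The Python sketch below encodes the rules of thumb from steps 5 and 6 as stated in this answer (not an official Splunk diagnostic); classify_duplicates and the sample rows are hypothetical, and it assumes each row already carries the fields the search tables out.

```python
from collections import defaultdict


def classify_duplicates(rows):
    """Group results by _raw and label every event that appears more than once.

    rows: dicts with "_raw", "indextime", "bucket", and "splunk_server" keys.
    Returns {raw: (count, label)} for duplicated events only.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["_raw"]].append(row)  # like "stats count list(*) as * by _raw"

    report = {}
    for raw, copies in groups.items():
        if len(copies) < 2:  # like "where count>1"
            continue
        indextimes = {c["indextime"] for c in copies}
        buckets = {c["bucket"] for c in copies}
        servers = {c["splunk_server"] for c in copies}
        if len(indextimes) > 1:
            # Step 5: different index times, so the event was indexed more than once.
            label = "indexed more than once"
        elif len(buckets) == 1 and len(servers) > 1:
            # Step 6: one bucket returned by several indexers.
            label = "same bucket returned by several indexers"
        else:
            label = "other: investigate further"
        report[raw] = (len(copies), label)
    return report


rows = [
    {"_raw": "a", "indextime": 100, "bucket": "b1", "splunk_server": "idx1"},
    {"_raw": "a", "indextime": 160, "bucket": "b2", "splunk_server": "idx2"},
    {"_raw": "b", "indextime": 100, "bucket": "b3", "splunk_server": "idx1"},
    {"_raw": "b", "indextime": 100, "bucket": "b3", "splunk_server": "idx2"},
    {"_raw": "c", "indextime": 100, "bucket": "b4", "splunk_server": "idx3"},
]
for raw, (count, label) in classify_duplicates(rows).items():
    print(raw, count, label)
# a 2 indexed more than once
# b 2 same bucket returned by several indexers
```

A pattern with identical index times but different buckets for each copy falls into the last case, meaning neither rule of thumb explains it on its own.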

Posting more information will help us assist you further.

V

CMEOGNAD
Engager

Hi,

I have a similar problem. First, here is my config for forwarding the internal Splunk indexes:

[tcpout]
forwardedindex.0.whitelist = _.*
forwardedindex.filter.disable = false
defaultGroup = TEST_IDX-CLUSTER

[tcpout:TEST_IDX-CLUSTER]
forceTimebasedAutoLB = true
autoLBFrequency = 30
server = ID01.SPLUNK-TEST.local:9997,ID02.SPLUNK-TEST.local:9997,ID03.SPLUNK-TEST.local:9997

Then I debugged with this search:

index="_internal"
| eval bucket=_bkt
| eval indextime=_indextime
| table _time, indextime, bucket splunk_server _raw
| convert ctime(indextime)
| stats count list(*) as * by _raw
| where count>1
| fields * _raw
| sort - indextime

Output:
bucket = within one event, every copy is in a different bucket
count = 2, sometimes 3
indextime = identical for all copies
splunk_server = 01,02,03 or 01,01 or 02,03 or 03,03 (many different combinations)

Does anyone have an idea?

Regards - Markus

xsstest
Communicator

What are the common causes of duplicate events? How can I investigate them one by one?

xsstest
Communicator

count: 416
count: 500
etc. The search returned a total of 3518 results.

Some events appear twice, and some are repeated hundreds of times.

Can I use dedup _time _raw to exclude duplicate events at search time?

xsstest
Communicator

@vasanthmss

I found the cause: an rsyslog configuration error, where the same InputFileFacility was used.

vasanthmss
Motivator

Yes, you can filter out duplicates with dedup at search time. Have you checked whether the source file itself contains duplicates? If it does, you can use dedup; otherwise, try to figure out why the events were duplicated. Any luck with the splunk_server and bucket analysis?

V
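As a rough illustration of search-time deduplication, the sketch below keeps the first result for each combination of the listed field values, which is how dedup treats repeated values. It is a minimal sketch assuming every result has all the listed fields; dedup options such as keepempty and consecutive are not modeled, and the Python function name is hypothetical.

```python
def dedup(results, *fields):
    """Return results with repeats removed, keeping the first occurrence of
    each combination of values for `fields` (order is preserved).

    Assumes every result dict contains all of `fields`.
    """
    seen = set()
    kept = []
    for result in results:
        key = tuple(result[f] for f in fields)  # e.g. (_time, _raw)
        if key not in seen:
            seen.add(key)
            kept.append(result)
    return kept


results = [
    {"_time": 1, "_raw": "login ok"},
    {"_time": 1, "_raw": "login ok"},  # exact duplicate, dropped
    {"_time": 2, "_raw": "login ok"},  # same _raw, different _time, kept
]
print(dedup(results, "_time", "_raw"))
# [{'_time': 1, '_raw': 'login ok'}, {'_time': 2, '_raw': 'login ok'}]
```

Keep in mind that this only hides the extra copies in search results; the duplicates are still stored in the index.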