Why are there many duplicate events in the indexer cluster?

xsstest
Communicator

I have a single-site cluster that contains 5 indexers, 4 search heads, a master node, and a deployer. There are also some universal forwarders (UFs) with load balancing.

All events in the indexer cluster come from universal forwarders. The data flows as follows (the most common cluster architecture):

Server/Host (UF installed here) ---TCP---> Indexer cluster
Server/Host (syslog) ---> Universal Forwarder ---TCP---> Indexer cluster
Server/Host (UF monitors a file) ---TCP---> Indexer cluster

So my questions are:

  1. Why does it return duplicate events when I search? Is it because I'm using TCP? https://answers.splunk.com/answers/537368/why-is-there-event-duplication-via-tcp-port.html

  2. I have already disabled the useACK setting in outputs.conf on the UF.

  3. What are the common causes of duplicate events? Please tell me, so I can rule them out one by one. Thank you.

Forgive me for my English

1 Solution

vasanthmss
Motivator

Hi xsstest,

Here are some steps to debug:

  1. Are all 5 indexers in the same network / data center? Check the network connectivity between each indexer and the master node.
  2. As a first step, manually check the file that produces the duplicate results. If the file itself contains duplicate data, you have to update your props.conf or handle the duplicates in your search.
  3. Run a simple search to understand the duplicate pattern, something like: index=<your index> sourcetype=<sourcetype> source="<source>" host="<host>" | eval bucket=_bkt | eval indextime=_indextime | table _time, indextime, bucket splunk_server _raw | convert ctime(indextime) | stats count list(*) as * by _raw | where count>1 | fields * _raw
  4. In the results, check count, indextime, and splunk_server. count is the number of times the event was indexed, indextime is when the event was indexed in Splunk, and bucket is the bucket in which the data is stored.
  5. If an event shows two indextime values, it was indexed twice, which points to your configuration.
  6. If you get multiple splunk_server values with the same bucket, your indexer is not able to connect to the master, so replicated buckets are being enabled for search. In that case, check your network and indexer cluster configuration.
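
The manual file check in step 2 can be approximated with a short script. This is only a minimal sketch in Python and is not part of the original answer: it assumes one event per line in the monitored file, and the function name duplicate_lines and the file path in the usage note are hypothetical.

```python
from collections import Counter


def duplicate_lines(lines):
    """Return {line: count} for every non-blank line seen more than once.

    Assumption: each event occupies exactly one line of the source file.
    """
    # Strip the trailing newline so "a\n" and "a" count as the same event.
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return {line: n for line, n in counts.items() if n > 1}


if __name__ == "__main__":
    sample = ["event A\n", "event B\n", "event A\n"]
    print(duplicate_lines(sample))
```

To run it against a real file, something like `with open("/path/to/monitored.log") as f: print(duplicate_lines(f))` should work (the path is a placeholder). If the result is empty, the source file is probably not the origin of the duplicates and the later steps are the ones to follow.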

Posting more information will help us assist further.

V

CMEOGNAD
Engager

Hi,

I have a similar problem. First, here is my config for forwarding the internal Splunk indexes:

[tcpout]
forwardedindex.0.whitelist = _.*
forwardedindex.filter.disable = false
defaultGroup = TEST_IDX-CLUSTER

[tcpout:TEST_IDX-CLUSTER]
forceTimebasedAutoLB = true
autoLBFrequency = 30
server = ID01.SPLUNK-TEST.local:9997,ID02.SPLUNK-TEST.local:9997,ID03.SPLUNK-TEST.local:9997

Then I debugged with this search:

index="_internal"
| eval bucket=_bkt
| eval indextime=_indextime
| table _time, indextime, bucket splunk_server _raw
| convert ctime(indextime)
| stats count list(*) as * by _raw
| where count>1
| fields * _raw
| sort - indextime

Output:
bucket = within one event, every listed bucket is different
count = 2 or sometimes 3
indextime = all entries are equal
splunk_server = 01,02,03 or 01,01 or 02,03 or 03,03 (many different combinations)

Does anyone have an idea?

Regards - Markus

xsstest
Communicator

What are the common causes of duplicate events? How can I investigate them one by one?

xsstest
Communicator

count : 416
count : 500
etc. The search returned a total of 3518 results.

Some events are repeated twice, and some events are repeated hundreds of times.

Can I use dedup _time _raw to exclude duplicate events when I search?

xsstest
Communicator

@vasanthmss

The cause has been found: it was an rsyslog configuration error, where the same InputFileFacility was used.

vasanthmss
Motivator

Yes, you can filter out the duplicates with dedup at search time. Have you checked whether the source file contains duplicates? If it does, you can use dedup; otherwise, try to figure out why the events were duplicated. Any luck with the splunk_server and bucket analysis?
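
The splunk_server and bucket analysis can also be done offline on exported search results. The following is a minimal sketch in Python, not an official Splunk tool. It assumes the results were exported with one row per event copy (that is, before the stats step of the debug search) and that each row carries the fields _raw, indextime, bucket, and splunk_server. The function name classify_duplicates and the verdict strings are hypothetical. The rules are one reading of the debugging steps in this thread: more than one distinct indextime is treated as "indexed more than once", and one bucket seen on several splunk_server values is treated as replicated copies being searched.

```python
from collections import defaultdict


def classify_duplicates(rows):
    """Group rows by _raw and label each duplicated event.

    rows: iterable of dicts with keys _raw, indextime, bucket, splunk_server
    (assumed export format, one dict per event copy).
    Returns {raw_event: (copy_count, verdict)} for events seen more than once.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["_raw"]].append(row)

    report = {}
    for raw, copies in groups.items():
        if len(copies) < 2:
            continue  # not a duplicate
        indextimes = {c["indextime"] for c in copies}
        buckets = {c["bucket"] for c in copies}
        servers = {c["splunk_server"] for c in copies}
        if len(indextimes) > 1:
            # Different index times: the event was ingested more than once.
            verdict = "indexed more than once"
        elif len(buckets) == 1 and len(servers) > 1:
            # Same bucket reported by several peers: replicated copies searched.
            verdict = "replicated bucket copies searched"
        else:
            verdict = "unclear"
        report[raw] = (len(copies), verdict)
    return report


if __name__ == "__main__":
    rows = [
        {"_raw": "e1", "indextime": "100", "bucket": "b1", "splunk_server": "idx1"},
        {"_raw": "e1", "indextime": "160", "bucket": "b2", "splunk_server": "idx1"},
    ]
    print(classify_duplicates(rows))
```

Rows loaded with csv.DictReader from a CSV export should fit this shape, provided the column names match. Events labeled "unclear" (for example, equal indextime but different buckets) need a closer look at the forwarder and input configuration.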

V