Getting Data In

Why am I seeing Duplicate data (_raw)?

VijaySrrie
Builder

Hi,

I can see duplicate data in Splunk using the query below:

index="indexname"
| stats count by _raw
| where count >1

 

I checked 10 different indexes, and 8 of them contain duplicate logs.


 


PickleRick
SplunkTrust

Apart from the issues already covered by @ITWhisperer and @gcusello, there is also the possibility of duplication caused by the useACK parameter combined with communication problems and retransmission. Check your _internal logs for "possible duplication".
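
A hedged sketch of that check, assuming the forwarders send their internal logs to the indexers and that the messages arrive under the default splunkd sourcetype. The 24-hour window and the grouping by host are illustrative choices, not part of the original advice:

```spl
index=_internal sourcetype=splunkd "possible duplication" earliest=-24h
| stats count by host
| sort - count
```

Hosts that appear in the result are likely forwarders that re-sent data after an acknowledgement was not received in time, so they are a reasonable place to start looking at network or indexer load problems.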

gcusello
SplunkTrust

Hi @VijaySrrie,

You have to analyze your data sources: some of them aren't correctly configured.

The most common misconfigurations are the following:

  • forwarders installed on the nodes of an active/active cluster,
  • syslog received by two syslog servers without a load balancer in front of them,
  • crcSalt = <SOURCE> used with logs that are rotated or whose old files are archived with tar.
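
For the crcSalt case, the following inputs.conf fragment is a minimal sketch only. The monitored path, index, and sourcetype are hypothetical placeholders, and the initCrcLength value is an assumption chosen for illustration. The idea is that crcSalt = <SOURCE> mixes the file path into the checksum Splunk uses to recognize a file, so a rotated (renamed) file can look new and be indexed again:

```ini
# Hypothetical monitor stanza: path, index and sourcetype are placeholders.
[monitor:///var/log/myapp/app.log]
index = indexname
sourcetype = myapp
# crcSalt = <SOURCE> is deliberately left out here, so that a renamed
# (rotated) copy of the file is still recognized as already read.
# If it was added only because many files start with the same header,
# a longer initial CRC may be an alternative (assumed value, default is 256):
initCrcLength = 1024
```

Whether this fits depends on why crcSalt was set in the first place, so test it on a single input before rolling it out.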

The first step is to find out which data sources are duplicated, using the following search:

index="indexname"
| stats values(sourcetype) AS sourcetype count by _raw
| where count>1

This gives you the list of sourcetypes that contain duplicates, so you can focus your analysis on them.

Then, if the duplicated sourcetypes come from a cluster or from syslog, analyze your architecture to understand where duplication can occur.

Ciao.

Giuseppe

ITWhisperer
SplunkTrust

Other duplications can come from how your logs are managed at the source, e.g. are they replicated across multiple servers, or are they renamed and picked up multiple times? Try the following searches to find these kinds of occurrences.

index="indexname"
| stats values(source) AS source count by _raw
| where count>1

index="indexname"
| stats values(host) AS host count by _raw
| where count>1
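
The per-field searches in this thread can also be folded into one pass. This is only a convenience sketch built from the same stats and where commands. The final sort is an added assumption to surface the most-repeated events first:

```spl
index="indexname"
| stats count values(host) AS host values(source) AS source values(sourcetype) AS sourcetype by _raw
| where count>1
| sort - count
```

If a duplicated event lists several hosts or sources, that points toward replication or renamed files at the source. A single host and source points more toward forwarder or input configuration.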