Splunk shows duplicate events in search results when there are no duplicates in the source file.

wpreston
Motivator

When I run a search in Splunk, the results show some duplicate events. I have checked the source file and the events are not duplicated there, so I'm not sure why Splunk is showing duplicates. It's not duplicating every event in the index, and I'm not sure how many of the events it is duplicating, but I know that I have seen it for some (but not all) events of a certain class. It may be happening to others that I just haven't come across yet.

I can use dedup with some options to keep these out of the search results, but that avoids the problem rather than solving it. Is there a way to stop Splunk from creating these duplicates in the index, so that I don't have to use dedup with every search?

1 Solution

yannK
Splunk Employee

First, identify which events are duplicated.

  • Verify whether they come from exactly the same host / source / sourcetype:

myduplicateevent | stats count values(host) values(source) values(sourcetype) values(index) by _raw | where count>1

  • Check _indextime to see when each duplicate was indexed:

myduplicateevent | convert ctime(_indextime) AS indextime | table _time indextime _raw

Your log files may be rotating, with Splunk detecting the rotated copy as a new log file to index.
Please check:

  • whether you are using the crcSalt option
  • how your files are rotated, and whether the first lines are modified during the process
  • symlinks: verify that multiple symlinks are not pointing to the same file/folder
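
The file-side checks in that list can also be scripted on the host that holds the logs. What follows is only a rough Python sketch, not anything Splunk ships. It assumes the default behaviour where Splunk recognizes a file from a checksum of its first 256 bytes (the initCrcLength setting in inputs.conf), and it uses CRC32 purely as a convenient way to compare file heads, not as Splunk's actual checksum. The names head_crc and find_suspects are made up for this example.

```python
import os
import sys
import zlib
from collections import defaultdict


def head_crc(path, length=256):
    """Return a CRC32 of the first `length` bytes of a file (comparison aid only)."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read(length))


def find_suspects(paths, length=256):
    """Return (same_target, same_head).

    same_target: real file -> the given paths that resolve to it (e.g. symlinks).
    same_head:   head CRC  -> distinct real files that start with the same bytes.
    """
    by_realpath = defaultdict(list)
    for p in paths:
        by_realpath[os.path.realpath(p)].append(p)

    by_head = defaultdict(list)
    for real in by_realpath:
        by_head[head_crc(real, length)].append(real)

    same_target = {k: v for k, v in by_realpath.items() if len(v) > 1}
    same_head = {k: v for k, v in by_head.items() if len(v) > 1}
    return same_target, same_head


if __name__ == "__main__":
    # Pass the monitored files on the command line; directories are skipped.
    files = [p for p in sys.argv[1:] if os.path.isfile(p)]
    same_target, same_head = find_suspects(files)
    for real, names in same_target.items():
        print("several paths point to", real, ":", names)
    for crc, names in same_head.items():
        print("same first bytes:", names)
```

Run it with the monitored files as arguments. Several paths resolving to one real file covers the symlink case. Files that share the same first bytes under different names are typically a log and its rotated copy, which is exactly what gets read again when crcSalt = <SOURCE> is set. To check whether rotation rewrites the first lines, compare head_crc for a file before and after it is rotated.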

sumituv
New Member

I am facing the same issue, even when I search for a specific file name.

I removed crcSalt= from inputs.conf, but it made no difference.

gacerioni
Engager

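If you are using "crcSalt=<SOURCE>" with rotated logs, that can also cause duplicates.
With this setting, the full path of the file is mixed into the checksum Splunk uses to recognize files it has already read, so a rotated file that stays in the same directory under a different name looks like a new file.

Finally, if your monitor stanza has wildcards that also match the names of the rotated files, those files are picked up too, and you'll see duplicate events.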
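
To see quickly whether a wildcard would also catch rotated files, you can test the pattern against a directory listing. This is a small Python sketch using shell-style matching from fnmatch, which only approximates how Splunk expands wildcards in a monitor path. The file names and the helper matched_files are made up for illustration.

```python
from fnmatch import fnmatch


def matched_files(pattern, filenames):
    """Return the file names that a shell-style wildcard pattern would match."""
    return [name for name in filenames if fnmatch(name, pattern)]


# Hypothetical directory contents: one live log plus two rotated copies.
files = ["app.log", "app.log.1", "app.log.2.gz", "other.log"]

print(matched_files("app.log*", files))  # → ['app.log', 'app.log.1', 'app.log.2.gz']
print(matched_files("app.log", files))   # → ['app.log']
```

If the first kind of pattern is what your monitor uses, the rotated copies are candidates for being read a second time. Tightening the pattern or excluding the rotated names in inputs.conf avoids that.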

wpreston
Motivator

This was indeed the problem. It looks like Splunk indexed some of my events twice, once at 8 am and once at 1 pm yesterday. I'll have to dig in to figure out why. I'm still a Splunk newbie, so this was very helpful :)

Ayn
Legend

No, you're supposed to use dedup all the time.

...kidding 😉
This obviously is not the behaviour you should be seeing, but we need more information than just that you get duplicates. A normal instance of Splunk indexing 'normal' logs will not produce duplicates. You're seeing duplicates because you're not configuring Splunk correctly, or you're indexing logs that confuse Splunk in one way or another, or both. Please give us more details on what you are indexing and how you have set up Splunk.
