Splunk Search

Seeing duplicate events in search results?

arunsony
New Member

My inputs.conf monitors the source ///application.log. On the servers, application.log is rolled when it reaches 10 MB: the file is renamed to something like application.12-13-2014.log, and a new application.log is created. Because of this rollover, some events were missing in Splunk. To avoid missing events, we changed the source to application* (a wildcard) in inputs.conf. Now all the logs are indexed and show up in search, but we get duplicate events with the same timestamp: one copy has the source application.log and the other has the source application.12-12-2014.log. Can anyone help me with this issue? Thanks in advance!

woodcock
Esteemed Legend

In the short term, you can add | dedup _raw to your searches, but this degrades performance significantly. If you are using crcSalt = <SOURCE>, make sure that you remove it. Simply changing the whitelist the way that you did should fix the original problem without creating the new one. If you are not using that setting, then file a bug with the application's developers, because they appear to be writing to the file after it is rotated/renamed, which they should not be doing.

ddrillic
Ultra Champion

Can you please post the inputs.conf file?

The documentation supports what @woodcock said; see "How Splunk Enterprise handles log file rotation".

It says:

-- Do not use crcSalt = <SOURCE> with rolling log files, or any other scenario in which logfiles get renamed or moved to another monitored location. Doing so prevents Splunk Enterprise from recognizing log files across the roll or rename, which results in the data being reindexed.
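
To see why a salt based on the source path causes reindexing across a roll, here is a simplified Python model. It is only an illustration: the `file_signature` helper, the use of `zlib.crc32`, the way the salt is appended, and the file names are all assumptions made for this sketch, not Splunk's actual implementation. The point is that a signature built only from the start of the file survives a rename, while one that also mixes in the path does not.

```python
import zlib

def file_signature(head: bytes, salt: str = "") -> int:
    """Toy file-tracking checksum: CRC of the first bytes plus an optional salt."""
    return zlib.crc32(head + salt.encode("utf-8"))

# Same content before and after the roll; only the file name changes (hypothetical names).
head = b"2014-12-13 00:00:01 INFO application started\n"
before = "/usr/apps444/test_application.log"
after = "/usr/apps444/test_application.12-13-2014.log"

# Without a salt, the rolled file keeps its signature, so it is recognized as already read.
unsalted_match = file_signature(head) == file_signature(head)

# With the source path as the salt, the rename changes the signature, so the file looks new.
salted_match = file_signature(head, before) == file_signature(head, after)

print(unsalted_match, salted_match)
```

In this model, the salted signature differs after the rename, which corresponds to the rolled file being read again from the start.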

I would look at initCrcLength (making it larger than the default of 256), ignoreOlderThan, or both. With ignoreOlderThan = 2h, for example, files that have not been touched in the past two hours won't be read. That is a problem if the forwarder goes down for any reason, since files that pass the cutoff while it is down would be skipped.

arunsony
New Member

My inputs.conf looks like this:
[monitor: ///usr/apps444/test_application*]
sourcetype = test_application
index = Application

All the stanzas follow the same format. I removed crcSalt from inputs.conf just 4 hours ago.

ddrillic
Ultra Champion

If you removed crcSalt, you should not see duplicates anymore.

Without crcSalt, people sometimes experience the opposite problem: files are not indexed when their first 256 bytes are identical.
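
The "identical first 256 bytes" problem can be sketched in Python as follows. This is a toy model under stated assumptions: `head_crc`, the use of `zlib.crc32`, and the sample file contents are invented for illustration and are not Splunk's real check. It shows why two different files that share a long header look the same when only 256 bytes are compared, and why comparing a longer prefix (the idea behind raising initCrcLength) tells them apart.

```python
import zlib

def head_crc(data: bytes, length: int = 256) -> int:
    """CRC of the first `length` bytes only; a toy model of a head-of-file check."""
    return zlib.crc32(data[:length])

# Two hypothetical log files that share a 300-byte banner and then diverge.
banner = b"#" * 300 + b"\n"
file_a = banner + b"2014-12-12 10:00:00 INFO first file\n"
file_b = banner + b"2014-12-13 10:00:00 INFO second file\n"

# Only the shared banner is inside the first 256 bytes, so the checks collide.
same_at_256 = head_crc(file_a, 256) == head_crc(file_b, 256)

# A longer prefix reaches the part where the files differ.
same_at_1024 = head_crc(file_a, 1024) == head_crc(file_b, 1024)

print(same_at_256, same_at_1024)
```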

arunsony
New Member

When I search index=Application for the last hour, I see around 1000 events, and when I add dedup _raw to the same search, the count falls to 750 events. Is the difference the duplicate events? I am not sure. Is there any way to find the duplicates in an index? And how do I delete the duplicate events that are already indexed?
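
The arithmetic behind that comparison can be sketched in Python. The sketch assumes the events have been exported as hypothetical (source, raw) pairs; `duplicate_report` and the sample data are made up for illustration. The total minus the number of distinct raw events gives the number of extra copies, which is what the drop from 1000 to 750 would measure. Note that log lines the application itself wrote twice with identical text would be counted as well.

```python
from collections import defaultdict

def duplicate_report(events):
    """events: iterable of (source, raw) pairs.

    Returns (total, distinct, extra_copies, duplicated), where `duplicated`
    maps each repeated raw event to the list of sources it appeared under.
    """
    sources_by_raw = defaultdict(list)
    for source, raw in events:
        sources_by_raw[raw].append(source)
    total = sum(len(sources) for sources in sources_by_raw.values())
    distinct = len(sources_by_raw)
    duplicated = {raw: srcs for raw, srcs in sources_by_raw.items() if len(srcs) > 1}
    return total, distinct, total - distinct, duplicated

# Hypothetical exported events: one line was indexed under both file names.
events = [
    ("application.log", "2014-12-12 23:59:58 INFO job finished"),
    ("application.12-12-2014.log", "2014-12-12 23:59:58 INFO job finished"),
    ("application.log", "2014-12-13 00:00:03 INFO job started"),
]
total, distinct, extra, duplicated = duplicate_report(events)
print(total, distinct, extra)
```

Listing the sources for each repeated event also shows whether the copies really come from the pair application.log and the rolled file.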

ddrillic
Ultra Champion

If you can delete the index and start from scratch, that is the easiest option. In these cases, I reinstall the forwarder and start fresh.

arunsony
New Member

No, I cannot start from scratch. But one question: is the difference in the number of events the duplicate events or not?

ddrillic
Ultra Champion

Right, it definitely seems that you still have duplicates.

If you use:

[monitor: ///usr/apps444/test_application*]
sourcetype = test_application
index = Application

and you still see duplicates, I would first double-check that no duplicates exist in the files themselves.
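
One way to run that check on the log files themselves is a small Python script like the sketch below. The function names and the sample lines are hypothetical, and it treats every line as one event, which is an assumption that will not hold for multi-line events.

```python
from collections import Counter

def repeated_lines(lines):
    """Return {line: count} for every non-empty line that occurs more than once."""
    counts = Counter(line.rstrip("\r\n") for line in lines)
    return {line: n for line, n in counts.items() if line and n > 1}

def repeated_lines_in_file(path):
    """Apply the same check to a log file on disk."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return repeated_lines(handle)

# Hypothetical sample: the first and last lines are identical.
sample = [
    "2014-12-12 23:59:58 INFO job finished\n",
    "2014-12-13 00:00:03 INFO job started\n",
    "2014-12-12 23:59:58 INFO job finished\n",
]
repeated = repeated_lines(sample)
print(repeated)
```

An empty result for both application.log and the rolled file would suggest the duplicates are created at indexing time rather than by the application.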

arunsony
New Member

I checked the file and there are no duplicates in it. Since I removed crcSalt, the file has not been updated with new logs, so I need to wait to see whether duplicates are still coming up.

woodcock
Esteemed Legend

Use _index_earliest=-5m in your test search to make sure that you are looking ONLY at recently indexed events. Events indexed before the fix will stay duplicated/wrong.

arunsony
New Member

The logs are updated during the night. Once they are updated, I will check whether duplicates are still coming up.

arunsony
New Member

The logs have been updated, but I still see duplicate events in the search results. Any other suggestions?

woodcock
Esteemed Legend

Use btool and show us the settings for inputs and props for this source/type.

arunsony
New Member

My sourcetype is test_application.
Splunk is installed on Windows.
Can you tell me what I need to run in the bin directory, for both inputs and props?

woodcock
Esteemed Legend

Do this:

./splunk cmd btool inputs list --debug

Then skip to the section for your input and make note of all the settings. Then do this:

./splunk cmd btool props list --debug

Then skip to the test_application section and make note of all the settings.

Then post a comment to your Question with the details.

arunsony
New Member

For that specific sourcetype, nothing comes up in props.
Can I post the complete props.conf here directly?

arunsony
New Member

D:\Program Files\Splunk\etc\system\local\props.conf MAX_DIFF_SECS_AGO = 2147483646
D:\Program Files\Splunk\etc\system\local\props.conf MAX_DIFF_SECS_HENCE = 2147483646
D:\Program Files\Splunk\etc\system\local\props.conf MAX_EVENTS = 100000
D:\Program Files\Splunk\etc\system\local\props.conf NO_BINARY_CHECK = true
D:\Program Files\Splunk\etc\system\local\props.conf TRUNCATE = 100000
D:\Program Files\Splunk\etc\system\default\props.conf detect_trailing_nulls = auto
D:\Program Files\Splunk\etc\system\default\props.conf maxDist = 100
D:\Program Files\Splunk\etc\apps\learned\local\props.conf [whoami-too_small]
D:\Program Files\Splunk\etc\system\local\props.conf MAX_DIFF_SECS_AGO = 2147483646
D:\Program Files\Splunk\etc\system\local\props.conf MAX_DIFF_SECS_HENCE = 2147483646
D:\Program Files\Splunk\etc\system\local\props.conf MAX_EVENTS = 100000
D:\Program Files\Splunk\etc\system\default\props.conf MAX_TIMESTAMP_LOOKAHEAD = 128
D:\Program Files\Splunk\etc\system\default\props.conf MUST_BREAK_AFTER =
D:\Program Files\Splunk\etc\system\default\props.conf MUST_NOT_BREAK_AFTER =
D:\Program Files\Splunk\etc\system\default\props.conf MUST_NOT_BREAK_BEFORE =
D:\Program Files\Splunk\etc\system\local\props.conf NO_BINARY_CHECK = true
D:\Program Files\Splunk\etc\apps\learned\local\props.conf PREFIX_SOURCETYPE = True
D:\Program Files\Splunk\etc\apps\learned\local\props.conf SHOULD_LINEMERGE = False
D:\Program Files\Splunk\etc\system\default\props.conf TRANSFORMS =
D:\Program Files\Splunk\etc\system\local\props.conf TRUNCATE = 100000
D:\Program Files\Splunk\etc\system\default\props.conf detect_trailing_nulls = auto
D:\Program Files\Splunk\etc\apps\learned\local\props.conf is_valid = True
D:\Program Files\Splunk\etc\apps\learned\local\props.conf maxDist = 9999
D:\Program Files\Splunk\etc\system\local\props.conf MAX_DIFF_SECS_AGO = 2147483646
D:\Program Files\Splunk\etc\system\local\props.conf MAX_DIFF_SECS_HENCE = 2147483646
D:\Program Files\Splunk\etc\system\local\props.conf MAX_EVENTS = 100000
D:\Program Files\Splunk\etc\system\default\props.conf MAX_TIMESTAMP_LOOKAHEAD = 32
D:\Program Files\Splunk\etc\system\default\props.conf MUST_BREAK_AFTER =
D:\Program Files\Splunk\etc\system\default\props.conf MUST_NOT_BREAK_AFTER =
D:\Program Files\Splunk\etc\system\default\props.conf MUST_NOT_BREAK_BEFORE =
D:\Program Files\Splunk\etc\system\local\props.conf NO_BINARY_CHECK = true
D:\Program Files\Splunk\etc\system\local\props.conf TRUNCATE = 100000
D:\Program Files\Splunk\etc\system\default\props.conf LINE_BREAKER = ([\r\n]+---splunk-wmi-end-of-event---\r\n[\r\n]*)
D:\Program Files\Splunk\etc\system\default\props.conf LINE_BREAKER_LOOKBEHIND = 100
D:\Program Files\Splunk\etc\system\default\props.conf MAX_DAYS_AGO = 2000
D:\Program Files\Splunk\etc\system\default\props.conf MAX_DAYS_HENCE = 2
D:\Program Files\Splunk\etc\system\local\props.conf MAX_DIFF_SECS_AGO = 2147483646
D:\Program Files\Splunk\etc\system\local\props.conf MAX_DIFF_SECS_HENCE = 2147483646
D:\Program Files\Splunk\etc\system\local\props.conf MAX_EVENTS = 100000
D:\Program Files\Splunk\etc\system\local\props.conf NO_BINARY_CHECK = true
D:\Program Files\Splunk\etc\system\default\props.conf TRANSFORMS =
D:\Program Files\Splunk\etc\system\local\props.conf TRUNCATE = 100000
TATE_PUNCT = True
D:\Program Files\Splunk\etc\system\local\props.conf MAX_DIFF_SECS_AGO = 2147483646
D:\Program Files\Splunk\etc\system\local\props.conf MAX_DIFF_SECS_HENCE = 2147483646
D:\Program Files\Splunk\etc\system\local\props.conf MAX_EVENTS = 100000
D:\Program Files\Splunk\etc\system\local\props.conf NO_BINARY_CHECK = true
D:\Program Files\Splunk\etc\system\local\props.conf TRUNCATE = 100000

woodcock
Esteemed Legend

Even after I reformatted it, this is not the right stuff. There is a glitch around TATE_PUNCT = True and the vast majority of these settings are for sourcetype [whoami-too_small], which is not test_application.
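
A dump like this is easier to check when it is grouped by stanza. The Python sketch below is one way to do that; it assumes each line has the shape "path-to-.conf-file, whitespace, content" as in the paste above, and `settings_by_stanza` is a hypothetical helper, not part of btool. Lines without a .conf path, such as the truncated TATE_PUNCT = True line, are skipped.

```python
import re

# A path ending in ".conf", then whitespace, then a stanza header or a setting.
LINE = re.compile(r"^(?P<path>.*?\.conf)\s+(?P<rest>.*)$")

def settings_by_stanza(btool_lines):
    """Group `btool ... list --debug` lines into {stanza: {key: (value, path)}}."""
    stanzas = {}
    current = None
    for line in btool_lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # not in the "path  content" shape, so ignore it
        rest = match.group("rest").strip()
        if rest.startswith("[") and rest.endswith("]"):
            current = rest[1:-1]
            stanzas.setdefault(current, {})
        elif current is not None and "=" in rest:
            key, _, value = rest.partition("=")
            stanzas[current][key.strip()] = (value.strip(), match.group("path"))
    return stanzas

# A few lines copied from the paste above.
sample = [
    r"D:\Program Files\Splunk\etc\apps\learned\local\props.conf [whoami-too_small]",
    r"D:\Program Files\Splunk\etc\system\local\props.conf MAX_EVENTS = 100000",
    r"D:\Program Files\Splunk\etc\apps\learned\local\props.conf SHOULD_LINEMERGE = False",
    "TATE_PUNCT = True",
]
parsed = settings_by_stanza(sample)
print(sorted(parsed))
print("test_application" in parsed)
```

With the grouped result, checking whether a [test_application] stanza exists at all is a single lookup.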

arunsony
New Member

There is no specific declaration in props.conf for this sourcetype.

arunsony
New Member

I still see duplicate events in the search results. Can anyone suggest a solution? One more thing: how can we identify the duplicate events?
Searching on just the index name shows around 10000 events, and adding dedup _raw shows 7000 events. So are these the duplicate events or not? Is there any way to find the duplicate events?
