I'm using the *NIX app 4.6, and I'm seeing duplicated events for auditd logs. I also checked the raw logs, and the events there are unique.
Is it possible to fix this problem at the source (e.g. with a script or the CLI) without using the dedup filter in the console at the analysis phase?
Is it still happening? I wouldn't bother cleaning up until you've fixed it at the source.
Run this:
index=os | streamstats count by _raw _time source sourcetype host | table count _time host source sourcetype _raw
This will show you what is duplicated.
Example:
If you have 3 identical events like this:
2012-11-25 13:01:00 This is a message
You'll see this in the output:
Count ... .. .. .. _raw
3 ... .. .. .. 2012-11-25 13:01:00 This is a message
2 ... .. .. .. 2012-11-25 13:01:00 This is a message
1 ... .. .. .. 2012-11-25 13:01:00 This is a message
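As a rough illustration of what the `streamstats count by ...` part of that search does, here is a minimal Python sketch (not Splunk code) that keeps a running count per unique combination of fields. The `running_counts` helper and the sample event values are made up for this example, and the order in which Splunk displays the rows may differ from the processing order used here.

```python
from collections import defaultdict

# Fields that together are assumed to identify a unique event.
KEY_FIELDS = ("_raw", "_time", "source", "sourcetype", "host")


def running_counts(events, key_fields=KEY_FIELDS):
    """For each event, in order, return how many events with the
    same key have been seen so far (including the event itself)."""
    seen = defaultdict(int)
    counts = []
    for event in events:
        key = tuple(event.get(field) for field in key_fields)
        seen[key] += 1
        counts.append(seen[key])
    return counts


# Hypothetical sample data: three identical events.
sample = {
    "_raw": "2012-11-25 13:01:00 This is a message",
    "_time": "2012-11-25 13:01:00",
    "source": "/var/log/example.log",
    "sourcetype": "example",
    "host": "host1",
}
events = [dict(sample) for _ in range(3)]

print(running_counts(events))  # prints [1, 2, 3]
```

Any event whose running count is greater than 1 is a repeat of an event that was already seen with the same key.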
From here it's pretty easy to delete the duplicates.
But first, a word from safety pig:
_._ _..._ .-', _.._('))
'-. ' ' /-._.-' ',/
) \ '.
/ _ _ | \
| a a / |
\ .-. ;
'-('' ).-' ,' ;
'-; | .'
\ \ /
| 7 .__ _.-\ \
| | | ''/ /' /
/,_| | /,_/ /
/,_/ ''-'
Back up your index first, then delete from the copy to make sure that this works. There is no 'undo'. Only then run it on your live data.
This will delete your duplicates provided that "_raw _time source sourcetype host" are the fields that should make an event unique:
index=os | streamstats count by _raw _time source sourcetype host | where count > 1 | delete
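To make clear which events the `where count > 1` filter is meant to select, here is a small Python sketch of the same idea outside Splunk: the first event seen for each key is kept, and every later event with the same key is marked for deletion. The `split_duplicates` helper and the sample events are hypothetical, and the sketch assumes that the listed fields really do identify a unique event.

```python
# Fields that together are assumed to identify a unique event.
KEY_FIELDS = ("_raw", "_time", "source", "sourcetype", "host")


def split_duplicates(events, key_fields=KEY_FIELDS):
    """Split events into (keep, drop): the first event per key is
    kept, later events with the same key are treated as duplicates."""
    seen = set()
    keep, drop = [], []
    for event in events:
        key = tuple(event.get(field) for field in key_fields)
        if key in seen:
            drop.append(event)
        else:
            seen.add(key)
            keep.append(event)
    return keep, drop


# Hypothetical sample data: one event indexed three times, one unique event.
dup = {"_raw": "2012-11-25 13:01:00 This is a message",
       "_time": "2012-11-25 13:01:00", "source": "/var/log/example.log",
       "sourcetype": "example", "host": "host1"}
unique = dict(dup, _raw="2012-11-25 13:02:00 Another message",
              _time="2012-11-25 13:02:00")

keep, drop = split_duplicates([dup, dict(dup), unique, dict(dup)])
print(len(keep), len(drop))  # prints 2 2
```

If the key fields are chosen too loosely, legitimate events end up in `drop`, which is why testing on a copy first matters.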
Error in 'delete' command: Missing or malformed messages.conf stanza for DISPATCHCOMM:PREVSTREAM_ERROR__simpleresultcombiner
I'm also having the same problem. Does anyone know how to solve it?
Yes, I get the same error!
I'm a VERY green splunker, but when I try this command I get this error...
Error in 'delete' command: This command cannot be invoked after the non-streaming command 'streamstats'.
Am I missing something? Thanks for your post!
Is the problem that you have only a single copy of an event in the raw log but you have more than one copy in Splunk?
Yes, this is the issue: 1 raw event <-> 10 Splunk console events (more or less).
We had an issue with Splunk re-indexing the gzipped versions of the log files. Find a pair of duplicate events and see if the "source" field is the same.
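One way to check this on a set of exported events is sketched below in Python: group events by everything except `source` and list the distinct sources for each group that occurs more than once. The `sources_per_duplicate` helper, the field layout, and the file paths are assumptions for illustration only. Note that a search that includes `source` in its `by` clause would not flag these events as duplicates if their sources differ.

```python
from collections import defaultdict


def sources_per_duplicate(events):
    """Group events by (_raw, _time, host) and return the distinct
    sources for every group that contains more than one event."""
    groups = defaultdict(list)
    for event in events:
        key = (event["_raw"], event["_time"], event["host"])
        groups[key].append(event["source"])
    return {
        key: sorted(set(sources))
        for key, sources in groups.items()
        if len(sources) > 1
    }


# Hypothetical sample data: the same event indexed once from the
# plain log file and once from its gzipped rotation.
base = {"_raw": "2012-11-25 13:01:00 This is a message",
        "_time": "2012-11-25 13:01:00", "host": "host1"}
events = [
    dict(base, source="/var/log/example.log.1"),
    dict(base, source="/var/log/example.log.1.gz"),
]

for key, sources in sources_per_duplicate(events).items():
    print(key[0], "->", sources)
```

If the duplicates come from different sources, as in this made-up case, the fix belongs in the input configuration rather than in deleting events afterwards.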