How to remove duplicate events in search results without using DEDUP

horizonsecurity
Explorer

I'm using the *NIX app 4.6, and my auditd events show up duplicated in Splunk. I checked the raw logs, and each event appears there only once.
Is it possible to fix this at the source (e.g. with a script or the CLI) without using the dedup command in the console at search time?

jonuwz
Influencer

Is it still happening? I wouldn't bother cleaning up until you've fixed it at the source.

Run this:

index=os | streamstats count by _raw _time source sourcetype host | table count _time host source sourcetype _raw

This will show you what is duplicated.

Example:

If you have 3 identical events like this:

2012-11-25 13:01:00 This is a message

You'll see this in the output:

Count  ... .. .. ..   _raw
3      ... .. .. ..   2012-11-25 13:01:00 This is a message
2      ... .. .. ..   2012-11-25 13:01:00 This is a message
1      ... .. .. ..   2012-11-25 13:01:00 This is a message
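
If one row per duplicated event is easier to read than a running count, the same grouping can be summarized with stats. This is only a sketch: it assumes the same index (os) and the same five fields as the search above, so adjust both to whatever makes an event unique in your data.

```spl
index=os
| stats count by _raw _time source sourcetype host
| where count > 1
| sort - count
```

Each remaining row is an event that was indexed more than once, with count showing how many copies exist.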

From here it's pretty easy to delete the duplicates.

But first, a word from safety pig:

    _._ _..._ .-',     _.._('))
   '-. '     '  /-._.-'    ',/
      )         \            '.
     / _    _    |             \
    |  a    a    /              |
    \   .-.                     ;  
     '-('' ).-'       ,'       ;
        '-;           |      .'
           \           \    /
           | 7  .__  _.-\   \
           | |  |  ''/  /'  /
          /,_|  |   /,_/   /
             /,_/      ''-'

Back up your index first, then delete from the copy to make sure that this works. There is no 'undo'. Only then run it on your live data.

This will delete your duplicates provided that "_raw _time source sourcetype host" are the fields that should make an event unique:

index=os | streamstats count by _raw _time source sourcetype host | where count > 1 | delete

orion44
Communicator

Error in 'delete' command: Missing or malformed messages.conf stanza for DISPATCHCOMM:PREVSTREAM_ERROR__simpleresultcombiner

fabiocaldas
Contributor

I'm having the same problem. Does anyone know how to solve it?

jagadeeshm
Contributor

Yes, I get the same error!

BStodd
Engager

I'm a VERY green splunker, but when I try this command I get this error...

Error in 'delete' command: This command cannot be invoked after the non-streaming command 'streamstats'.

Am I missing something? Thanks for your post!

horizonsecurity
Explorer

Is the problem that you have only a single copy of an event in the raw log but you have more than one copy in Splunk?

Yes, that is the issue: one raw event shows up as roughly 10 events in the Splunk console.

reed_kelly
Contributor

We had an issue with Splunk re-indexing the gzipped versions of the log files. Find a pair of duplicate events and see if the "source" field is the same.
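
One way to run that check across all duplicates at once is sketched below. It assumes the events live in index=os, as in the searches earlier in the thread; distinct_sources and sources are just illustrative field names.

```spl
index=os
| stats count dc(source) AS distinct_sources values(source) AS sources by _raw _time host
| where count > 1
```

If distinct_sources is greater than 1 and sources lists a .gz path next to the live log, the rotated file is probably being indexed as well. If distinct_sources is 1, the same file was likely read more than once and the cause is elsewhere.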

sbrant_splunk
Splunk Employee

Is the problem that you have only a single copy of an event in the raw log but you have more than one copy in Splunk? It's a bit unclear from your question.
