Solved: How does dedup work in Splunk?

nibinabr · 03-02-2015

How does dedup work in Splunk? My concern is performance.
If my search covers 500K-1M events, of which 2K are duplicates, is using dedup going to be expensive? Or should I find a way to delete those 2K events and avoid using dedup?

Can someone give me suggestions on this, or point me to a discussion where I can find the answer?

musskopf · 03-02-2015

It can be expensive, yes, because it has to store every unique entry in a temporary place so it can compare each following event against them. To see how expensive it is in your case, use the Job Inspector: it shows how long each command in the search took to run.
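
A minimal Python sketch of the mechanism described above, assuming dedup behaves like a single pass over the events with a set of already-seen keys. This is a conceptual model, not Splunk's actual implementation; the `dedup` function, the `Event` type and the `id` field are hypothetical. It also hints at why the 2K duplicates are not what drives the cost: the set has to hold one entry per unique key, so with ~1M events and 2K duplicates this model keeps roughly 998K keys in memory.

```python
from typing import Dict, List, Set, Tuple

Event = Dict[str, str]  # simplified event: field name -> value


def dedup(events: List[Event], fields: Tuple[str, ...]) -> Tuple[List[Event], int]:
    """Keep the first event for each combination of `fields` values.

    Returns the kept events and how many keys had to be held in memory.
    Assumes every event has all of `fields` (a simplification).
    """
    seen: Set[Tuple[str, ...]] = set()  # the "temporary place" of unique keys
    kept: List[Event] = []
    for event in events:
        key = tuple(event[f] for f in fields)
        if key in seen:
            continue  # repeat of an earlier key: drop it
        seen.add(key)  # first time seen: remember it for later comparisons
        kept.append(event)
    return kept, len(seen)


if __name__ == "__main__":
    # Hypothetical, scaled-down data: 10,000 distinct events plus 20 repeats.
    events = [{"id": str(i)} for i in range(10_000)]
    events += [{"id": str(i)} for i in range(20)]
    kept, keys_in_memory = dedup(events, ("id",))
    print(len(kept), keys_in_memory)  # 10000 10000
```

The demo scales the question down to 10,020 events with 20 repeats: only 20 events are dropped, yet the set still ends up holding 10,000 keys, one per unique value.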

Also, remember that deleting the events doesn't actually remove anything from disk; it just marks them so they won't show up in searches again. That is still very handy in your situation, because you won't need to re-run dedup every time.
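
To illustrate the "mark, don't remove" behaviour, here is a hypothetical toy index with soft deletes, written in Python. It is not Splunk's internals, only a sketch under the assumption that deleting flips a per-event flag: a one-time pass flags the repeats, later searches simply skip flagged events without any dedup work, and the number of stored events does not shrink.

```python
from typing import Dict, List, Tuple

Event = Dict[str, str]  # simplified event: field name -> value


class ToyIndex:
    """Hypothetical in-memory index that supports soft deletes only."""

    def __init__(self) -> None:
        self._events: List[Event] = []
        self._deleted: List[bool] = []  # one flag per stored event

    def add(self, event: Event) -> None:
        self._events.append(event)
        self._deleted.append(False)

    def mark_duplicates_deleted(self, fields: Tuple[str, ...]) -> int:
        """One-time pass: flag every repeat of an already-seen key as deleted."""
        seen = set()
        flagged = 0
        for i, event in enumerate(self._events):
            if self._deleted[i]:
                continue  # already hidden by an earlier delete
            key = tuple(event[f] for f in fields)
            if key in seen:
                self._deleted[i] = True  # hide it; the data itself stays
                flagged += 1
            else:
                seen.add(key)
        return flagged

    def search(self) -> List[Event]:
        """Return only events that are not flagged as deleted (no dedup needed)."""
        return [e for e, gone in zip(self._events, self._deleted) if not gone]

    def stored_count(self) -> int:
        """Events still physically held, deleted or not."""
        return len(self._events)


if __name__ == "__main__":
    idx = ToyIndex()
    for i in [1, 2, 2, 3, 3, 3]:
        idx.add({"id": str(i)})
    print(idx.mark_duplicates_deleted(("id",)))  # 3
    print(len(idx.search()), idx.stored_count())  # 3 6
```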

Cheers
