Re: Is there a performance impact by using Dedup command in SPL Queries?

uhkc777 · 12-19-2016

Hi,

I'm using the dedup command in almost all of my search queries. Does it have an impact on performance? If so, what's the alternative?

Thanks,

pjvarjani · 12-20-2016

Dedup is perfectly fine for your requirements, even on larger datasets. Since you want to apply further logic after the dedup, stats dc() and head are out of the picture here. Try writing two queries that return the same results using the two approaches below, and check in the Job Inspector which one is faster.

base query | dedup <fields you want to dedup> | your logic
base query | stats latest.... by <fields you want to dedup>
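A minimal Python sketch (not SPL) of why the two patterns can return the same values: because Splunk returns events newest first by default, dedup keeps the most recent event for each key, which is the same event stats latest(...) reads from. The event fields (host, status) and both helper functions are hypothetical and only roughly model the commands; for instance, latest skips null values, so the two can diverge when the newest event lacks the field.

```python
from typing import Dict, List, Tuple

Event = Dict[str, object]

# Hypothetical events, newest first (Splunk's default search order).
events: List[Event] = [
    {"_time": 300, "host": "a", "status": 500},
    {"_time": 200, "host": "b", "status": 200},
    {"_time": 100, "host": "a", "status": 200},
]


def dedup(rows: List[Event], keys: Tuple[str, ...]) -> List[Event]:
    """Rough model of `dedup <keys>`: keep the first event seen for each key
    combination and drop events missing any key field."""
    seen = set()
    kept = []
    for ev in rows:
        if any(ev.get(k) is None for k in keys):
            continue
        combo = tuple(ev[k] for k in keys)
        if combo not in seen:
            seen.add(combo)
            kept.append(ev)
    return kept


def stats_latest_by(rows: List[Event], value_field: str, keys: Tuple[str, ...]) -> Dict[tuple, object]:
    """Rough model of `stats latest(<value_field>) by <keys>`: for each key
    combination, the non-null value from the event with the greatest _time."""
    best: Dict[tuple, Event] = {}
    for ev in rows:
        if any(ev.get(k) is None for k in keys) or ev.get(value_field) is None:
            continue
        combo = tuple(ev[k] for k in keys)
        if combo not in best or ev["_time"] > best[combo]["_time"]:
            best[combo] = ev
    return {combo: ev[value_field] for combo, ev in best.items()}


# Same answer either way for this data: host "a" -> 500, host "b" -> 200.
via_dedup = {(ev["host"],): ev["status"] for ev in dedup(events, ("host",))}
via_stats = stats_latest_by(events, "status", ("host",))
```

The practical difference is that dedup passes whole events on to the rest of the search, while stats returns only the aggregated fields you name, so the stats form only fits when your later logic needs just those fields.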

Thanks,
Pankaj

somesoni2 · 12-19-2016

Using dedup on larger datasets can be expensive. There are cases where you can replace dedup with stats latest(...), with a subsearch used as a filter, or with something else. Whether dedup can be replaced, and if so with what, depends on your query requirements. Could you share a sample search showing how you are using dedup?

uhkc777 · 12-19-2016

index=test | dedup od,line | timechart span=1d count(od) as total | stats avg(total)

felipecerda · 12-19-2016

Did you try this?

index=test | dedup line | timechart span=1d dc(od) as total | stats avg(total)

uhkc777 · 12-19-2016

dedup line won't work in our scenario. I need to remove events that have the same combination of od and line values.
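A minimal Python sketch (not SPL) of why dedup line undercounts here, using made-up od/line values: deduplicating on line alone keeps one event per line value, so a second od on the same line disappears, while deduplicating on the pair keeps both. The helper dedup_first is hypothetical and only roughly models Splunk's dedup (first event per key, in search order).

```python
# Hypothetical (od, line) values of four events, newest first.
events = [("A", "1"), ("B", "1"), ("A", "1"), ("B", "2")]


def dedup_first(rows, key):
    """Keep the first row for each key value (rough model of Splunk `dedup`)."""
    seen, kept = set(), []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


by_line = dedup_first(events, key=lambda r: r[1])          # like `dedup line`
by_pair = dedup_first(events, key=lambda r: (r[0], r[1]))  # like `dedup od,line`

# `dedup line | ... dc(od)`: distinct od values among the surviving events.
dc_od_after_line = len({od for od, _ in by_line})
# `dedup od,line | ... count(od)`: one event per distinct od/line pair.
count_after_pair = len(by_pair)
```

Here dc_od_after_line is 2 while count_after_pair is 3, because the B/1 event is discarded once line 1 has already been seen.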

somesoni2 · 12-19-2016

Can you compare your dedup results (and performance) with following query?

index=test | eval temp=od."#".line | timechart span=1d dc(temp) as total | stats avg(total)
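A rough Python model (not SPL) of the two pipelines, assuming each event carries an integer day index in place of the span=1d buckets and that every day in the range has at least one event (it does not reproduce timechart's handling of empty buckets). The sample data and function names are hypothetical, and the "#" separator assumes neither field contains "#". The point it illustrates: dedup runs over the whole time range, so an od/line pair seen on several days is counted only on the day of its most recent event, whereas dc(temp) counts distinct pairs within each day.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical events as (day, od, line), newest first like Splunk's default order.
events = [
    (2, "A", "1"),
    (2, "B", "1"),
    (1, "A", "1"),  # same od/line pair as a day-2 event
    (1, "C", "2"),
    (1, "C", "2"),  # exact duplicate within day 1
]


def dedup_then_daily_count(rows):
    """Model of `dedup od,line | timechart span=1d count(od) | stats avg(total)`."""
    days = {day for day, _, _ in rows}
    seen = set()
    per_day = defaultdict(int)
    for day, od, line in rows:
        if (od, line) not in seen:  # dedup looks across the whole time range
            seen.add((od, line))
            per_day[day] += 1
    # Days whose events were all removed by dedup still count as 0.
    return mean(per_day[d] for d in sorted(days))


def daily_distinct_pairs(rows):
    """Model of `eval temp=od."#".line | timechart span=1d dc(temp) | stats avg(total)`."""
    per_day = defaultdict(set)
    for day, od, line in rows:
        per_day[day].add(f"{od}#{line}")  # distinct pairs within each day
    return mean(len(per_day[d]) for d in sorted(per_day))
```

With this sample data the first function returns 1.5 and the second 2, because the day-1 copy of pair A/1 is removed by dedup but still counted by dc. When no pair appears on more than one day, both give the same average, which is worth checking when comparing the results of the two queries.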

uhkc777 · 12-19-2016

I think this is going to work out. Thank you, I appreciate your support.

Is there any way to contact you through e-mail or phone?

aaraneta_splunk · 12-19-2016

uhkc777 - Did the search query provided by somesoni2 help provide a working solution to your question? Please let me know when you can so that it can be converted to an answer. Thanks!

felipecerda · 12-19-2016

It's better to use dc(your_field) whenever you can. I once asked a Splunk instructor about the difference, and he said that dc was faster than dedup.

uhkc777 · 12-19-2016

I don't want the count. I need to write some logic on top of that dedup command.
