Splunk Search

Best practice dedup: should I use it as early as possible, or postpone it since it is non-streaming?

rvsroe
Explorer

In the Fundamentals 1 course, lab 8 tells us:
"As a best practice and for best performance, place dedup as early in the search as possible." (page 4)

But the quick reference guide tells us:
"Postpone commands that process over the entire result set (non-streaming commands) as late as possible in your search. Some of these commands are: dedup, sort, and stats" (page 2)

The example search they give in lab 8 places dedup in front of the distributable streaming command 'rename':
index=main sourcetype="access_combined_wcookie" action=purchase status=200 file="success.do"
| dedup JSESSIONID
| table JSESSIONID, action, status
| rename JSESSIONID as UserSessions

Would it not make sense to place dedup after rename? I guess 'as early as possible' is ambiguous anyway, but any input on where to place dedup would be greatly appreciated.

Cheers,
Roelof

1 Solution

koshyk
Super Champion

The best way to tackle the above query is:

index=main sourcetype="access_combined_wcookie" action=purchase status=200 file="success.do" 
| stats count by JSESSIONID, action, status
| rename JSESSIONID as UserSessions

It is much more efficient to run stats or dedup first, so that the data is reduced as much as possible before you do field-level manipulations.
Do the statistical reduction as early as possible in your search.
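
If you want to keep dedup rather than switch to stats, the same ordering applies. This is only a sketch, assuming the same index, sourcetype, and field names as in the question: the reducing command runs first, the rename runs on the already reduced result set, and table comes last and has to reference the new field name.

```
index=main sourcetype="access_combined_wcookie" action=purchase status=200 file="success.do"
| dedup JSESSIONID
| rename JSESSIONID as UserSessions
| table UserSessions, action, status
```

Note that, unlike the stats version, this returns no count column. Whether the ordering makes a measurable difference depends on how many events match the base search; comparing both variants in the Job Inspector is the easiest way to check on your own data.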

rvsroe
Explorer

Hi Koshyk,
Thank you for the quick reply. Just a follow-up: does this mean that if I rename before stats or dedup, the search would take more time? And would that be because rename then runs over a larger dataset than if it were executed after stats/dedup?
