![rvsroe rvsroe](https://community.splunk.com/legacyfs/online/avatars/562147.jpg)
In the Fundamentals 1 course, lab 8 tells us:
"As a best practice and for best performance, place dedup as early in the search as possible." (page 4)
But the quick reference guide tells us:
"Postpone commands that process over the entire result set (non-streaming commands) as late as possible in your search. Some of these commands are: dedup, sort, and stats" (page 2)
The example search they give in lab 8 places dedup before the distributable streaming command 'rename':
index=main sourcetype="access_combined_wcookie" action=purchase status=200 file="success.do"
| dedup JSESSIONID
| table JSESSIONID, action, status
| rename JSESSIONID as UserSessions
Would it not make sense to place dedup after rename? I guess 'as early as possible' is ambiguous anyway, but any input on where to place dedup would be greatly appreciated.
Cheers,
Roelof
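For comparison, the alternative ordering asked about might look like the sketch below. It assumes the same index, sourcetype, and field names as the lab search; the only changes are that rename runs first and dedup therefore has to reference the renamed field UserSessions. Whether this ordering is measurably slower is not established here; running both versions and comparing them in the Job Inspector would be one way to check.

```
index=main sourcetype="access_combined_wcookie" action=purchase status=200 file="success.do"
| rename JSESSIONID as UserSessions
| dedup UserSessions
| table UserSessions, action, status
```

Both orderings should return the same rows, since rename only changes the field name and not its values.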
![koshyk koshyk](https://community.splunk.com/legacyfs/online/avatars/171489.jpg)
The best way to tackle the above query is:
index=main sourcetype="access_combined_wcookie" action=purchase status=200 file="success.do"
| stats count by JSESSIONID, action, status
| rename JSESSIONID as UserSessions
Using stats or dedup first is much more efficient: reduce the data as much as possible before you do field-level manipulations.
In other words, do a statistical reduction as early as possible in your search.
![rvsroe rvsroe](https://community.splunk.com/legacyfs/online/avatars/562147.jpg)
Hi Koshyk,
Thank you for the quick reply. Just a follow-up: does this mean that if I rename before stats or dedup, the search would take more time? And would that be because rename is then applied to a larger dataset than if it were executed after stats/dedup?