Splunk Search

Dedup is extremely Slow

tdavison76
Path Finder

Hello,

I have a Search that is taking 5 min to complete when looking at only the last 24 hrs.  If possible, could someone help me figure out how I can improve this Search?  I am in need of deduping by SessionId and combing  3 fields into a single field.

source="mobilepro-test"
| dedup Session.SessionId
| strcat UserInfo.UserId " " Location.Site " " Session.StartTime label
| table Session.SessionId, label

It looks like it's the dedup that is causing the slowness, but I have no idea how to improve that.

Thanks for any help on this one,

Tom

Labels (2)
0 Karma

livehybrid
SplunkTrust
SplunkTrust

Hi @tdavison76 

I would recommend using stats for this instead, see below:

source="mobilepro-test"
| strcat UserInfo.UserId " " Location.Site " " Session.StartTime label
| stats latest(label) as label by Session.SessionId

You could switch the order of strcat to save on processing multiple strcat:

source="mobilepro-test"
| stats latest(UserInfo.UserId) as UserInfo_UserId, latest(Location.Site) as Location_Site, latest(Session.StartTime) AS Session_StartTime by Session.SessionId
| strcat UserInfo_UserId " " Location_Site " " Session_StartTime label
| table Session.SessionId, label

Note: We are using "latest" here which keeps the most recent event. 

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

0 Karma

PickleRick
SplunkTrust
SplunkTrust

As a side note, completely irrelevant to the original problem - I'm wondering whether there will be any noticeable performance difference between first(something) and latest(something) in case of a default base search returning results in reverse chronological order.

0 Karma

livehybrid
SplunkTrust
SplunkTrust

Thats a good point @PickleRick - for some reason I've always used latest, mainly incase there is any reason that events dont get returned with the most recent first (e.g. sorting of some sort, changes to _time, lookups, appends etc) but I suppose stats will stop looking after the first event if using first() but could read all events to check its still the "latest".
I might try this on a big dataset to see if it makes much difference!

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

first is "closer" to dedup since it keeps the first event in the event pipeline for each unique value of the dedup'd field(s)

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

You could try stats

source="mobilepro-test"
| stats first(UserInfo.UserId) as UserInfo.UserId first(Location.Site) as Location.Site first(Session.StartTime) as Session.StartTime by Session.SessionId
| strcat UserInfo.UserId " " Location.Site " " Session.StartTime label
| table Session.SessionId, label
0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

Can’t Make It to Boston? Stream .conf25 and Learn with Haya Husain

Boston may be buzzing this September with Splunk University and .conf25, but you don’t have to pack a bag to ...

Splunk Lantern’s Guide to The Most Popular .conf25 Sessions

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Unlock What’s Next: The Splunk Cloud Platform at .conf25

In just a few days, Boston will be buzzing as the Splunk team and thousands of community members come together ...