Dedup is extremely Slow

tdavison76
Path Finder

Hello,

I have a search that takes 5 minutes to complete, even when looking at only the last 24 hours. If possible, could someone help me figure out how I can improve it? I need to dedup by SessionId and combine three fields into a single field.

source="mobilepro-test"
| dedup Session.SessionId
| strcat UserInfo.UserId " " Location.Site " " Session.StartTime label
| table Session.SessionId, label

It looks like it's the dedup that is causing the slowness, but I have no idea how to improve that.

Thanks for any help on this one,

Tom

livehybrid
SplunkTrust

Hi @tdavison76 

I would recommend using stats for this instead of dedup; see below:

source="mobilepro-test"
| strcat UserInfo.UserId " " Location.Site " " Session.StartTime label
| stats latest(label) as label by Session.SessionId

You could also run strcat after stats, so that it is evaluated once per session instead of once per event:

source="mobilepro-test"
| stats latest(UserInfo.UserId) as UserInfo_UserId, latest(Location.Site) as Location_Site, latest(Session.StartTime) AS Session_StartTime by Session.SessionId
| strcat UserInfo_UserId " " Location_Site " " Session_StartTime label
| table Session.SessionId, label

Note: We are using "latest" here, which keeps the value from the most recent event (by _time) for each Session.SessionId.

PickleRick
SplunkTrust

As a side note, completely irrelevant to the original problem - I'm wondering whether there would be any noticeable performance difference between first(something) and latest(something) when a default base search returns results in reverse chronological order.

livehybrid
SplunkTrust

That's a good point @PickleRick - for some reason I've always used latest, mainly in case there is any reason that events don't get returned with the most recent first (e.g. sorting of some sort, changes to _time, lookups, appends etc.). I suppose stats could stop looking after the first event when using first(), whereas with latest() it might have to read all events to check it still has the "latest".
I might try this on a big dataset to see if it makes much difference!
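
The ordering concern can be checked on synthetic data before trying it at scale. This is only a rough sketch: the four generated events and the field names n, SessionId, UserId, first_user and latest_user are made up for illustration and are not from the original data. It builds two sessions, re-sorts the events oldest first, and then compares the two functions.

```spl
| makeresults count=4
| streamstats count as n
| eval _time = _time - (n * 60), SessionId = if(n <= 2, "A", "B"), UserId = "user" . n
| sort 0 _time
| stats first(UserId) as first_user latest(UserId) as latest_user by SessionId
```

After the explicit ascending sort, first() should return the oldest UserId in each session while latest() should still return the newest, so the two columns should differ. Without the sort line, the events stay newest first and the two columns should match.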

 

ITWhisperer
SplunkTrust

first is "closer" to dedup, since it keeps the first event in the event pipeline for each unique value of the dedup'd field(s).
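
A small sketch of that equivalence on generated events, assuming made-up field names (n, SessionId, UserId, first_user) that are not part of the original data: eventstats attaches the first() value per session to every event, and dedup then keeps the first event per session, so the two can be compared side by side in one result set.

```spl
| makeresults count=4
| streamstats count as n
| eval SessionId = if(n <= 2, "A", "B"), UserId = "user" . n
| eventstats first(UserId) as first_user by SessionId
| dedup SessionId
| table SessionId UserId first_user
```

If first() and dedup pick the same event, UserId and first_user should be identical on every row.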

ITWhisperer
SplunkTrust

You could try stats instead of dedup:

source="mobilepro-test"
| stats first(UserInfo.UserId) as UserInfo.UserId first(Location.Site) as Location.Site first(Session.StartTime) as Session.StartTime by Session.SessionId
| strcat UserInfo.UserId " " Location.Site " " Session.StartTime label
| table Session.SessionId, label