Splunk Search

Dedup is extremely Slow

tdavison76
Path Finder

Hello,

I have a Search that is taking 5 min to complete when looking at only the last 24 hrs.  If possible, could someone help me figure out how I can improve this Search?  I am in need of deduping by SessionId and combing  3 fields into a single field.

source="mobilepro-test"
| dedup Session.SessionId
| strcat UserInfo.UserId " " Location.Site " " Session.StartTime label
| table Session.SessionId, label

It looks like it's the dedup that is causing the slowness, but I have no idea how to improve that.

Thanks for any help on this one,

Tom

Labels (2)
0 Karma

livehybrid
SplunkTrust
SplunkTrust

Hi @tdavison76 

I would recommend using stats for this instead, see below:

source="mobilepro-test"
| strcat UserInfo.UserId " " Location.Site " " Session.StartTime label
| stats latest(label) as label by Session.SessionId

You could switch the order of strcat to save on processing multiple strcat:

source="mobilepro-test"
| stats latest(UserInfo.UserId) as UserInfo_UserId, latest(Location.Site) as Location_Site, latest(Session.StartTime) AS Session_StartTime by Session.SessionId
| strcat UserInfo_UserId " " Location_Site " " Session_StartTime label
| table Session.SessionId, label

Note: We are using "latest" here which keeps the most recent event. 

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

0 Karma

PickleRick
SplunkTrust
SplunkTrust

As a side note, completely irrelevant to the original problem - I'm wondering whether there will be any noticeable performance difference between first(something) and latest(something) in case of a default base search returning results in reverse chronological order.

0 Karma

livehybrid
SplunkTrust
SplunkTrust

Thats a good point @PickleRick - for some reason I've always used latest, mainly incase there is any reason that events dont get returned with the most recent first (e.g. sorting of some sort, changes to _time, lookups, appends etc) but I suppose stats will stop looking after the first event if using first() but could read all events to check its still the "latest".
I might try this on a big dataset to see if it makes much difference!

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

first is "closer" to dedup since it keeps the first event in the event pipeline for each unique value of the dedup'd field(s)

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

You could try stats

source="mobilepro-test"
| stats first(UserInfo.UserId) as UserInfo.UserId first(Location.Site) as Location.Site first(Session.StartTime) as Session.StartTime by Session.SessionId
| strcat UserInfo.UserId " " Location.Site " " Session.StartTime label
| table Session.SessionId, label
0 Karma
Get Updates on the Splunk Community!

Celebrating Fast Lane: 2025 Authorized Learning Partner of the Year

At .conf25, Splunk proudly recognized Fast Lane as the 2025 Authorized Learning Partner of the Year. This ...

Tech Talk Recap | Mastering Threat Hunting

Mastering Threat HuntingDive into the world of threat hunting, exploring the key differences between ...

Observability for AI Applications: Troubleshooting Latency

If you’re working with proprietary company data, you’re probably going to have a locally hosted LLM or many ...