Hello,
I have a search that takes 5 minutes to complete, even when looking at only the last 24 hours. Could someone help me figure out how to improve it? I need to dedup by Session.SessionId and combine 3 fields into a single field.
source="mobilepro-test"
| dedup Session.SessionId
| strcat UserInfo.UserId " " Location.Site " " Session.StartTime label
| table Session.SessionId, label
It looks like it's the dedup that is causing the slowness, but I have no idea how to improve that.
Thanks for any help on this one,
Tom
Hi @tdavison76
I would recommend using stats instead of dedup for this; see below:
source="mobilepro-test"
| strcat UserInfo.UserId " " Location.Site " " Session.StartTime label
| stats latest(label) as label by Session.SessionId
You could also reverse the order and run stats before strcat, so that strcat runs once per session instead of once per event:
source="mobilepro-test"
| stats latest(UserInfo.UserId) as UserInfo_UserId, latest(Location.Site) as Location_Site, latest(Session.StartTime) AS Session_StartTime by Session.SessionId
| strcat UserInfo_UserId " " Location_Site " " Session_StartTime label
| table Session.SessionId, label
Note: we are using latest() here, which keeps the value from the most recent event (by _time) for each session.
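To make the latest() behaviour concrete, here is a small Python model of what `stats latest(label) as label by Session.SessionId` returns. It is only an illustration, not Splunk's implementation; the `stats_latest` helper and the sample events are hypothetical. The point it shows is that latest() picks the value with the greatest _time for each session, even when events arrive out of order.

```python
# Hypothetical model of SPL `stats latest(label) as label by Session.SessionId`.
# Events are plain dicts; this illustrates the semantics, not Splunk internals.

def stats_latest(events, field, by):
    """Return {key: value}, keeping the value from the event with the largest _time."""
    best = {}  # key -> (_time, value)
    for ev in events:
        if field not in ev:
            continue  # events without the field contribute nothing
        key = ev[by]
        # Every event must be compared, because a later _time may appear anywhere.
        if key not in best or ev["_time"] > best[key][0]:
            best[key] = (ev["_time"], ev[field])
    return {key: value for key, (_, value) in best.items()}

events = [
    {"_time": 100, "Session.SessionId": "s1", "label": "u1 siteA 09:00"},
    {"_time": 300, "Session.SessionId": "s1", "label": "u1 siteB 09:00"},  # out of order
    {"_time": 200, "Session.SessionId": "s1", "label": "u1 siteC 09:00"},
    {"_time": 150, "Session.SessionId": "s2", "label": "u2 siteA 10:00"},
]

print(stats_latest(events, "label", "Session.SessionId"))
# {'s1': 'u1 siteB 09:00', 's2': 'u2 siteA 10:00'}
```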
As a side note, not directly related to the original problem: I'm wondering whether there would be any noticeable performance difference between first(something) and latest(something) when the base search returns results in the default reverse chronological order.
That's a good point @PickleRick. For some reason I've always used latest(), mainly in case events don't get returned with the most recent first (e.g. due to sorting, changes to _time, lookups, appends, etc.). But I suppose that with first(), stats can stop looking after the first event for each session, whereas with latest() it may have to read all events to check that it still has the "latest" one.
I might try this on a big dataset to see if it makes much difference!
first() is "closer" to dedup: dedup keeps the first event in the event pipeline for each unique value of the dedup'd field(s), and first() likewise returns the first value it sees in the pipeline for each group.
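As a rough illustration of why first() lines up with dedup, here is a Python model comparing `dedup Session.SessionId` with `stats first(Location.Site) by Session.SessionId` on events in the default newest-first order. The `dedup` and `stats_first` helpers and the sample events are hypothetical stand-ins for the SPL commands, not Splunk's actual code, and the model assumes every event has a Session.SessionId.

```python
# Hypothetical model comparing SPL `dedup Session.SessionId` with
# `stats first(Location.Site) by Session.SessionId`. Not Splunk internals;
# every event here is assumed to carry a Session.SessionId value.

def dedup(events, by):
    """Keep the first event for each value of `by`, in pipeline order."""
    seen = set()
    kept = []
    for ev in events:
        if ev[by] not in seen:
            seen.add(ev[by])
            kept.append(ev)
    return kept

def stats_first(events, field, by):
    """For each value of `by`, keep the first value of `field` seen in pipeline order."""
    firsts = {}
    for ev in events:
        # Once a group has a value it is never replaced, so no _time comparison is needed.
        if field in ev and ev[by] not in firsts:
            firsts[ev[by]] = ev[field]
    return firsts

# Events in the default reverse chronological order (newest first).
events = [
    {"_time": 300, "Session.SessionId": "s1", "Location.Site": "B"},
    {"_time": 250, "Session.SessionId": "s2", "Location.Site": "A"},
    {"_time": 200, "Session.SessionId": "s1", "Location.Site": "C"},
    {"_time": 100, "Session.SessionId": "s1", "Location.Site": "A"},
]

from_dedup = {ev["Session.SessionId"]: ev["Location.Site"]
              for ev in dedup(events, "Session.SessionId")}
from_first = stats_first(events, "Location.Site", "Session.SessionId")
print(from_dedup == from_first)  # True for this input
```

In this model the two only diverge when the first event for a session lacks the field: dedup keeps that event anyway, while stats_first moves on to the next event that has a value.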
You could try stats with first():
source="mobilepro-test"
| stats first(UserInfo.UserId) as UserInfo.UserId first(Location.Site) as Location.Site first(Session.StartTime) as Session.StartTime by Session.SessionId
| strcat UserInfo.UserId " " Location.Site " " Session.StartTime label
| table Session.SessionId, label