Why do joined searches return minimal results?

KDallman
Engager

Hello,

I am trying to join two searches for our AoVPN remote login system to trace the path from user and machine name, through the RAS server, to the IP address assigned once inside our environment. I have one search that contains nearly all the results I need, with the exception of a user ID and user name (from AD). The user ID and name would be added for convenience.

I have done some field extractions to pull the desired items out for a better dash:

index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null
| rename host as rasl_server
| table _time, mail, machine, Tunnel_Endpoint, rasl_server, srcip

From here, I join another sourcetype (same index) that contains the user ID and user name. Both sourcetypes contain the email address, so I am attempting to join on the "mail" field. This search seems to work fine, but returns only a small number of results.

index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null
| rename host as rasl_server
| table _time, mail, machine, Tunnel_Endpoint, rasl_server, srcip
| join mail
[| search index=windows sourcetype=ActiveDirectory mail=*
| rename sAMAccountName as User_ID
| table _time, User_ID, name, mail]

Both searches, when run individually, return 2000+ results. However, when I run the joined search, I see only twenty or so results. The results are reliable and seem accurate, there are just very few of them.

I'm teaching myself as I go, so I may be missing something simple. Thanks for any help.


PickleRick
Ultra Champion

Remember that subsearches have memory, result-set size, and execution-time limits. If a subsearch exceeds them, it is silently cut short, and the outer search carries on with whatever partial results the subsearch produced.
That's one of the reasons to avoid subsearches if you can.
Your case can be rewritten using stats.

(index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null)
OR (index=windows sourcetype=ActiveDirectory mail=*)
| rename host as rasl_server
| rename sAMAccountName as User_ID
| fields mail machine Tunnel_Endpoint rasl_server srcip User_ID name
| stats values(*) as * by mail
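
One difference from the join version is worth noting: stats keeps every mail value seen in either sourcetype, so accounts that exist only in AD show up as rows too. A minimal sketch of a variant that behaves more like the original inner join follows. It assumes that the machine field exists only in the rasl events, and that mail may differ in letter case between the two sourcetypes (grouping with "by" is case-sensitive). Neither assumption has been checked against the real data.

```spl
(index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null)
OR (index=windows sourcetype=ActiveDirectory mail=*)
| eval mail=lower(mail)
| rename host as rasl_server
| rename sAMAccountName as User_ID
| fields mail machine Tunnel_Endpoint rasl_server srcip User_ID name
| stats values(*) as * by mail
| where isnotnull(machine)
```

The final where clause drops rows that never had a rasl event, and lower() makes the grouping key consistent across both sources.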


KDallman
Engager

PR - thanks for the help on this. I'm guessing that the result size and execution time limits are adjusted internally with files such as transforms, limits.conf, etc? I have minimal experience "under the hood" but am working to learn this area. 

I appreciate the rewritten code using stats, however it doesn't seem to populate all fields (Tunnel_Endpoint, User_ID, and srcip not returning results). I'll look a bit deeper into it to be sure I didn't make an error with your code.

Thanks VERY much!


PickleRick
Ultra Champion

Yes, the limits are set in... surprise, surprise... limits.conf 🙂

But unless it's really necessary, it's better not to touch that file. OK, if you have a huge setup and a lot of processing power to spare, you might slightly increase the maximum number of parallel running searches, but that's another story entirely.

Often it's much better to try to rewrite your searches more efficiently. The same search can often be written very efficiently or very inefficiently.

Anyway, if you're not getting the fields, that's strange. Of course I don't know your data, but your initial search included

index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null
| rename host as rasl_server
| table _time, mail, machine, Tunnel_Endpoint, rasl_server, srcip

I literally do the very same thing in my example (OK, I use "fields" instead of "table"; there is a difference, but not where it matters in this case).
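
One way to narrow this down is to count how many events of each sourcetype actually carry each field before any renaming or grouping happens. This is only a diagnostic sketch built from the field names already used in this thread; the has_* column names are made up for illustration.

```spl
(index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null)
OR (index=windows sourcetype=ActiveDirectory mail=*)
| stats count as total count(mail) as has_mail count(machine) as has_machine count(Tunnel_Endpoint) as has_Tunnel_Endpoint count(srcip) as has_srcip count(sAMAccountName) as has_sAMAccountName count(name) as has_name by sourcetype
```

If a has_* column is 0 for the sourcetype that should supply that field, the extraction is probably not being applied in this search context (for example because of app or permission scope), which would be worth checking before looking at the stats logic itself.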


KDallman
Engager

You're correct, the individual searches return plenty of results so not sure what happened. I'll try incorporating your search a piece at a time to verify I get results and work towards the final search you provided.


VatsalJagani
Champion

A subsearch (the part written inside [ ] brackets) returns at most 10,000 results by default.

limits.conf - https://docs.splunk.com/Documentation/Splunk/8.2.4/Admin/Limitsconf 
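
For orientation, these are the stanzas involved, shown with what I understand to be the documented default values. Treat the numbers as an assumption and confirm them against the limits.conf spec for your own Splunk version. Note that the join command has its own subsearch limits, separate from the general [subsearch] stanza.

```ini
# limits.conf (defaults as I understand them; verify for your version)

[subsearch]
# maximum number of results a subsearch returns
maxout = 10000
# maximum number of seconds a subsearch runs before it is finalized
maxtime = 60

[join]
# maximum number of results the join subsearch returns
subsearch_maxout = 50000
# maximum number of seconds the join subsearch runs before it is finalized
subsearch_maxtime = 60
```

With roughly 2000 results on each side, the result-count limits are unlikely to be what is hit here; the time limit is the more plausible candidate if the subsearch is slow.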


KDallman
Engager

Great, appreciate the help. I'll read up on the effects of any changes to this parameter. I'm attempting to generate and use a .csv file for my AD information, which I could update occasionally, rather than the secondary search I posted.

Thanks again and I'll let you know what I end up using.
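
The CSV idea could be sketched roughly as follows. The lookup file name ad_users_example.csv is hypothetical, and the sketch assumes one User_ID and name per mail address. First, a search (run manually or on a schedule) writes the AD data to the lookup file:

```spl
index=windows sourcetype=ActiveDirectory mail=*
| rename sAMAccountName as User_ID
| stats latest(User_ID) as User_ID latest(name) as name by mail
| outputlookup ad_users_example.csv
```

The main search then enriches each RAS event from that file instead of using a subsearch:

```spl
index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null
| rename host as rasl_server
| lookup ad_users_example.csv mail OUTPUT User_ID name
| table _time, mail, machine, Tunnel_Endpoint, rasl_server, srcip, User_ID, name
```

A lookup is not subject to the subsearch limits discussed above. CSV lookups match case-sensitively unless configured otherwise, so the mail values need to agree in letter case on both sides.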
