Hello,
I am trying to join two searches for our AoVPN remote login system to show the path from user to machine name to RAS server to the IP address assigned once inside our environment. I have one search that returns nearly everything I need, except the user ID and user name (from AD). Adding the user ID and name would be for convenience.
I have done some field extractions to pull out the desired items for a better dashboard:
index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null
| rename host as rasl_server
| table _time, mail, machine, Tunnel_Endpoint, rasl_server, srcip
From here, I join another sourcetype (same index) that contains the user ID and user name. Both sourcetypes have the email addresses, so I am attempting to join with "mail" as the focus. This search seems to work fine, but only returns a minimal number of results.
index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null
| rename host as rasl_server
| table _time, mail, machine, Tunnel_Endpoint, rasl_server, srcip
| join mail
[| search index=windows sourcetype=ActiveDirectory mail=*
| rename sAMAccountName as User_ID
| table _time, User_ID, name, mail]
Both searches, when run individually, return 2000+ results. However, when I run the joined search, I only see twenty or so results. Those results are reliable and seem accurate; there are just very few of them.
I'm teaching myself as I go, so I may be missing something simple. Thanks for any help.
Remember that subsearches are subject to memory, result set size, and execution time limits. If a subsearch exceeds any of them, it gets silently terminated.
That's one of the reasons to avoid subsearches if you can.
Your case can be rewritten using stats.
(index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null)
OR (index=windows sourcetype=ActiveDirectory mail=*)
| rename host as rasl_server
| rename sAMAccountName as User_ID
| fields mail machine Tunnel_Endpoint rasl_server srcip User_ID name
| stats values(*) as * by mail
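One caveat with the stats version: unlike the default (inner) join, it also returns a row for every AD account that has a mail value, even if that user never connected through the VPN. Below is a minimal sketch of one way to keep only users with a VPN event. It assumes that machine is populated only by the rasl events, which I can't verify against your data.

```spl
(index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null)
OR (index=windows sourcetype=ActiveDirectory mail=*)
| rename host as rasl_server
| rename sAMAccountName as User_ID
| fields mail machine Tunnel_Endpoint rasl_server srcip User_ID name
| stats values(*) as * by mail
| where isnotnull(machine)
```

The final where clause drops rows that came only from ActiveDirectory, which roughly mimics what the inner join was doing.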
PR - thanks for the help on this. I'm guessing that the result size and execution time limits are adjusted internally with files such as transforms, limits.conf, etc? I have minimal experience "under the hood" but am working to learn this area.
I appreciate the rewritten code using stats, however it doesn't seem to populate all fields (Tunnel_Endpoint, User_ID, and srcip not returning results). I'll look a bit deeper into it to be sure I didn't make an error with your code.
Thanks VERY much!
Yes, the limits are set in... surprise, surprise... limits.conf 🙂
But unless it's really necessary, it's better not to touch that file. OK, if you have a huge setup and plenty of processing power to spare, you might slightly increase the maximum number of parallel running searches, but that's another story entirely.
Often it's much better to rewrite your searches more efficiently. Many searches can be written either very efficiently or very inefficiently.
Anyway, if you're not getting the fields, that's strange. Of course I don't know your data, but your initial search included
index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null
| rename host as rasl_server
| table _time, mail, machine, Tunnel_Endpoint, rasl_server, srcip
I literally do the very same thing in my example (OK, I don't use "table" but use "fields" instead; there is a difference, but not where it matters in this case).
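One thing that may be worth ruling out (a guess, since I can't see the data) is that the mail values differ in case or whitespace between the two sourcetypes. Both join and stats match values exactly, so "User@example.com" and "user@example.com" would end up in separate rows, each with only half of the fields. A rough diagnostic sketch, with sourcetype kept in the output so you can see which source contributed to each row:

```spl
(index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null)
OR (index=windows sourcetype=ActiveDirectory mail=*)
| eval mail=lower(trim(mail))
| rename host as rasl_server
| rename sAMAccountName as User_ID
| fields mail machine Tunnel_Endpoint rasl_server srcip User_ID name sourcetype
| stats values(*) as * by mail
```

If a row lists only one sourcetype, the other source had no event with a matching mail value in the selected time range.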
You're correct, the individual searches return plenty of results so not sure what happened. I'll try incorporating your search a piece at a time to verify I get results and work towards the final search you provided.
A subsearch (one written within [ ] brackets) returns a maximum of 10,000 results by default.
limits.conf - https://docs.splunk.com/Documentation/Splunk/8.2.4/Admin/Limitsconf
Great, appreciate the help. I'll read up on the effects of any changes to this parameter. I'm attempting to generate and use a .csv file for my AD information, which I could update occasionally, rather than the secondary search I posted.
Thanks again and I'll let you know what I end up using.
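For anyone following the CSV route mentioned above, here is a rough sketch of how it could look. The file name ad_users.csv is a hypothetical placeholder, and the search assumes one row per mail address is enough. First, a search (run manually or on a schedule) that writes the AD details to a lookup file:

```spl
index=windows sourcetype=ActiveDirectory mail=*
| rename sAMAccountName as User_ID
| stats latest(User_ID) as User_ID latest(name) as name by mail
| outputlookup ad_users.csv
```

The main search can then enrich each VPN event from that file without a subsearch:

```spl
index=windows sourcetype=rasl mail=* srcip!=WPDOTRASL0* machine!=null
| rename host as rasl_server
| lookup ad_users.csv mail OUTPUT User_ID name
| table _time, mail, machine, Tunnel_Endpoint, rasl_server, srcip, User_ID, name
```

Because lookup works per event, this keeps every rasl row and avoids the subsearch limits discussed earlier; rows with no matching mail in the file simply have empty User_ID and name.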