I have a pretty inefficient search that I would like to improve. For example, I run the three searches below in order, one after the other. What I should be doing is pulling a time value out of the first search and running them all concurrently, but the below is the closest I have been able to get working.
The intent behind this search is to find people running BitTorrent at the proxy, compare the NATted IP with the DHCP logs to find the MAC address, then check the authentication server's logs to see who had that MAC during that time range.
Step 1)
index=proxy application=Bittorrent | top limit=20 srcip | table srcip | outputlookup bittorrent_list.csv
Step 2)
index=dhcp [|inputlookup bittorrent_list.csv | rename srcip as src_ip] | top limit=20 src_mac | table src_mac | outputlookup bittorrent_mac_from_dhcp_list.csv
Step 3)
index=authentication [|inputlookup bittorrent_mac_from_dhcp_list.csv ]| top limit=20 User_Name
First, check out this info on creating incrementing time-based lookups: https://conf.splunk.com/session/2015/conf2015-LookupTalk.pdf
Apply that logic to your DHCP data to have a reliable way of mapping IP to MAC addresses, which is a super-handy lookup to have around for general use. Have the scheduled search that builds this run frequently to minimize the amount of data processed by each search.
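A minimal sketch of that pattern, applied to DHCP: a scheduled search pulls recent leases, appends the existing lookup, and keeps the most recent MAC seen per IP. The field names (src_ip, src_mac) are taken from your searches above, and the lookup filename dhcp_ip_to_mac.csv is just an assumption; adjust both to your environment.

index=dhcp earliest=-20m@m latest=-5m@m | stats max(_time) as last_seen by src_ip src_mac | inputlookup append=t dhcp_ip_to_mac.csv | dedup src_ip sortby -last_seen | outputlookup dhcp_ip_to_mac.csv

Schedule that every 15 minutes (matching the window) and the lookup stays current while each run only touches a small slice of data.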
Do the same thing with your MAC<-->user mapping. This would also be an extremely useful lookup to have around for general use.
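The MAC-to-user version is the same shape, sketched here against your authentication index. Again, mac_to_user.csv and the exact field names are assumptions:

index=authentication earliest=-20m@m latest=-5m@m | stats max(_time) as last_seen by src_mac User_Name | inputlookup append=t mac_to_user.csv | dedup src_mac sortby -last_seen | outputlookup mac_to_user.csv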
Finally, schedule a version of your "Step 1" search, modified to use both lookups, and report by user directly from the proxy dataset.
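Roughly, assuming the two lookup files sketched above exist (dhcp_ip_to_mac.csv and mac_to_user.csv are hypothetical names), the single scheduled search could look like:

index=proxy application=Bittorrent earliest=-10m@m latest=-5m@m | lookup dhcp_ip_to_mac.csv src_ip AS srcip OUTPUT src_mac | lookup mac_to_user.csv src_mac OUTPUT User_Name | top limit=20 showperc=f User_Name

No subsearches at all: each event is enriched IP -> MAC -> user via the prebuilt lookups, so the whole thing runs in one pass over the proxy data.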
Generally speaking, Splunk would rather fight 100 duck-sized horses than 1 horse-sized duck. Running searches like this incrementally over 5- or 10-minute windows and summarizing the results is usually better than trying to run everything all at once.
One pattern to consider for "population" searches like this is to allow for a little indexing lag — earliest=-10m@m latest=-5m@m, scheduled to run every 5 minutes.
If you have a really large environment, handling the last step in an accelerated data model is probably a good idea.
Give this a try:
index=authentication [search index=dhcp [search index=proxy application=Bittorrent | top limit=20 showperc=f srcip | table srcip | rename srcip as src_ip] | top limit=20 showperc=f src_mac | table src_mac] | top limit=20 User_Name
Actually, I thought that this wasn't bad.
I downvoted this post because a nested subsearch is never the right answer.