Splunk Search

How do I improve the performance of my three searches?

gwalford
Path Finder

I have a pretty inefficient search that I would like to improve - it has a lot of flaws. For example, I run the three searches below in order, one after the other. What I should be doing is pulling out a time value in the first search and running them all concurrently, but the searches below are the closest I have been able to get working.

The intent behind this search is to find people running BitTorrent at the proxy, compare the NATted IP with the DHCP logs to find the MAC address, and then check the authentication server logs to see who had that IP during that time range.

Step 1)

index=proxy  application=Bittorrent | top limit=20 srcip | table srcip | outputlookup bittorrent_list.csv

Step 2)

index=dhcp [|inputlookup bittorrent_list.csv | rename srcip as src_ip] | top limit=20 src_mac | table src_mac | outputlookup bittorrent_mac_from_dhcp_list.csv

Step 3)

index=authentication  [|inputlookup bittorrent_mac_from_dhcp_list.csv ]| top limit=20 User_Name

jacobwilkins
Communicator

First, check out this info on creating incrementing time-based lookups: https://conf.splunk.com/session/2015/conf2015-LookupTalk.pdf

Apply that logic to your DHCP data to have a reliable way of mapping IP to MAC addresses, which is a super-handy lookup to have around for general use. Have the scheduled search that builds this run frequently to minimize the amount of data processed by each search.
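
A minimal sketch of that pattern, assuming your DHCP events carry the src_ip and src_mac fields from Step 2 and using a hypothetical lookup file name (ip_to_mac.csv); it appends fresh leases to the existing lookup and keeps only the most recent MAC seen per IP, so it only works once the lookup file has been created the first time:

index=dhcp earliest=-10m@m latest=-5m@m
| stats latest(_time) as last_seen by src_ip src_mac
| inputlookup append=t ip_to_mac.csv
| sort 0 -last_seen
| dedup src_ip
| outputlookup ip_to_mac.csv

Schedule it to run every 5 minutes so each run only touches a small slice of the DHCP data.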

Do the same thing with your MAC<-->user mapping. This would also be an extremely useful lookup to have around for general use.
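
The same sketch applied to the authentication data, again with field names taken from your Step 3 search (src_mac, User_Name) and a hypothetical lookup file name (mac_to_user.csv):

index=authentication earliest=-10m@m latest=-5m@m
| stats latest(_time) as last_seen by src_mac User_Name
| inputlookup append=t mac_to_user.csv
| sort 0 -last_seen
| dedup src_mac
| outputlookup mac_to_user.csv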

Finally, schedule a version of your "Step 1" search, modified to use both lookups, and report by user directly from the proxy dataset.
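
With both lookups in place, a sketch of that modified "Step 1" search could look like this (the lookup file names above are assumptions; srcip and User_Name come from your original searches):

index=proxy application=Bittorrent earliest=-10m@m latest=-5m@m
| lookup ip_to_mac.csv src_ip AS srcip OUTPUT src_mac
| lookup mac_to_user.csv src_mac OUTPUT User_Name
| top limit=20 User_Name

This replaces the two subsearch hops with two lookup enrichments, so the whole thing runs as a single pass over the proxy data.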

Generally speaking, Splunk would rather fight 100 duck-sized horses than 1 horse-sized duck. Running searches like this incrementally over 5 or 10 minute windows and summarizing the results is usually better than trying to run everything at once.

One pattern to consider for "population" searches like this is to allow for a little indexing lag: use earliest=-10m@m latest=-5m@m and schedule the search to run every 5 minutes.

If you have a really large environment, handling the last step in an accelerated data model is probably a good idea.
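
As a rough sketch only, assuming your proxy data is mapped to the CIM Web data model and that model is accelerated (your data model and field names may well differ), the reporting step could then be driven by tstats instead of a raw event search:

| tstats count from datamodel=Web where Web.app=Bittorrent earliest=-10m@m latest=-5m@m by Web.src
| rename Web.src as srcip
| lookup ip_to_mac.csv src_ip AS srcip OUTPUT src_mac
| lookup mac_to_user.csv src_mac OUTPUT User_Name
| stats sum(count) as count by User_Name
| sort 20 -count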

somesoni2
Revered Legend

Give this a try

index=authentication
    [ search index=dhcp
        [ search index=proxy application=Bittorrent
          | top limit=20 showperc=f srcip
          | table srcip
          | rename srcip as src_ip ]
      | top limit=20 showperc=f src_mac
      | table src_mac ]
| top limit=20 User_Name

gwalford
Path Finder

Actually, I thought that this wasn't bad.


jacobwilkins
Communicator

I downvoted this post because a nested subsearch is never the right answer.
