
Best approach for a search with a lookup

a212830
Champion

Hi,

I have been asked to create a search (and then a report) that shows VPN logins for the last XX minutes (probably going back 1-4 hours). I have that search; it returns 4 fields and could return a few thousand rows (maybe 3-5 thousand). I then have to merge the login ID with an existing lookup file containing the individual's location, bu, etc., and then present that info as trends.

My question is: which approach is better? Reading the lookup file first and then using Splunk to extract the relevant login IDs and additional information, or searching Splunk first, getting the login IDs, and then searching the lookup file? (Are both of these even possible?) The lookup file is about 75k rows and 150 fields (of which I only want 10).

Hope this makes sense.

TIA...

1 Solution

Jeremiah
Motivator

The most straightforward way to accomplish this is to find your matching events and then perform your lookup. For example, let's say your sourcetype is vpn and your username is in the user field:

sourcetype=vpn | lookup your_lookup user OUTPUT location, bu, etc. | stats count by location, bu, etc.

Since the lookup has 150 fields, you should list just the fields you will actually work with in your OUTPUT clause, rather than pulling in the entire set. This search works well if you want to report on all, or nearly all, of your VPN events. A slight optimization is to do the lookup AFTER some consolidation:

sourcetype=vpn | stats count by user | lookup your_lookup user OUTPUT location, bu, etc. | stats sum(count) by location, bu, etc.
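As a concrete sketch of both points, with the OUTPUT fields spelled out explicitly (location, bu, department, and country here are placeholder field names; substitute the ten you actually need from your lookup):

sourcetype=vpn | stats count by user | lookup your_lookup user OUTPUT location, bu, department, country | stats sum(count) as logins by location, bu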

Now, if you have, say, thousands of users with activity but only want to report on a small subset, you can start the search with a subsearch to narrow down your matching events. This search would only find events for users whose location is "USA":

sourcetype=vpn [ | inputlookup your_lookup | search location="USA" | fields user] | ....

However, given that your result set is only maybe 3-5k rows, the subsearch probably won't gain you much. And you run the risk of hitting a limit if the subsearch returns too many field values (i.e., if more than 50k users are in the USA). If I have a search that returns millions of rows and I know I only want a small subset of them based on a value from my lookup data, I might use a subsearch to narrow the list down. Otherwise, I'd use the first search. It's much simpler.
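If you ever do hit that subsearch limit, one workaround (a sketch, not part of the accepted answer) is to skip the subsearch entirely and filter on the looked-up field after consolidation, since the result set at that point is already small:

sourcetype=vpn | stats count by user | lookup your_lookup user OUTPUT location, bu | search location="USA" | stats sum(count) as logins by location, bu

Filtering after the lookup trades a little extra lookup work for immunity from subsearch result limits.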

a212830
Champion

Thanks much! Appreciate it.
