Splunk Search

Best approach for a search with a lookup

a212830
Champion

Hi,

I have been asked to create a search (and then a report) that shows VPN logins for the last XX minutes (probably going back 1-4 hours). I have that search - it returns 4 fields, and it could return a few thousand rows (maybe 3-5 thousand). I then have to merge the login ID with an existing lookup file that has each individual's location, bu.... and then present that info as trends.

My question is: which is the better approach - reading the lookup file first and then using Splunk to extract the relevant login IDs and additional information, or searching Splunk first to get the login IDs and then searching the lookup file? (Are both of these even possible?) The lookup file has about 75k rows and 150 fields, of which I only want 10.

Hope this makes sense.

TIA...

1 Solution

Jeremiah
Motivator

The most straightforward way to accomplish this search is to find your matching events and then perform your lookup. For example, let's say your sourcetype is vpn and your username is in the user field:

sourcetype=vpn | lookup your_lookup user OUTPUT location, bu | stats count by location, bu

Since the lookup has 150 fields, list only the fields you will actually work with in your OUTPUT clause, rather than pulling the entire set. This search works well if you want to report on all, or nearly all, of your VPN events. A slight optimization is to do the lookup AFTER some consolidation:

sourcetype=vpn | stats count by user | lookup your_lookup user OUTPUT location, bu | stats sum(count) by location, bu

Now, if you have, say, thousands of users with activity but only want to report on a small subset, you could start the search with a subsearch to narrow down your matching events. This search would only find users whose location is "USA":

sourcetype=vpn [ | inputlookup your_lookup | search location="USA" | fields user] | ....
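For context, Splunk expands a subsearch like this into a set of OR'd field/value pairs before running the outer search, so the search above effectively becomes something like (the usernames here are hypothetical):

sourcetype=vpn ( user="jsmith" OR user="mdoe" OR ... ) | ....

That expansion is why the size of the subsearch's result set matters: every matching user from the lookup becomes another term in the outer search.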

However, given that your result set is only maybe 3-5k rows, the subsearch probably won't gain you much. And you run the risk of hitting a limit if the subsearch returns too many field values (i.e., if more than 50k users are in the USA). If I had a search that returned millions of rows, and I knew I really only wanted a small subset of those based on a value from my lookup data, I might use a subsearch to narrow the list down. Otherwise, I'd perform the first search. It's much simpler.
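As a side note, if you do go the subsearch route and bump into that ceiling, the subsearch result limit is configurable in limits.conf. A minimal sketch of the relevant stanza (the values shown are examples, not recommendations):

[subsearch]
# maximum number of results a subsearch may return to the outer search
maxout = 50000
# maximum runtime of a subsearch, in seconds
maxtime = 120

Raising maxout has performance implications, since every returned value gets OR'd into the outer search, so it's usually better to keep the subsearch's result set small in the first place.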

a212830
Champion

Thanks much! Appreciate it.
