Splunk Search

What efficient subsearch options do I have while avoiding the 10k result limit?

hokieb
New Member

**Problem #1**

I am struggling to work around the 10k limit on subsearches in Splunk. I have two data sources and need to "join" them on IP address so I can filter my large login-activity event feed down to a list of IP addresses I care about. The problem is that the list of IP addresses may often be greater than 10k (maybe 20k-100k). The login source holds 20-30 million events (or more) in the time period I need to search and aggregate counts for. The unique IPs in the main login activity may exceed 1 million values, and I need to filter that down to the ones in the IP source list.

I have tried 4 different options (really there were many, many more failed attempts!!) for filtering my activity list to the IPs I care about and aggregating the counts I need. Pulling ALL of the login events in and then filtering is very slow and inefficient. The subsearch filter (before the first | ) is very efficient, but limited to 10k results.

My normal time range for searches is 1 day (either the current partial day or the previous full day), but I cannot limit myself to only those ranges. Some searches might need the last 4 hours, or a day from 3 days ago, for example.

Are there any ideas on other ways to tackle this problem (or fix any of the options below) and get an efficient execution without IP limits? Also, for performance, any suggestions on summarizing to improve speed (summarizing only the large login source by IP)? It needs to be accurate and available fairly quickly. I plan to make this a base search on a dashboard, with many graphs and extra stats performed on the data to provide various insights.

Current search (hits 10k limit):

index=login sourcetype=loginsource_1 activity_name=LOGIN [search index=ip sourcetype=ipsource_1]
| stats   count as TOTAL_COUNT, count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED")) as FAILED_COUNT  by ipv4

Option A (very slow because ALL login activity events must be pulled in, and it also hits a 50k limit on the join subsearch):

index=login sourcetype=activitysource_1 activity_name=LOGIN | join ipv4 type=inner [search index=ip sourcetype=ipsource_1 ]
| stats   count as TOTAL_COUNT, count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED")) as FAILED_COUNT  by ipv4

Option B (no constraint on the size of the IP list, but very slow because ALL login activity events must be pulled in before filtering):

(index=login sourcetype=loginsource_1 activity_name=LOGIN) OR (index=ip   sourcetype=ipsource_1 )
| eval ipv4=coalesce(ipv4, pattern) 
| stats   dc(index) as index_count,  count as TOTAL_COUNT, count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED")) as FAILED_COUNT  by ipv4
| search index_count=2

Option C (no constraint on the size of the IP list, but very slow because ALL login activity events must be pulled in before filtering):

| multisearch [search index=login sourcetype=activitysource_1 activity_name=LOGIN | eval loginapp_event_time=_time] [search index=ip sourcetype=ipsource_1] 
| stats   dc(index) as index_count,  count as TOTAL_COUNT, count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED")) as FAILED_COUNT  by ipv4
| search index_count=2

**Problem #2**

To add a bit more complexity to the above, the IP sourcetype needs to take the original earliest time and shift it 6 hours earlier in order to pick up all the necessary data (IPs on the list expire; we log the create time and the expiry time). I know how to extend the range. That is not the problem.

I use this in my search:

[| gentimes start=-1 | addinfo | eval earliest=info_min_time-21600 | eval latest=info_max_time | table earliest,latest | format "" "" "" "" "" ""] | addinfo | eval orig_earliest=info_min_time+21600 | convert mstime(orig_earliest) as PICKER_START | eval LIST_EXPIRY = strptime(expiretime, "%FT%T.%3Q%z") | eval DIFF_BL_PICKER=(LIST_EXPIRY - PICKER_START) | where DIFF_BL_PICKER>=0

But when I use it in a multisearch and want to apply it to ONLY the one "subsearch", it reassigns the new range to BOTH the IP search AND the login activity search, instead of just the one leg of the multisearch.

| multisearch [search index=login sourcetype=activitysource_1 activity_name=LOGIN | eval loginapp_event_time=_time] [search index=ip sourcetype=ipsource_1 [| gentimes start=-1 | addinfo | eval earliest=info_min_time-21600 | eval latest=info_max_time | table earliest,latest | format "" "" "" "" "" ""] | addinfo | eval orig_earliest=info_min_time+21600 | convert mstime(orig_earliest) as PICKER_START | eval LIST_EXPIRY = strptime(expiretime, "%FT%T.%3Q%z") | eval DIFF_BL_PICKER=(LIST_EXPIRY - PICKER_START) | where DIFF_BL_PICKER>=0]
| stats   dc(index) as index_count,  count as TOTAL_COUNT, count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED")) as FAILED_COUNT  by ipv4
| search index_count=2

Any ideas on why multisearch is extending the range on the first query too?

Any help you have is appreciated.


somesoni2
SplunkTrust

Give this a try. Add more OR'd subsearch blocks to cover the maximum number of results your IP subsearch can return.

index=login sourcetype=loginsource_1 activity_name=LOGIN ([search index=ip sourcetype=ipsource_1 | stats count by ipv4 | eval sno=1 | accum sno | where sno<=10000 | table ipv4]) OR ([search index=ip sourcetype=ipsource_1 | stats count by ipv4 | eval sno=1 | accum sno | where sno>10000 AND sno<=20000 | table ipv4])
 | stats   count as TOTAL_COUNT, count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED")) as FAILED_COUNT  by ipv4
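
For example, if the IP list grows past 20k, a third OR block of the same shape (just a sketch, reusing the same IP subsearch with the next serial-number window) would be appended to the base search:

OR ([search index=ip sourcetype=ipsource_1 | stats count by ipv4 | eval sno=1 | accum sno | where sno>20000 AND sno<=30000 | table ipv4])

Each block returns at most 10k ipv4 values, so each one stays under the subsearch limit, but every block re-runs the IP search and the number of blocks has to be maintained by hand.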

hokieb
New Member

I somewhat like this model, though I am nervous about maintaining it: my IP volume may grow above the pre-allocated ranges, and I would then need to pay attention to detect that it is no longer giving me full results.
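
One way to watch for that (just a sketch; the 18000 threshold is a placeholder sized under the two allocated blocks) would be a small scheduled alert that counts the distinct IPs in the list and fires when it approaches the capacity the OR blocks cover:

index=ip sourcetype=ipsource_1 | stats dc(ipv4) as ip_count | where ip_count > 18000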


sundareshr
Legend

Have you considered using a summary index?
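
For example (just a sketch, not from this thread: the summary index name login_summary and the hourly schedule are placeholders), a scheduled search could roll the login source up by ipv4 once an hour and write the result to a summary index with collect:

index=login sourcetype=loginsource_1 activity_name=LOGIN earliest=-1h@h latest=@h
| stats count as TOTAL_COUNT, count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED")) as FAILED_COUNT by ipv4
| collect index=login_summary

The dashboard base search would then re-aggregate the much smaller summary rows instead of 20-30 million raw events:

index=login_summary
| stats sum(TOTAL_COUNT) as TOTAL_COUNT, sum(SUCCESS_COUNT) as SUCCESS_COUNT, sum(FAILED_COUNT) as FAILED_COUNT by ipv4

The IP-list filter (subsearch, lookup, or the Option B/C style OR) still has to be applied, but it now runs against roughly one row per IP per hour.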


hokieb
New Member

Thanks. For the performance aspect, that may be the route I use, or something similar. The subsearch/join issue and the 10k limit are still a problem, but it may be feasible with a summary index plus Option B or C, though my gentimes trick may still get in the way (Problem #2).


sundareshr
Legend

The other option would be to create a lookup file (with the time reset to -6 hrs) for your IP list and update it at some frequency. In your lookup .csv, add a flag column, say filterfield. You can then use the following search:

search activity_name=LOGIN | lookup iplookup.csv IP AS ipv4 OUTPUT filterfield | where isnotnull(filterfield)

You will still have to contend with millions of rows returned from LOGIN activity, which perhaps could be summarized.
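
A sketch of a scheduled search that could maintain that lookup (the -30h window, the IP/filterfield column names, and the refresh frequency are placeholders): it pulls the IP source over a window wide enough to capture the full current list plus the 6-hour expiry allowance, drops entries that have already expired, adds the flag column, and overwrites the .csv with outputlookup:

index=ip sourcetype=ipsource_1 earliest=-30h
| stats latest(expiretime) as expiretime by ipv4
| where strptime(expiretime, "%FT%T.%3Q%z") >= now()
| rename ipv4 as IP
| eval filterfield=1
| table IP, filterfield, expiretime
| outputlookup iplookup.csv

Because the lookup replaces the subsearch entirely, the 10k limit no longer comes into play, and the dashboard only pays the cost of the lookup join on top of the LOGIN scan.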
