**Problem #1**
I am struggling to work around the 10k result limit on subsearches in Splunk. I have two data sources and need to "join" them on IP address, so that my large login activity event feed is filtered by a list of IP addresses I care about. The problem is that the list of IP addresses is often larger than 10k (maybe 20k-100k). The login source data is 20-30 million events (or more) in the time period I need to search and aggregate counts for. The main login activity may contain more than 1 million unique IPs, and I need to filter that down to the ones in the IP source list.
I have tried 4 different options (really there were many, many more failed attempts!!) for filtering my activity list to the IPs I care about and aggregating the counts I need. Pulling in ALL of the login events and then filtering is very slow and inefficient. The subsearch filter (before the first | ) is very efficient, but it is capped at 10k results.
My normal time ranges for searches are 1 day (either current partial day or previous full day), but I cannot limit to only those ranges. Some needs could be last 4 hours, or 3 days ago, for example.
Are there any other ways to tackle this problem (or to fix any of the options below) and get efficient execution without a limit on the number of IPs? Also, for performance, are there any suggestions on summarizing to improve speed (summarizing only the large login source by IP)? The result needs to be accurate and available pretty quickly. I plan to make this a base search on a dashboard, with many graphs and extra stats computed on the data to provide various insights.
Current search (hits 10k limit):
index=login sourcetype=loginsource_1 activity_name=LOGIN [search index=ip sourcetype=ipsource_1]
| stats count as TOTAL_COUNT, count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED")) as FAILED_COUNT by ipv4
Option A (very slow due to ALL login activity events needing to be pulled in and also hits a 50k limit on the join subsearch) :
index=login sourcetype=activitysource_1 activity_name=LOGIN | join ipv4 type=inner [search index=ip sourcetype=ipsource_1 ]
| stats count as TOTAL_COUNT, count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED")) as FAILED_COUNT by ipv4
Option B (no constraints on size of ip list; but very slow due to ALL login activity events needing to be pulled in before filtering it) :
(index=login sourcetype=loginsource_1 activity_name=LOGIN) OR (index=ip sourcetype=ipsource_1 )
| eval ipv4=coalesce(ipv4, pattern)
| stats dc(index) as index_count, count as TOTAL_COUNT, count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED")) as FAILED_COUNT by ipv4
| search index_count=2
Option C (no constraints on size of ip list; but very slow due to ALL login activity events needing to be pulled in before filtering it):
| multisearch [search index=login sourcetype=activitysource_1 activity_name=LOGIN | eval loginapp_event_time=_time] [search index=ip sourcetype=ipsource_1]
| stats dc(index) as index_count, count as TOTAL_COUNT, count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED")) as FAILED_COUNT by ipv4
| search index_count=2
**Problem #2**
To add a bit more complexity to the above, the IP source search needs to take the original earliest time and move it 6 hours earlier in order to pick up all the necessary data (IPs on the list expire; we log the create time and the expiry time). I know how to extend the range. That is not the problem.
I use this in my search:
[|gentimes start=-1 | addinfo | eval earliest=info_min_time-21600 | eval latest=info_max_time | table earliest,latest | format "" "" "" "" "" ""] | addinfo | eval orig_earliest=info_min_time+21600 | convert mstime(orig_earliest) as PICKER_START | eval LIST_EXIPIRY = strptime(expiretime ,"%FT%T.%3Q%z") | eval DIFF_BL_PICKER=( LIST_EXIPIRY - PICKER_START) | where DIFF_BL_PICKER>=0
But when I use it in a multisearch and want to apply it ONLY to the one “subsearch”, it assigns the new range to BOTH the IP search AND the login activity search instead of just the one branch of the multisearch.
| multisearch [search index=login sourcetype=activitysource_1 activity_name=LOGIN | eval loginapp_event_time=_time] [search index=ip sourcetype=ipsource_1 [|gentimes start=-1 | addinfo | eval earliest=info_min_time-21600 | eval latest=info_max_time | table earliest,latest | format "" "" "" "" "" ""] | addinfo | eval orig_earliest=info_min_time+21600 | convert mstime(orig_earliest) as PICKER_START | eval LIST_EXIPIRY = strptime(expiretime ,"%FT%T.%3Q%z") | eval DIFF_BL_PICKER=( LIST_EXIPIRY - PICKER_START) | where DIFF_BL_PICKER>=0 ]
| stats dc(index) as index_count, count as TOTAL_COUNT, count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED")) as FAILED_COUNT by ipv4
| search index_count=2
Any ideas on why multisearch is extending the range on the first query too?
Any help you have is appreciated.