Splunk Search
Highlighted

Hit limits in the subsearch filters used to boost first pipe search command performance. Alternatives?

Path Finder

Hello!

We are using many savedsearches to perform daily detection queries over huge datasets. Concretely, the anatomy of our queries is always the same. We have transactions and events related to many IDs (for example, bank account movements, and the ID would be the bank account). So, our searches login wants to find "yesterday's bank accounts that have already done N or more transactions in the past". Therefore, this would be an example of our SPL query:

index=transactions [ search index=transactions earliest=-1d@d latest=-0d@d | fields accountID | dedup accountID | table accountID]
| ... rest of the query

As you can see I'm using a first-pipe-subsearch filter approach to speed up queries and only search for yesterday IDs in the past...

This is nice until you have scenarios with >10000 accountIDs with movements daily. In this case, I've tried to skip subsearch limits by using the format command:

index=transactions [ search index=transactions earliest=-1d@d latest=-0d@d | fields accountID | dedup accountID | table accountID | format max_results=100000]
| ... rest of the query

But this still seems to be overkill. The limits are not hit thanks to format command, but the subsearch just dies and takes too long. Also, I'm not sure if exceding default limits is good practice...

Then I started to think about options WITHOUT subsearches like

index=transactions accountID
| eval isFromYesterday = if( _time>relative_time(now(),"-1d@d") AND _time<relative_time(now(),"-0d@d"), true, false) | ...rest of the query 

But this is still overkill as it will scan the whole index for all accountID without need to do such thing...

Can you share your advice on this issue?

Thank you in advance!!

Regards,

Álvaro

Highlighted

Re: Hit limits in the subsearch filters used to boost first pipe search command performance. Alternatives?

Legend

@alvaromari83, since your subquery search index=transactions earliest=-1d@d latest=-0d@d | fields accountID | dedup accountID | table accountID, needs to actually run only once per day, can you save the same as daily scheduled search after midnight and push the accountIDs to a lookup file using outputlookup command?

This way you would need to run only the main search against your index and your sub search (in the base query will change to inputlookup command.




| eval message="Happy Splunking!!!"


0 Karma
Highlighted

Re: Hit limits in the subsearch filters used to boost first pipe search command performance. Alternatives?

Path Finder

Hello! This then would be something like:

index=transactions [ | inputlookup dailyaccountIDs.csv | format maxresults=100000 ]
 | ... rest of the query

This approach would for sure save some time in the subsearch filter, but keeps having the issue of using a subsearch, hitting the default 10,000 limit. So, format command with much higher limit size would be required... and then, the format parsing for the outer search it muuuch slower.

I've made some test with a csv of 10000 accountIDs, and with:

index=transactions [ | inputlookup dailyaccountIDs.csv | format maxresults=100000 ]
 | stats dc(accountID)

takes 55 minutes!!!, while this full scan approach:

index=transactions | lookup dailyaccountIDs.csv accountID OUTPUT isdaily | search isdaily=1
 | stats dc(accountID)

takes 75 seconds, which is the same time that a raw fullscan takes:

index=transactions | stats dc(accountID)

So, not big improvement...

0 Karma
Highlighted

Re: Hit limits in the subsearch filters used to boost first pipe search command performance. Alternatives?

Splunk Employee
Splunk Employee

To me, this just calls out for the use of Summary Indexing instead of trying to do it via other means:

http://docs.splunk.com/Documentation/Splunk/7.1.1/Knowledge/Usesummaryindexing

Have you looked into this or attempted to try an approach that uses this?

0 Karma