Splunk Search

Hit limits in the subsearch filters used to boost first pipe search command performance. Alternatives?

alvaromari83
Path Finder

Hello!

We are using many savedsearches to perform daily detection queries over huge datasets. Concretely, the anatomy of our queries is always the same. We have transactions and events related to many IDs (for example, bank account movements, and the ID would be the bank account). So, our searches login wants to find "yesterday's bank accounts that have already done N or more transactions in the past". Therefore, this would be an example of our SPL query:

index=transactions [ search index=transactions earliest=-1d@d latest=-0d@d | fields accountID | dedup accountID | table accountID]
| ... rest of the query

As you can see I'm using a first-pipe-subsearch filter approach to speed up queries and only search for yesterday IDs in the past...

This is nice until you have scenarios with >10000 accountIDs with movements daily. In this case, I've tried to skip subsearch limits by using the format command:

index=transactions [ search index=transactions earliest=-1d@d latest=-0d@d | fields accountID | dedup accountID | table accountID | format max_results=100000]
| ... rest of the query

But this still seems to be overkill. The limits are not hit thanks to format command, but the subsearch just dies and takes too long. Also, I'm not sure if exceding default limits is good practice...

Then I started to think about options WITHOUT subsearches like

index=transactions accountID
| eval isFromYesterday = if( _time>relative_time(now(),"-1d@d") AND _time<relative_time(now(),"-0d@d"), true, false) | ...rest of the query 

But this is still overkill as it will scan the whole index for all accountID without need to do such thing...

Can you share your advice on this issue?

Thank you in advance!!

Regards,

Álvaro

pdaigle_splunk
Splunk Employee
Splunk Employee

To me, this just calls out for the use of Summary Indexing instead of trying to do it via other means:

http://docs.splunk.com/Documentation/Splunk/7.1.1/Knowledge/Usesummaryindexing

Have you looked into this or attempted to try an approach that uses this?

0 Karma

niketn
Legend

@alvaromari83, since your subquery search index=transactions earliest=-1d@d latest=-0d@d | fields accountID | dedup accountID | table accountID, needs to actually run only once per day, can you save the same as daily scheduled search after midnight and push the accountIDs to a lookup file using outputlookup command?

This way you would need to run only the main search against your index and your sub search (in the base query will change to inputlookup command.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

alvaromari83
Path Finder

Hello! This then would be something like:

index=transactions [ | inputlookup dailyaccountIDs.csv | format maxresults=100000 ]
 | ... rest of the query

This approach would for sure save some time in the subsearch filter, but keeps having the issue of using a subsearch, hitting the default 10,000 limit. So, format command with much higher limit size would be required... and then, the format parsing for the outer search it muuuch slower.

I've made some test with a csv of 10000 accountIDs, and with:

index=transactions [ | inputlookup dailyaccountIDs.csv | format maxresults=100000 ]
 | stats dc(accountID)

takes 55 minutes!!!, while this full scan approach:

index=transactions | lookup dailyaccountIDs.csv accountID OUTPUT isdaily | search isdaily=1
 | stats dc(accountID)

takes 75 seconds, which is the same time that a raw fullscan takes:

index=transactions | stats dc(accountID)

So, not big improvement...

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...