problem joining big data

dadi
Path Finder

I've 2 big searches that I need to join. Currently I use this paradigm for joining:

search1 OR search2 | stats by joinField | where filterCondition
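For example, with made-up index, sourcetype, and field names, that pattern might look like:

(index=web sourcetype=access_log) OR (index=app sourcetype=order_log) | stats dc(sourcetype) as sources, values(status) as status by session_id | where sources=2

where the dc(sourcetype) filter keeps only the session_id values that appear in both searches.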

It works fine for normal-sized searches. The problem begins when I try to join really big data (search1 returns 300K results while search2 returns 1500K). I'm starting to see that the results are incomplete, and some data that was supposed to get joined is not.

What am I doing wrong?
What could be done in splunk to achieve my requirement?
(Initially I used the join-with-subsearch method, but with its 50K-result limit it was quickly out of the question.)

Thanks.

1 Solution

dadi
Path Finder

What appears to have solved it for me was to transfer one of the searches to a lookup file (in a pre-processing saved search; see the lookup documentation). So now, instead of:

search1 OR search2 | stats by joinField | where filterCondition

it is now:

search1 | lookup mycsv joinField
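
For reference, the pre-processing saved search that builds the lookup could look something like this (otherField is a placeholder for whatever fields of search2 you need; outputlookup writes the results into the lookup file):

search2 | stats values(otherField) as otherField by joinField | outputlookup mycsv.csv

With the file saved as mycsv.csv you can reference it in the lookup command by file name, or set up a lookup definition named mycsv and reference that.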


lguinn2
Legend

Take a look at the limits.conf file - my guess is that you are exceeding at least one of Splunk's limits. For example, I believe that the default limit on the size of search results is 100MB of memory (that's not the count of events, but the total size). You may be exceeding that even if you aren't exceeding the maximum event count.
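
For example, settings along these lines come into play (a sketch only; the exact names and defaults vary by Splunk version, so check the limits.conf spec for yours):

[searchresults]
maxresultrows = 50000

[default]
max_mem_usage_mb = 200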

However, expanding the limits may not be the best solution, even if it is possible. The limits are set to optimize Splunk's behavior...

Consider using summary indexing; this will allow you to accumulate intermediate results over shorter time periods, and then compute the final results as needed. For example, run this search every 5 minutes, saving your results in an index named mysummary:

search1 OR search2 | sistats count by joinField

Assume that this search is named createSummary. Note that it uses the sistats command. After you save the search, you can edit it in the Splunk Manager to set the summary index and the schedule. Once createSummary has been run, you can do the final calculations with this search:

index=mysummary search_name=createSummary | stats count by joinField

Of course, my example is pseudo-code, just like yours, but the idea is there. There is also a backfill script that will initialize your summary index over past time periods.
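
A filled-in version of the two searches, reusing the made-up names from the question (session_id as the join field), might look roughly like:

(index=web sourcetype=access_log) OR (index=app sourcetype=order_log) | sistats dc(sourcetype) values(status) by session_id

and then, over the summary:

index=mysummary search_name=createSummary | stats dc(sourcetype) as sources values(status) as status by session_id | where sources=2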

You can look at the Splunk documentation for more info about summary indexing.

dadi
Path Finder

Thanks for the answer. I'm already using summary indexing. My problem is that the data to be joined is big. Theoretically, I could split it according to different values of the join field, but that doesn't seem like an ideal solution.
I'm now trying to work with lookup tables, migrating the results of search2 into a lookup CSV file, so that my query looks like:
search1 | lookup mycsv joinField

Does this kind of search have scale limits, assuming the CSV file is huge?
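From the limits.conf spec, the relevant setting appears to be max_memtable_bytes in the [lookup] stanza; my understanding (worth verifying for your version) is that once the file grows past this threshold, Splunk indexes the lookup file on disk rather than holding it in memory, so a huge CSV should still work, just more slowly:

[lookup]
max_memtable_bytes = 10000000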
Thanks again.
