Splunk Search

Are successive joins expensive in terms of disk space quota?

lmonahan
Path Finder

Hi, a high-level question about what goes on behind the scenes.

I have an internal user who has written lots of handy macros that get chained together. The dashboards leveraging the macros use a base query with panels that continue processing the base query result set. This user is hitting disk quota usage limits that other internal users do not hit.

The macros perform a series of joins and appends along the way, with 4 joins not being unusual. I'm wondering if the joins perhaps create multiple copies of the left side of the join for each join along the way, requiring more disk space during the processing stages even if the end result is "small". The usage reported in the search does not match the sum total of the usage on the job inspection page, so we are not sure what is consuming the space.

I just ran one example query of the chained macros, broken out to its query form in ad hoc search, and the end result was only 64k events that are small in size (less than 50 characters).

So I guess my questions are:

1. Do joins require a lot of disk space usage from the user's quota?

2. Any tips on how to debug end user issues with disk quota usage?

 


isoutamo
SplunkTrust

Hi

Joins have other issues besides using temp space. The biggest are that they have timeouts and limits, and there are also limits on result sets, especially when they are used in base searches. I suspect those limits are leading to a situation where not all results are returned once you have put all of those together.

Probably the easiest way is to start with the Job Inspector. Look at the search logs, and if you have a recent enough Splunk version, there is a separate app that you can open from the Job Inspector.
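To see which of a user's jobs are holding disk space, something like the following can be run alongside the Job Inspector. This is only a sketch: `some_user` is a placeholder, and it assumes your role is allowed to use the `rest` command and to see other users' jobs. It lists the search jobs currently known to the search head with the artifact size each one reports.

```spl
| rest /services/search/jobs splunk_server=local
| search author="some_user"
| eval diskUsageMB = round(diskUsage / 1024 / 1024, 1)
| table sid label dispatchState diskUsageMB ttl
| sort - diskUsageMB
```

If many moderately sized jobs show up with a long `ttl`, the total across retained jobs may matter more than any single search. That could be one reason the usage seen for a single search does not match the overall figure.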

There are also a lot of .conf presentations that cover replacing joins with stats etc., and how to check and improve query performance.
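As a rough illustration of the join-to-stats idea, here is a sketch with made-up names. The indexes `web` and `crm`, the sourcetypes, and the fields `user_id` and `plan` are all hypothetical and are not taken from the macros in question. A join-based version might look like this:

```spl
index=web sourcetype=access_combined
| join type=left user_id
    [ search index=crm sourcetype=accounts | fields user_id plan ]
| stats count AS hits values(plan) AS plan BY user_id
```

The same kind of result can often be produced in one pass, without a subsearch, by searching both data sets together and grouping on the shared key:

```spl
(index=web sourcetype=access_combined) OR (index=crm sourcetype=accounts)
| fields user_id sourcetype plan
| stats count(eval(sourcetype="access_combined")) AS hits values(plan) AS plan BY user_id
```

The two are not guaranteed to give identical output in every case. For example, users that appear only in the hypothetical `crm` data show up in the second version with `hits=0`, so check the results against the original before replacing a macro.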

r. Ismo

