Solved: Are successive joins expensive related to disk space quota?

lmonahan · ‎08-01-2022

Hi, I have a high-level question about what goes on behind the scenes.

I have an internal user who has written lots of handy macros that get chained together. The dashboards leveraging the macros use a base query with panels that continue processing the base query result set. This user is hitting disk quota usage limits that other internal users do not hit.

The macros perform a series of joins and appends along the way, with 4 joins not being unusual. I'm wondering if each join along the way creates another copy of the left-hand result set, requiring more disk space during the processing stages even if the end result is "small". The usage reported in the search does not match the sum total of the usage in the job inspection page, so we are not sure what is consuming the space.

I just ran one example query of the chained macros, broken out to its query form in ad hoc search, and the end result was only 64k events that are small in size (less than 50 characters each).

So I guess my questions are:

1. Do joins require a lot of disk space usage from the user's quota?

2. Any tips on how to debug end user issues with disk quota usage?

isoutamo · ‎08-01-2022

Hi

Joins have other issues besides temp space usage. The biggest ones are that they are subject to timeouts and limits, and there are also limits on result sets, especially when they are used in base searches. I suspect those limits are leading to a situation where not all results are returned once you have put everything together.

Probably the easiest way is to start with the Job Inspector. Look at the search logs, and if you have a recent enough Splunk version, there is a separate app that you can open from the Job Inspector.
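To see which users' search jobs are holding disk space, one option is to query the search jobs REST endpoint. This is a minimal sketch, assuming your role is allowed to run the `rest` command; it only covers jobs whose artifacts still exist on the search head you run it from. The output names `diskUsageMB`, `jobs`, and `totalDiskUsageMB` are made up for illustration; `diskUsage` (in bytes) and `author` come from the endpoint.

```
| rest /services/search/jobs
| eval diskUsageMB = round(diskUsage / 1024 / 1024, 2)
| stats count AS jobs, sum(diskUsageMB) AS totalDiskUsageMB BY author
| sort - totalDiskUsageMB
```

Comparing this total against the user's role quota may show whether many retained jobs, rather than one large search, are using up the space.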

There are also a lot of .conf presentations that cover replacing joins with stats and similar commands, as well as how to check and improve query performance.
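As a rough illustration of the join-to-stats rewrite, here is a sketch with hypothetical index names (`web`, `auth`) and field names (`user`, `src_ip`); adapt them to your own data. The join version runs a subsearch and is subject to the subsearch limits mentioned above:

```
index=web
| join type=left user
    [ search index=auth | stats latest(src_ip) AS last_login_ip BY user ]
| stats count AS requests, values(last_login_ip) AS last_login_ip BY user
```

A version that searches both data sets once and combines them with stats, with no subsearch:

```
(index=web) OR (index=auth)
| eval login_ip = if(index="auth", src_ip, null())
| stats count(eval(index="web")) AS requests, latest(login_ip) AS last_login_ip BY user
| where requests > 0
```

The final `where` keeps only users that have events in `web`, which approximates the left join in the first version.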

r. Ismo
