Are successive joins expensive related to disk space quota?

lmonahan
Path Finder

Hi, I have a high-level question about what goes on behind the scenes.

I have an internal user who has written lots of handy macros that get chained together. The dashboards leveraging the macros use a base query with panels that continue processing the base query result set. This user is hitting disk quota usage limits that other internal users do not hit.

The macros perform a series of joins and appends along the way, and four joins is not unusual. I'm wondering whether each join creates another copy of the left-hand result set, so that the intermediate processing stages need more disk space even if the end result is "small". The usage reported in the search does not match the sum of the usage shown on the job inspection page, so we are not sure what is consuming the space.

I just ran one example query of the chained macros, broken out to its query form in ad hoc search, and the end result was only 64k events that are small in size (less than 50 characters).

So I guess my questions are:

1. Do joins require a lot of disk space usage from the user's quota?

2. Any tips on how to debug end user issues with disk quota usage?

isoutamo
SplunkTrust

Hi

Joins have other issues besides temp space usage. The biggest ones are that they are subject to timeouts and limits, and there are also limits on result set sizes, especially when joins are used in base searches. I suspect those limits are why you don't get all the results once you have put everything together.

Probably the easiest way to start is with the Job Inspector. Look at the search logs, and if your Splunk version is new enough, there is a separate app that you can open from the Job Inspector.
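
To complement the Job Inspector, a rough sketch of listing one user's search jobs by the disk space they hold might look like the search below. It assumes the `rest` command is available to you and that the `/services/search/jobs` endpoint exposes `author`, `label`, `dispatchState`, `diskUsage` (in bytes), and `ttl` on your version. `example_user` is a placeholder, not a real account.

```
| rest /services/search/jobs count=0 splunk_server=local
| search author="example_user"
| eval diskUsageMB = round(diskUsage / 1024 / 1024, 1)
| table sid, label, dispatchState, diskUsageMB, ttl
| sort - diskUsageMB
```

Retained job artifacts count against the quota until they expire, so a long `ttl` on many mid-sized jobs can matter as much as one large job. The results only include jobs that the account running the search is allowed to see.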

There are also a lot of .conf presentations that cover replacing joins with stats etc., as well as how to check and improve query performance.
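
As a minimal illustration of the join-to-stats idea, here is a sketch under assumed names: the indexes `example_web` and `example_crm`, the sourcetypes `example_access` and `example_accounts`, and the fields `user_id` and `plan` are all hypothetical. A join-based version might be:

```
index=example_web sourcetype=example_access
| stats count AS requests BY user_id
| join type=left user_id
    [ search index=example_crm sourcetype=example_accounts
      | stats latest(plan) AS plan BY user_id ]
```

One way to get a similar per-user result without a subsearch is to pull both data sets in a single search and let `stats` combine them on the shared field:

```
(index=example_web sourcetype=example_access) OR (index=example_crm sourcetype=example_accounts)
| stats count(eval(sourcetype="example_access")) AS requests, latest(plan) AS plan BY user_id
| where requests > 0
```

The final `where` drops users that only appear in the account data, which mimics the left join. Because there is no subsearch, this form is not subject to the subsearch result and runtime limits mentioned above.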

r. Ismo
