Splunk Search

Why is my dashboard panel search using up so much disk space?

drewg33
Engager

I am having trouble with the search for a dashboard panel. The job takes up too much of my disk quota (~350 MB when run over a 24-hour period) and causes other jobs to queue up because I have exceeded my quota.

Obviously I can increase my disk quota, but I would rather figure out why this job is such a disk hog in the first place and fix that. From what I can see, it should only be storing 10 rows of a table with a handful of columns each.

Is anyone able to explain why this search would use so much disk space or suggest any improvements?

index="proxylogs" | stats sum(bytes_from_client) as BytesFromClient, distinct_count(client_ip) as DistinctClient by domain | where BytesFromClient > 10000000 AND DistinctClient < 40 | eval Upload(GB)=BytesFromClient/1073741824 | fields domain, Upload(GB) | sort 10 - Upload(GB)
1 Solution

martin_mueller
SplunkTrust

I'm guessing your by domain has very high cardinality, making the temporary search results huge. Solving high-cardinality problems is inherently hard. Additionally, check how large the result set is after the where; large sorts can also use temporary files. This may be indicated in search.log, accessible through the job inspector. To find out what specifically uses up space, check the contents of $SPLUNK_HOME/var/run/splunk/dispatch/<search id>.
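
If it helps, here is a rough sketch for finding the offending job from the search bar before digging through the dispatch directory. It assumes your role is allowed to run the rest command and that the jobs endpoint on your version reports diskUsage in bytes; the diskUsageMB field is just a name picked for this example.

```
| rest /services/search/jobs splunk_server=local
| eval diskUsageMB=round(diskUsage/1024/1024, 1)
| sort 10 -diskUsage
| table sid, author, label, diskUsageMB, eventCount, resultCount
```

The sid of the largest job should match a directory name under dispatch, so you can then look inside that one directory to see which files are taking the space.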


somesoni2
Revered Legend

One option could be to use summary indexing to pre-calculate the summary for a smaller period, say 1 hour, and then run your query on the summarized data. See more information here:

http://docs.splunk.com/Documentation/Splunk/6.0.5/Knowledge/Usesummaryindexing

https://wiki.splunk.com/Community:Summary_Indexing
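
A minimal sketch of how that could look for the search in the question. The saved search name proxy_upload_hourly and the summary index name summary are assumptions for illustration, not anything from your environment. First, an hourly scheduled search with summary indexing enabled, using sistats so that the distinct count can still be combined correctly across hours:

```
index="proxylogs" earliest=-1h@h latest=@h
| sistats sum(bytes_from_client) distinct_count(client_ip) by domain
```

Then the dashboard panel runs the same stats functions over the summarized data instead of the raw events (the output field is renamed to UploadGB here only to avoid quoting a field name with parentheses):

```
index=summary source="proxy_upload_hourly"
| stats sum(bytes_from_client) as BytesFromClient, distinct_count(client_ip) as DistinctClient by domain
| where BytesFromClient > 10000000 AND DistinctClient < 40
| eval UploadGB=BytesFromClient/1073741824
| sort 10 -UploadGB
| fields domain, UploadGB
```

Note that the summary still holds one row per domain per hour, so a very high-cardinality domain field will still cost space in the summary index; what this reduces is the work and temporary storage of the panel search itself.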

