Splunk Search

Why is my dashboard panel search using up so much disk space?

drewg33
Engager

I am having trouble with the search for a dashboard panel. The job takes up too much of my disk quota (~350 MB when run over a 24-hour period), and because I have exceeded my quota, other jobs are queuing up.

Obviously I can increase my disk quota, but I would rather figure out why this job is such a disk hog in the first place and fix that. From what I can see, it should only be storing 10 rows of a table with a handful of columns each.

Is anyone able to explain why this search would use so much disk space or suggest any improvements?

index="proxylogs" | stats sum(bytes_from_client) as BytesFromClient, distinct_count(client_ip) as DistinctClient by domain | where BytesFromClient > 10000000 AND DistinctClient < 40 | eval Upload(GB)=BytesFromClient/1073741824 | fields domain, Upload(GB) | sort 10 - Upload(GB)

martin_mueller
SplunkTrust

I'm guessing the domain field in your by clause has very high cardinality, which makes the temporary search results huge, since stats has to produce a row for every distinct domain before the where can filter anything out. Solving high-cardinality problems is inherently hard. Also check how large the result set is after the where, because large sorts can also use temporary files. This may be indicated in search.log, which is accessible through the job inspector. To find out what specifically is using up the space, look at the contents of $SPLUNK_HOME/var/run/splunk/dispatch/<search id>.
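If you have shell access to the search head, a small script can make the "what is using the space" check quicker. This is only a sketch: `largest_files` is a hypothetical helper name, and it assumes you pass in the full path of the job's dispatch directory yourself (the search id is shown in the job inspector). It does not know anything about Splunk's file layout; it simply walks the directory and ranks files by size.

```python
import os


def largest_files(dispatch_dir, top=10):
    """Return up to `top` (size_in_bytes, relative_path) pairs, largest first.

    `dispatch_dir` is assumed to be the dispatch directory of one search job.
    """
    sizes = []
    for root, _dirs, files in os.walk(dispatch_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The job may still be running and rotating temp files; skip those.
                continue
            sizes.append((size, os.path.relpath(path, dispatch_dir)))
    sizes.sort(reverse=True)
    return sizes[:top]


def format_report(entries):
    """Render the (size, path) pairs as aligned text lines with sizes in MB."""
    return "\n".join(f"{size / 1_048_576:10.2f} MB  {rel}" for size, rel in entries)
```

Calling `print(format_report(largest_files(path)))` with the dispatch path of the offending job should show whether the space is going to intermediate result files or to the final results.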

somesoni2
Revered Legend

One option is to use summary indexing to pre-calculate the summary over a smaller period, say 1 hour, and then run your query on the summarized data. See more information here:

http://docs.splunk.com/Documentation/Splunk/6.0.5/Knowledge/Usesummaryindexing

https://wiki.splunk.com/Community:Summary_Indexing
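One thing to watch with hourly summaries for this particular search: the byte totals can simply be added up across hours, but the distinct client counts cannot, because the same client_ip can show up in several hours. The sketch below illustrates the idea in plain Python rather than SPL. The function names and the (domain, client_ip, bytes) tuple layout are made up for illustration, and the thresholds are taken from the original search; it is not how Splunk stores summaries internally.

```python
from collections import defaultdict

BYTES_PER_GB = 1073741824


def summarize_hour(events):
    """Collapse one hour of (domain, client_ip, bytes_from_client) events
    into one summed row per (domain, client_ip) pair."""
    summary = defaultdict(int)
    for domain, client_ip, nbytes in events:
        summary[(domain, client_ip)] += nbytes
    return dict(summary)


def report(hourly_summaries, min_bytes=10_000_000, max_clients=40, top=10):
    """Rebuild the 24-hour report from hourly summaries.

    Bytes are summed across hours; clients are collected into a set per
    domain so that a client seen in several hours is counted only once.
    """
    total = defaultdict(int)
    clients = defaultdict(set)
    for summary in hourly_summaries:
        for (domain, client_ip), nbytes in summary.items():
            total[domain] += nbytes
            clients[domain].add(client_ip)
    rows = [
        (domain, total[domain] / BYTES_PER_GB)
        for domain in total
        if total[domain] > min_bytes and len(clients[domain]) < max_clients
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]
```

The trade-off is that the summary keeps one row per (domain, client_ip) pair per hour instead of one row per domain, so it is worth checking how many such pairs your data really has before committing to this approach.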
