Splunk Search

Why is my dashboard panel search using up so much disk space?

drewg33
Engager

I am having trouble with the search for a dashboard panel. The job takes up too much of my disk quota (~350 MB when run over a 24-hour period) and causes other jobs to queue up because I have exceeded my quota.

Obviously I can increase my disk quota, but I would rather understand why this job is such a disk hog in the first place and fix that. From what I can see, it should only be storing 10 rows of a table with a handful of columns each.

Is anyone able to explain why this search would use so much disk space or suggest any improvements?

index="proxylogs" | stats sum(bytes_from_client) as BytesFromClient, distinct_count(client_ip) as DistinctClient by domain | where BytesFromClient > 10000000 AND DistinctClient < 40 | eval Upload(GB)=BytesFromClient/1073741824 | fields domain, Upload(GB) | sort 10 - Upload(GB)

martin_mueller
SplunkTrust

I'm guessing your by domain has very high cardinality, which makes the temporary search results huge, since stats produces one row per distinct domain before the where filters anything. Solving high-cardinality problems is inherently hard. Also check how large the result set is after the where, because large sorts can use temporary files as well. This may be indicated in search.log, which is accessible through the job inspector. To find out what specifically uses up the space, look at the contents of $SPLUNK_HOME/var/run/splunk/dispatch/<search id>.
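
If browsing that directory by hand is tedious, a small script can rank its files by size. The following is only a sketch in Python: the function names `dispatch_usage` and `print_report` are made up for this example, nothing here is a Splunk API, and it assumes you have read access to the job's dispatch directory on the search head.

```python
import os


def dispatch_usage(job_dir):
    """Return (size_in_bytes, relative_path) for every file under job_dir, largest first."""
    sizes = []
    for root, _dirs, files in os.walk(job_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # the file vanished, e.g. the job is still running
            sizes.append((size, os.path.relpath(path, job_dir)))
    return sorted(sizes, reverse=True)


def print_report(job_dir, top=20):
    """Print the `top` largest files in the job's dispatch directory."""
    for size, rel_path in dispatch_usage(job_dir)[:top]:
        print(f"{size / 1048576:10.2f} MB  {rel_path}")
```

Call `print_report()` with the full path of the job's dispatch directory. Whichever files dominate the listing tell you where the quota is going.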


somesoni2
Revered Legend

One option is to use summary indexing to pre-calculate the summary over a smaller period, say 1 hour, and then run your query on the summarized data. See more information here:

http://docs.splunk.com/Documentation/Splunk/6.0.5/Knowledge/Usesummaryindexing

https://wiki.splunk.com/Community:Summary_Indexing

