Splunk Search

Why is my dashboard panel search using up so much disk space?

drewg33
Engager

I am having trouble with the search for a dashboard panel. The job takes up too much of my disk quota (~350MB when run over a 24-hour period) and causes other jobs to queue up because I have exceeded my quota.

Obviously I can increase my disk quota, but I would rather figure out why this job is such a disk hog in the first place and fix that. From what I can see, it should only be storing 10 rows of a table with a handful of columns each.

Is anyone able to explain why this search would use so much disk space or suggest any improvements?

index="proxylogs" | stats sum(bytes_from_client) as BytesFromClient, distinct_count(client_ip) as DistinctClient by domain | where BytesFromClient > 10000000 AND DistinctClient < 40 | eval Upload(GB)=BytesFromClient/1073741824 | fields domain, Upload(GB) | sort 10 - Upload(GB)

martin_mueller
SplunkTrust

I'm guessing your by domain clause has very high cardinality, which makes the temporary search results huge: stats has to track every domain, plus the distinct client_ip values seen for each one, until the whole time range has been processed. Solving high-cardinality problems is inherently hard. Additionally, check how large the result set is after the where; large sorts can also use temporary files. This may be indicated in search.log, which is accessible through the job inspector. To find out what specifically uses up space, check the contents of $SPLUNK_HOME/var/run/splunk/dispatch/<search id>.
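If browsing the dispatch directory by hand is tedious, a small script can list the biggest files for you. This is only a rough sketch in Python: the function name largest_files and the top_n parameter are made up for illustration, and you have to pass in the path of the job's dispatch directory yourself.

```python
import os
from pathlib import Path


def largest_files(job_dir, top_n=10):
    """Return (size_in_bytes, relative_path) for the largest files under job_dir.

    job_dir is assumed to be one search job's dispatch directory;
    the helper itself is hypothetical and not part of Splunk.
    """
    job_dir = Path(job_dir)
    sizes = []
    for root, _dirs, files in os.walk(job_dir):
        for name in files:
            path = Path(root) / name
            try:
                sizes.append((path.stat().st_size, str(path.relative_to(job_dir))))
            except OSError:
                # Files can disappear while a search is still running.
                continue
    # Largest first; ties are broken by path name.
    sizes.sort(reverse=True)
    return sizes[:top_n]
```

Calling largest_files on the job's directory and printing each pair should show quickly whether the space is going to the result files or to something else in the job directory.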


somesoni2
Revered Legend

One option could be to use summary indexing to pre-calculate the summary over a smaller period, say 1 hour, and then run your query on the summarized data. See more information here:

http://docs.splunk.com/Documentation/Splunk/6.0.5/Knowledge/Usesummaryindexing

https://wiki.splunk.com/Community:Summary_Indexing
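One thing to watch with the hourly approach: byte sums from each hour can simply be added up, but distinct client counts cannot, because the same client_ip may show up in several hours. The sketch below shows the idea in plain Python, not how Splunk stores summaries. The function names and the shape of the event dictionaries are assumptions for illustration; the thresholds are copied from the search in the question.

```python
from collections import defaultdict


def hourly_summary(events):
    """Collapse one hour of raw events into one row per (domain, client_ip)."""
    totals = defaultdict(int)
    for e in events:
        totals[(e["domain"], e["client_ip"])] += e["bytes_from_client"]
    return [
        {"domain": d, "client_ip": ip, "bytes_from_client": b}
        for (d, ip), b in totals.items()
    ]


def daily_report(hourly_summaries, min_bytes=10_000_000, max_clients=40, top_n=10):
    """Combine hourly summaries into the top-N table from the question."""
    bytes_by_domain = defaultdict(int)
    clients_by_domain = defaultdict(set)
    for summary in hourly_summaries:
        for row in summary:
            # Sums are additive across hours.
            bytes_by_domain[row["domain"]] += row["bytes_from_client"]
            # Distinct counts are not, so keep the client_ip values and union them.
            clients_by_domain[row["domain"]].add(row["client_ip"])
    rows = [
        (domain, total / 1073741824)  # bytes to GB
        for domain, total in bytes_by_domain.items()
        if total > min_bytes and len(clients_by_domain[domain]) < max_clients
    ]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top_n]
```

Note that the summary still holds one row per domain and client pair per hour, so it only saves space if that is much smaller than the raw events.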

