Greetings all,
Issue: Space on server exhausted, primarily in folder $SPLUNK_HOME/var/run/splunk/srtemp
Splunk version: v4.2.5
OS / version: Red Hat Enterprise Linux 6.2
Steps to replicate:
1. Use an app with dashboard views containing more than a few charts (the 'Splunk for F5 (beta)' app, v0.2, in this case).
2. Extend the time range to more than a small amount (a few days).
3. Splunk runs out of disk space in a very short time.
Further Diagnosis:
I don't believe this problem is specific to the app.
The app collects data from the indexer, slowly accumulating data in the $SPLUNK_HOME/var/run/splunk/dispatch folder. This process obeys user quotas.
This continues for some time; then, once all the data is collected (or the quota is hit), the dashboards/graphs begin to generate.
When the 10 dashboard widgets start to populate, Splunk starts filling up the 'srtemp' working directory with intermediate calculations.
These populate in parallel and grow to be very large (each one takes about 1 GB per day's worth of data being crunched, in our case),
so, for example, 10 days of history takes (10 days × 1 GB) = 10 GB in under 5 minutes.
I believe this result could be replicated by any sufficiently intensive dashboard, so I don't think it's specific to the app; I think it's a problem with Splunk itself.
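For anyone trying to reproduce or keep an eye on this, here is a rough watchdog sketch. The 2 GB threshold and the default SPLUNK_HOME path are assumptions for illustration, not Splunk settings; it just warns when srtemp grows past a limit, since Splunk itself currently won't:

```shell
#!/bin/sh
# Hypothetical srtemp watchdog -- the 2 GB threshold and the default
# SPLUNK_HOME below are assumptions, not Splunk configuration.
SPLUNK_HOME=${SPLUNK_HOME:-/opt/splunk}
SRTEMP="$SPLUNK_HOME/var/run/splunk/srtemp"
LIMIT_KB=2000000   # ~2 GB, matching the reserve support suggested

# du -sk reports total usage in KB; default to 0 if the dir is absent
usage_kb=$(du -sk "$SRTEMP" 2>/dev/null | awk '{print $1}')
usage_kb=${usage_kb:-0}

if [ "$usage_kb" -gt "$LIMIT_KB" ]; then
    echo "WARNING: srtemp is at ${usage_kb} KB (limit ${LIMIT_KB} KB)" >&2
    # hook an alert or cleanup action here, e.g. mail(1) or logger(1)
fi
```

Run it from cron every minute or so; at the ~2 GB/min growth rate described above, that's still only a partial safety net.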
Also, this problem won't really be solved by leaving some amount of overhead, as it would be trivial for a normal user to run the server out of space by doing the following:
1. Generate a dashboard with an arbitrary number of charts working off the same dataset.
2. Load the dashboard.
3. Their dispatch directory fills up to the quota (e.g. 100 MB), which helps limit the total size, but the 'srtemp' space fills up depending on how many charts there are and how complex they are.
I've submitted a support case regarding this (86131), and the response has been:
"Currently we don't have any parameter for limiting the size of srtemp. The reason is that we don't know how big the result might be; limiting the size of the temp folder would cause incomplete search results. I suggest leaving at least 2 GB for temporary usage."
While I appreciate the response, for our use case it's trivial for a user to fill up the available space, including any reserve (2 GB or more), just by clicking on the wrong time range, which ends up producing incomplete search results as well as crashing the server...
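In the meantime, one way to stop a runaway search from taking the whole server down (a sketch of a standard Linux technique, not something support suggested; all sizes and paths below are assumptions) is to confine srtemp to its own small loopback filesystem, so a greedy search hits ENOSPC on its own partition instead of exhausting the root disk:

```shell
# Run as root, with Splunk stopped. Sizes and paths are illustrative.
# Create a 5 GB backing file and an ext3 filesystem inside it.
dd if=/dev/zero of=/var/srtemp.img bs=1M count=5120
mkfs.ext3 -F /var/srtemp.img

# Mount it over the srtemp directory; any search needing more than
# 5 GB of temp space now fails with ENOSPC instead of filling /.
mount -o loop /var/srtemp.img /opt/splunk/var/run/splunk/srtemp
chown splunk:splunk /opt/splunk/var/run/splunk/srtemp
```

Searches that blow the limit still fail, but that's arguably the same incomplete-results outcome support described, without the crash.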
I've submitted an enhancement request as part of the same case to implement some kind of per-user quota applied to working/temporary space,
but I was wondering if anyone else has come across this problem,
and if so,
how they are dealing with it.
Any similar experiences?