Hi,
I’m doing storage dimensioning for our Indexer cluster as follows:
Inputs:
- number of log events ingested per day, and
- average size of each log event
Output:
- how much the disk space used by $SPLUNK_DB has increased in 1 day
Previously, to obtain the delta in disk space, I simply took 2 snapshots 24 hrs apart. But now that our data has reached retention age, with the oldest data getting deleted every day, I can no longer do that.
I’ve tried Fire Brigade TA, but it didn’t give me what I need. So, I’m down to 2 options:
- asking our customer to temporarily increase the retention time by a few days so that the logs don’t get truncated, or
- manually searching for all buckets having data within the 1-day time range and finding their size
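For reference, the second option could be sketched roughly as below. This is only an illustration in Python, not anything provided by Splunk: it assumes the bucket list has already been exported (for example from the `dbinspect` search command) as records carrying `startEpoch`, `endEpoch` and `sizeOnDiskMB`, and it assumes events are spread evenly over each bucket's time span, so a bucket that only partly overlaps the day is counted pro rata. The function name `size_in_window_mb` and the sample values are hypothetical.

```python
def size_in_window_mb(buckets, window_start, window_end):
    """Estimate the disk space (MB) attributable to [window_start, window_end).

    Assumption: events are uniformly distributed over each bucket's time
    span, so a partially overlapping bucket contributes proportionally.
    """
    total = 0.0
    for b in buckets:
        start = float(b["startEpoch"])
        end = float(b["endEpoch"])
        size = float(b["sizeOnDiskMB"])
        span = end - start
        if span <= 0:
            # Bucket covers a single instant: count it fully if inside the window.
            if window_start <= start < window_end:
                total += size
            continue
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            total += size * overlap / span
    return total


DAY = 86400
# Made-up sample buckets, for illustration only.
sample_buckets = [
    {"startEpoch": 0, "endEpoch": DAY, "sizeOnDiskMB": 100},            # fully inside
    {"startEpoch": DAY // 2, "endEpoch": DAY + DAY // 2, "sizeOnDiskMB": 200},  # half inside
    {"startEpoch": DAY, "endEpoch": 2 * DAY, "sizeOnDiskMB": 50},       # outside
]
estimate_mb = size_in_window_mb(sample_buckets, 0, DAY)
```

The pro rata split is a simplification; buckets that span several days with uneven traffic would skew the result, which is part of why this option feels laborious.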
Has anyone gone through this exercise and found a simpler way to obtain this estimate?
Thanks,
Jennie
You can actually reference the license usage logs for this:
index=_internal source=*license_usage.log type=Usage earliest=-1d@d latest=-0d@d | stats sum(b) by st, idx | rename sum(b) as Bytes | eval Volume=round(Bytes/1024/1024, 2) | eventstats sum(Volume) as Total_Volume | fields - Bytes | fieldformat Volume = tostring(Volume, "commas") +"MB" | fieldformat Total_Volume = tostring(Total_Volume, "commas") +"MB" | rename st as Sourcetype, idx as Index
Actually, I'm not looking for the ingested log volume per day, but the disk space consumption on the indexer cluster, meaning the increase in these folders:
- $SPLUNK_DB//db
- $SPLUNK_DB//datamodel_summary
In our deployment, we have replication-factor=2, search-factor=2 and we use data model acceleration, so the actual disk space usage is quite different from the ingested log volume. In my experience, when upgrading the Splunk version, I've sometimes seen a substantial change in the ratio between ingested log volume and storage used, hence the need to revise the dimensioning tool from time to time...
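To make the relationship concrete, the dimensioning calculation can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not our actual tool: the function name `daily_disk_growth_gb` is hypothetical, and the default ratios (compressed rawdata at about 15% and index files at about 35% of the ingested volume) are only the commonly quoted rule of thumb. They are exactly the numbers that need re-measuring after an upgrade. The `dma_ratio` parameter is a placeholder for data model summaries and would have to be measured from datamodel_summary.

```python
def daily_disk_growth_gb(events_per_day, avg_event_bytes,
                         replication_factor=2, search_factor=2,
                         rawdata_ratio=0.15, index_ratio=0.35,
                         dma_ratio=0.0):
    """Rough cluster-wide disk growth per day, in GB.

    All ratios are fractions of the ingested (uncompressed) volume and are
    assumptions to be replaced with values measured on the real cluster.
    """
    ingested_bytes = events_per_day * avg_event_bytes
    # Compressed rawdata is kept once per replicated copy.
    rawdata_bytes = ingested_bytes * rawdata_ratio * replication_factor
    # Index (tsidx) files are kept once per searchable copy.
    index_bytes = ingested_bytes * index_ratio * search_factor
    # Data model acceleration summaries, cluster-wide (placeholder ratio).
    summary_bytes = ingested_bytes * dma_ratio
    return (rawdata_bytes + index_bytes + summary_bytes) / 1024 ** 3


# Hypothetical inputs: 100 million events per day, 500 bytes on average.
growth_gb = daily_disk_growth_gb(100_000_000, 500)
```

Keeping the ratios as parameters means that only those values need updating when the measured ratio shifts, rather than the formula itself.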