First of all, thank you very much for taking the time to read my question.
I am a Splunk newbie, and we have been given the following instruction: "Determine how much data is being stored daily vs how much is aging out"
We have a clustered indexer environment.
We’re approaching our disk capacity.
We age out data once it reaches its retention period, so while we’re indexing new data we are also discarding old data. However, since we have added more data sources over the last year, we are most likely ingesting more than we’re aging out.
My question is:
How can we determine how much data is being stored daily vs how much is aging out?
This should give us an idea of how much runway we have.
To check how much data is being committed to disk daily, set the time picker to the day you want to measure and run the following search:
| dbinspect index=*
| stats sum(sizeOnDiskMB) as daily_mb_to_disk
If you don't roll to frozen, it can be a little tricky to track the sum of data in buckets that are rolling out.
You can try calculating your used disk space at the beginning of a day and at the end of that day, then combine it with the daily total you found earlier to see how much data rolled out:
Example: total disk size at 4/4 00:00:00 is 500GB, and total data used at that time is 400GB.
Total data committed on 4/4 is 50GB.
If the total used at 4/5 00:00:00 is 425GB, then 400GB + 50GB - 425GB = 25GB rolled out.
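To turn the example numbers above into a rough runway estimate, here is a small Python sketch of the arithmetic. The function names are made up for illustration (they are not part of Splunk), and it assumes daily ingest and age-out stay roughly constant, which may not hold in your environment.

```python
def aged_out_gb(used_start_gb, committed_gb, used_end_gb):
    """Data that left disk during the day, from a simple balance:
    start + committed - aged_out = end."""
    return used_start_gb + committed_gb - used_end_gb


def runway_days(capacity_gb, used_gb, committed_gb, aged_gb):
    """Days until the disk fills at the current net growth.
    Returns None when usage is flat or shrinking (no fill date)."""
    net_growth_gb = committed_gb - aged_gb
    if net_growth_gb <= 0:
        return None
    return (capacity_gb - used_gb) / net_growth_gb


# Numbers from the example: 400GB used at start, 50GB committed, 425GB used at end.
aged = aged_out_gb(400, 50, 425)
# Hypothetical follow-up: 500GB capacity, 425GB used now, same daily rates.
days = runway_days(500, 425, 50, aged)
print(aged, days)
```

With the example figures, the net growth is 50GB in minus 25GB out, so the remaining 75GB of free space would last about three days if nothing changes. Averaging the daily numbers over a week or more should give a steadier estimate than a single day.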
Hope it helps.
P.S. Try looking for the Fire Brigade app, which might give better insights. Also, try using the data in the _introspection index.