Hi All,
I want an SPL query to get the total size occupied/consumed by each index since the date of onboarding, and the remaining space available for each index.
I would also like a query that returns results from the date of onboarding until now, regardless of whether the search time range is one hour, one month, or one year.
Thanks,
Srinivasulu S
Some valid points have already been made in this thread, but it's worth noting that it's not that simple.
1. Data is rolled over the bucket life cycle, so - depending on your index ingestion rate and settings - your index space or retention period may allow you to hold only a recent portion of your data. Older data might already have been discarded.
2. Data is kept in buckets, and there is not much sense in trying to go below the "resolution" of a bucket. If a bucket contains data from a month ago until a week ago, it's impossible to tell how much of it is used by data from some specific three-day-long period within that range.
3. Data in a distributed environment is rolled separately on each indexer, so the "free space" can vary per indexer.
4. There are a lot of different settings responsible for the possible size of an index and its parts.
5. Replication and search factor.
6. Smartstore.
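To see the bucket "resolution" point for yourself, you can list the time span each bucket covers. A hedged sketch (field names are as returned by dbinspect):
| dbinspect index=*
| eval bucketStart=strftime(startEpoch, "%F %T"), bucketEnd=strftime(endEpoch, "%F %T")
| table index bucketId state bucketStart bucketEnd sizeOnDiskMB
Any per-bucket size figure covers the whole span between bucketStart and bucketEnd - you can't attribute it to a narrower time window inside the bucket.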
Just a follow-up regarding maxTotalDataSizeMB being used in some of these queries, which I feel I should raise: the default value, if unchanged, is 500GB. If you have 10 indexes and "filled" them all, you would consume 5TB of storage; but if your storage location is smaller than that (or rather, smaller than the total of your maxTotalDataSizeMB values), then you will start losing older data *before* maxTotalDataSizeMB is reached.
Similarly, that value is per indexer, so if you have 10 indexers then your "total available storage" for each index is 10 * maxTotalDataSizeMB (10 x 500GB = 5TB) - but if you keep 3 copies of the data then you'll chew through that storage more quickly, if that makes sense?
Also, if you have frozenTimePeriodInSecs set to 90 days, but after 45 days you've already reached your maxTotalDataSizeMB for that index, then you'll never make it to the 90-day retention you're expecting. The same applies in reverse: if frozenTimePeriodInSecs is set to 90 days and after 90 days you've only used 50GB, you're never going to "fill up" maxTotalDataSizeMB (unless you start sending more data)!
Essentially, what I'm trying to say is that these searches might not give the answer you think you're getting unless other factors are considered [which you might already be aware of - but I wanted to highlight this for others who might see these responses 🙂].
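To make the arithmetic concrete, here is a sketch using makeresults (the numbers below are examples only, not your environment's actual settings):
| makeresults
| eval maxTotalDataSizeMB=500000, indexers=10, replication_factor=3
| eval rawCapacityMB=maxTotalDataSizeMB*indexers
| eval effectiveCapacityMB=round(rawCapacityMB/replication_factor)
| table maxTotalDataSizeMB indexers replication_factor rawCapacityMB effectiveCapacityMB
With these example values you get roughly 5TB of raw capacity for the index across the cluster, but only about 1.6TB of unique data once 3 copies are kept.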
Hopefully this makes sense, but if you want any further guidance on how to apply these searches to your environment, please let us know a bit more about your end goal here, along with details of your environment (how many indexers, is it a cluster, what are your replication and search factors, single- or multi-site, any other settings changed such as TSIDX reduction, and presumably you're not using SmartStore?).
Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards
Will
Query 1:
Query to get the total size occupied/consumed by each index and the remaining space available:
| rest /services/data/indexes
| table title currentDBSizeMB maxTotalDataSizeMB
| eval remainingSpaceMB = maxTotalDataSizeMB - currentDBSizeMB
| rename title AS "Index Name", currentDBSizeMB AS "Current Size (MB)", maxTotalDataSizeMB AS "Max Size (MB)", remainingSpaceMB AS "Remaining Space (MB)"
Query 2:
To get the total size occupied by each index since the date of onboarding, you can use the following query:
| dbinspect index=*
| stats sum(sizeOnDiskMB) as TotalSizeMB by index
| eval TotalSizeGB = round(TotalSizeMB / 1024, 2)
| table index, TotalSizeGB
Query 3:
To find the remaining space available for each index, you can use:
| rest /services/data/indexes
| table title, currentDBSizeMB, maxTotalDataSizeMB
| eval remainingSpaceMB = maxTotalDataSizeMB - currentDBSizeMB
| eval remainingSpaceGB = round(remainingSpaceMB / 1024, 2)
| table title, remainingSpaceGB
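If you want consumed and remaining space in a single result, one way (a sketch - note that | rest reports values per instance, while dbinspect sums across the peers the search reaches, so the two figures may not line up exactly in a distributed environment) is to join the two:
| dbinspect index=*
| stats sum(sizeOnDiskMB) as usedMB by index
| join type=left index
    [| rest /services/data/indexes
     | rename title as index
     | table index maxTotalDataSizeMB]
| eval remainingMB=maxTotalDataSizeMB-usedMB
| table index usedMB maxTotalDataSizeMB remainingMB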
Query 4:
This query gives the total raw data size consumed per index from the time of onboarding until now, based on bucket metadata from dbinspect:
| dbinspect index=*
| stats sum(rawSize) AS total_size_in_bytes by index
| eval total_size_in_gb=round(total_size_in_bytes/1024/1024/1024,2)
Query 5:
| dbinspect index=*
| search tsidxState="full" bucketId=*
| eval ageDays=round((endEpoch-startEpoch)/86400,10)
| stats min(startEpoch) as MinStartTime max(startEpoch) as MaxStartTime min(endEpoch) as MinEndTime max(endEpoch) as MaxEndTime max(hostCount) as MaxHosts max(sourceTypeCount) as MaxSourceTypes sum(eventCount) as TotalEvents sum(rawSize) as rawSizeBytes sum(sizeOnDiskMB) as sizeOnDiskBytes values(ageDays) as ageDays dc(bucketId) as countBuckets by index, bucketId, state
| where ageDays<90 AND ageDays>0.0000000000
| eval sizeOnDiskBytes=round(sizeOnDiskBytes*pow(1024,2))
| eval dailyDisk=round(sizeOnDiskBytes/ageDays,5)
| eval dailyRaw=round(rawSizeBytes/ageDays,5)
| eval dailyEventCount=round(TotalEvents/ageDays)
| table index bucketId state dailyDisk ageDays rawSizeBytes sizeOnDiskBytes TotalEvents dailyRaw dailyEventCount
| stats sum(dailyDisk) as dailyBDiskBucket, values(ageDays), sum(dailyRaw) as dailyBRaw sum(dailyEventCount) as dailyEvent, avg(dailyDisk) as dailyBDiskAvg, avg(dailyRaw) as dailyBRawAvg, avg(dailyEventCount) as dailyEventAvg, dc(bucketId) as countBucket by index, state, ageDays
| eval bPerEvent=round(dailyBDiskBucket/dailyEvent)
| eval bPerEventRaw=round(dailyBRaw/dailyEvent)
| table dailyBDiskBucket index ageDays dailyEvent bPerEvent dailyBRaw bPerEventRaw state
| sort ageDays
| stats sum(dailyBDiskBucket) as Vol_totDBSize, avg(dailyBDiskBucket) as Vol_avgDailyIndexed, max(dailyBDiskBucket) as Vol_largestVolBucket, avg(dailyEvent) as avgEventsPerDay, avg(bPerEvent) as Vol_avgVolPerEvent, avg(dailyBRaw) as Vol_avgDailyRawVol, avg(bPerEventRaw) as Vol_avgVolPerRawEvent, range(ageDays) as rangeAge by index, state
| foreach Vol_* [eval <<FIELD>>=if(<<FIELD>> >= pow(1024,3), tostring(round(<<FIELD>>/pow(1024,3),3))+ " GB", if(<<FIELD>> >= pow(1024,2), tostring(round(<<FIELD>>/pow(1024,2),3))+ " MB", if(<<FIELD>> >= pow(1024,1), tostring(round(<<FIELD>>/pow(1024,1),3))+ " KB", tostring(round(<<FIELD>>)) + " bytes")))]
| rename Vol_* as *
| eval comb="Index Avg/day: " + avgDailyIndexed + "," + "Raw Avg/day: " + avgDailyRawVol + "," + "DB Size: " + totDBSize + "," + "Per Event Avg/Vol: " + avgVolPerEvent + "," + "Retention Range: " + tostring(round(rangeAge))
| eval comb = split(comb,",")
| xyseries index state comb
| table index hot warm cold
I wonder if the following would work for you, or at least get you some of the way there. Run this search across "All Time" to get the size since the start of your onboarding.
| dbinspect index=*
| stats sum(sizeOnDiskMB) as sizeOnDiskMB by index
This gives the sizeOnDisk in MB for each index. The amount of available space will depend on your configuration. Do you have a single storage location/volume for all indexes and for hot/warm/cold buckets?
Do you know the size of your disk, or do you need to look in Splunk for this info too? This will help build out a final query for you to use.
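If you need the disk size from within Splunk, one option (assuming you have access to the REST endpoint) is the partitions-space endpoint, which reports capacity and free space per mount point on each instance:
| rest /services/server/status/partitions-space
| eval usedMB=capacity-free
| table splunk_server mount_point fs_type capacity free usedMB
Both capacity and free are reported in MB, so this pairs naturally with the sizeOnDiskMB figure from dbinspect above.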
Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards
Will