Getting Data In

How to figure out index disk space footprint on indexers?

I was struggling to find short- and long-term estimates of how much space each index takes up in each bucket state. If you are planning capacity or taking over an older deployment, your two friends are dbinspect and the Monitoring Console. Seriously, try to avoid the internal splunkd metrics unless you are looking at license volume.

The Monitoring Console deserves a course of its own, but using dbinspect I was able to find, by bucket and state, the compressed and uncompressed volumes, and from those work out a decent estimate of how I should configure my indexes.conf.

The search used (which desperately needs cleaning, as it has plenty of unnecessary stats tables written into it):

| dbinspect index=* 
| search tsidxState="full" bucketId=*
    | eval ageDays=round((endEpoch-startEpoch)/86400,10)
| stats min(startEpoch) as MinStartTime max(startEpoch) as MaxStartTime min(endEpoch) as MinEndTime max(endEpoch) as MaxEndTime max(hostCount) as MaxHosts max(sourceTypeCount) as MaxSourceTypes sum(eventCount) as TotalEvents sum(rawSize) as rawSizeBytes sum(sizeOnDiskMB) as sizeOnDiskBytes values(ageDays) as ageDays dc(bucketId) as countBuckets by index bucketId, state 
    | where ageDays<90 AND ageDays>0.0000000000 
    | eval sizeOnDiskBytes=round(sizeOnDiskBytes*pow(1024,2))
    | eval dailyDisk=round(sizeOnDiskBytes/ageDays,5)
    | eval dailyRaw=round(rawSizeBytes/ageDays,5)
    | eval dailyEventCount=round(TotalEvents/ageDays)
| table index bucketId state dailyDisk ageDays rawSizeBytes sizeOnDiskBytes TotalEvents dailyRaw dailyEventCount
| stats sum(dailyDisk) as dailyBDiskBucket, values(ageDays), sum(dailyRaw) as dailyBRaw sum(dailyEventCount) as dailyEvent, avg(dailyDisk) as dailyBDiskAvg, avg(dailyRaw) as dailyBRawAvg, avg(dailyEventCount) as dailyEventAvg, dc(bucketId) as countBucket by index, state, ageDays
    | eval bPerEvent=round(dailyBDiskBucket/dailyEvent)
    | eval bPerEventRaw=round(dailyBRaw/dailyEvent)
| table dailyBDiskBucket index ageDays dailyEvent bPerEvent dailyBRaw bPerEventRaw state
    | sort ageDays
| stats sum(dailyBDiskBucket) as Vol_totDBSize, avg(dailyBDiskBucket) as Vol_avgDailyIndexed, max(dailyBDiskBucket) as Vol_largestVolBucket, avg(dailyEvent) as avgEventsPerDay, avg(bPerEvent) as Vol_avgVolPerEvent, avg(dailyBRaw) as Vol_avgDailyRawVol, avg(bPerEventRaw) as Vol_avgVolPerRawEvent, range(ageDays) as rangeAge by index, state
    | foreach Vol_* [eval <<FIELD>>=if(<<FIELD>> >= pow(1024,3), tostring(round(<<FIELD>>/pow(1024,3),3))+ " GB", if(<<FIELD>> >= pow(1024,2), tostring(round(<<FIELD>>/pow(1024,2),3))+ " MB", if(<<FIELD>> >= pow(1024,1), tostring(round(<<FIELD>>/pow(1024,1),3))+ " KB", tostring(round(<<FIELD>>)) + " bytes")))]
    | rename Vol_* as *
    | eval comb="Index Avg/day: " + avgDailyIndexed + "," + "Raw Avg/day: " + avgDailyRawVol + "," + "DB Size: " + totDBSize + "," + "Per Event Avg/Vol: " + avgVolPerEvent + "," + "Retention Range: " + tostring(round(rangeAge))
    | eval comb = split(comb,",")
| xyseries index state comb
| table index hot warm cold 
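
For anyone who wants to sanity-check the numbers outside Splunk, the per-bucket arithmetic and the unit formatting in the search can be sketched in Python. This is only an illustration under assumptions: the function names and the sample bucket values are made up, a day is taken as 86400 seconds, and the KB branch divides by 1024.

```python
SECONDS_PER_DAY = 86400

def daily_disk_bytes(start_epoch, end_epoch, size_on_disk_mb):
    """Average bytes on disk per day for one bucket (hypothetical helper)."""
    age_days = (end_epoch - start_epoch) / SECONDS_PER_DAY
    if age_days <= 0:
        # The search drops these buckets with its "where ageDays>0" filter.
        raise ValueError("bucket must span a positive time range")
    size_bytes = size_on_disk_mb * 1024 ** 2  # dbinspect reports sizeOnDiskMB
    return size_bytes / age_days

def human_size(num_bytes):
    """Format a byte count as GB, MB, KB, or bytes, like the foreach step."""
    for unit, factor in (("GB", 1024 ** 3), ("MB", 1024 ** 2), ("KB", 1024)):
        if num_bytes >= factor:
            return f"{round(num_bytes / factor, 3)} {unit}"
    return f"{round(num_bytes)} bytes"

# Made-up bucket: spans exactly 2 days and occupies 512 MB on disk.
start = 1_700_000_000
rate = daily_disk_bytes(start, start + 2 * SECONDS_PER_DAY, 512)
print(human_size(rate))  # 256.0 MB
```

A bucket whose first and last events share the same timestamp has an age of zero, so it has to be excluded before dividing, which is what the where clause in the search does.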

This search helped a lot in deciding which configuration changes to make. Hopefully it helps you avoid the trip into wonderland. The main area to look at is Index Avg/day: that is the compressed value, i.e. what is actually written to disk.
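
As a rough illustration of how an Index Avg/day figure could feed into indexes.conf planning, the sketch below multiplies a daily on-disk rate by a retention period. The 20% headroom, the sample numbers, and the function name are assumptions for illustration, not recommendations. maxTotalDataSizeMB and frozenTimePeriodInSecs are the indexes.conf settings the results would map onto.

```python
import math

def estimate_index_settings(daily_disk_mb, retention_days, headroom_pct=20):
    """Rough sizing from a daily on-disk rate (hypothetical helper).

    headroom_pct is an assumed safety margin, not a Splunk default.
    """
    max_total_mb = math.ceil(
        daily_disk_mb * retention_days * (100 + headroom_pct) / 100
    )
    return {
        "maxTotalDataSizeMB": max_total_mb,
        "frozenTimePeriodInSecs": retention_days * 86400,
    }

# Made-up inputs: 256 MB/day on disk, 90 days of retention.
settings = estimate_index_settings(daily_disk_mb=256, retention_days=90)
for key, value in settings.items():
    print(f"{key} = {value}")
```

Treat the output as a starting point only. The daily rate comes from whatever buckets dbinspect saw, so it is worth re-running the search after any change in data volume.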

1 Solution

Oh! Also, make sure the search is run over All time. If you are running an indexer cluster, run the query on the master; otherwise run it on the indexer.
