
How to find the index footprint by hot, cold, and frozen?

Explorer

Good morning those more knowledgeable than myself 🙂

The default index usage panel, which shows useful information such as earliest event, is not quite giving me what I need.

I'm trying to manage hot/warm, cold, and frozen so that 60% of the data is on hot/warm, 40% is on cold, and anything older than 115 days (we promise 90 searchable) goes to frozen.

The frozen data is included in the earliest event calculation, and I'd like to either see my footprint in size for only hot/cold or for all three so I can calculate the hot/cold ratio without the blur introduced by including frozen.

Ideally, I'd like to control hot/cold retention entirely by the date range of the data, not its size, but that seems to be impossible to do directly, hence calculating it. Now that frozen is turned on, the data we were using to size the indexes is made fuzzy because frozen data is included in the earliest event calculation.

So what particular Splunk incantation is needed to parse the index footprint data out like that?

-J

SplunkTrust

Option 1
Use Splunk on Splunk (SoS App) or the Distributed Management Console (called Monitoring Console as of Splunk 6.5) to validate indexing bucket performance and stats. The DMC has optimized searches that you can open and run yourself.

Option 2
Use Splunk's generating command dbinspect. Note that rawSize is reported in bytes and sizeOnDiskMB in megabytes, which is why only the raw size is divided down.

 | dbinspect index=_internal
 | search tsidxState="full"
 | stats min(startEpoch) as MinStartTime max(startEpoch) as MaxStartTime min(endEpoch) as MinEndTime max(endEpoch) as MaxEndTime max(hostCount) as MaxHosts max(sourceTypeCount) as MaxSourceTypes sum(eventCount) as TotalEvents sum(rawSize) as TotalRawDataSizeMB sum(sizeOnDiskMB) as TotalDiskDataSizeMB by state
 | eval TotalRawDataSizeMB=round((TotalRawDataSizeMB/1024/1024),6)
 | eval MinStartTime=strftime(MinStartTime,"%Y/%m/%d %H:%M:%S")
 | eval MaxStartTime=strftime(MaxStartTime,"%Y/%m/%d %H:%M:%S")
 | eval MinEndTime=strftime(MinEndTime,"%Y/%m/%d %H:%M:%S")
 | eval MaxEndTime=strftime(MaxEndTime,"%Y/%m/%d %H:%M:%S")
 | eval PercentSizeReduction=round(((TotalRawDataSizeMB-TotalDiskDataSizeMB)/TotalRawDataSizeMB)*100,2)

Refer to the Splunk documentation on dbinspect to come up with the query that you need.
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Dbinspect
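For the per-tier footprint asked about in the question, a minimal sketch along the same lines might look like the search below. It groups hot and warm together, keeps cold separate, and reports each tier's share of the index. The output field names (Tier, PercentOfIndex, and so on) are made up for illustration. As far as I know, dbinspect only lists buckets Splunk still manages (hot, warm, cold, thawed), so frozen buckets should not appear here at all, and any thawed buckets would show up as their own tier and count toward the index total.

```spl
| dbinspect index=*
| eval Tier=case(state=="hot" OR state=="warm","hot/warm", state=="cold","cold", true(),state)
| stats min(startEpoch) as OldestEpoch max(endEpoch) as NewestEpoch count as Buckets sum(sizeOnDiskMB) as SizeOnDiskMB by index Tier
| eventstats sum(SizeOnDiskMB) as IndexTotalMB by index
| eval PercentOfIndex=round(100*SizeOnDiskMB/IndexTotalMB,1)
| eval OldestEvent=strftime(OldestEpoch,"%Y/%m/%d %H:%M:%S")
| eval NewestEvent=strftime(NewestEpoch,"%Y/%m/%d %H:%M:%S")
| table index Tier Buckets SizeOnDiskMB PercentOfIndex OldestEvent NewestEvent
```

In an indexer cluster, replicated bucket copies are likely to be counted once per peer, so treat the sizes as total disk usage rather than unique data. The frozen footprint would have to be measured separately, for example from the size of the archive directory on disk.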

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

Splunk Employee

I'd also recommend FireBrigade for this. It has a lot more information regarding in-flight buckets and status than the current DMC does.


Ultra Champion

indexes.conf has a good number of configuration parameters to control the buckets, such as:

maxHotBuckets
homePath.maxDataSizeMB 
frozenTimePeriodInSecs
maxWarmDBCount

Hopefully, it can assist you.
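As a rough illustration of how those settings could fit the 60/40 and 115-day goals from the question, here is a hedged indexes.conf sketch. The index name, paths, and the 500,000 MB budget are placeholders rather than values from this thread. Only the freeze threshold is age-based; the roll from warm to cold is driven by size or bucket count.

```ini
# Hypothetical stanza: index name, paths, and sizes are placeholders.
[firewall]
homePath   = $SPLUNK_DB/firewall/db
coldPath   = $SPLUNK_DB/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb

# 60/40 split of an assumed 500,000 MB budget
maxTotalDataSizeMB     = 500000
homePath.maxDataSizeMB = 300000
coldPath.maxDataSizeMB = 200000

# 115 days * 86400 s/day = 9,936,000 s
frozenTimePeriodInSecs = 9936000

# Archive frozen buckets instead of deleting them (placeholder path)
coldToFrozenDir = /archive/splunk/firewall
```

The size caps only produce a 60/40 split once both tiers are full, and if coldPath.maxDataSizeMB is reached first, the oldest buckets freeze before they hit 115 days, so the caps need headroom for volume spikes.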

Explorer

Yes, I'm aware of all those parameters. What I was shooting for was an alternative to the Settings > Data > Indexes dialog, where you see things like current size, earliest event, etc. This is one of the few places where there is no 'open in search' button.

When you sort on earliest event, it appears the results include events that are in frozen. I'm trying to find just the event date ranges for hot/warm/cold.

I have a retention policy of 90 days searchable (delivering 120) and remainder of year archive (275 days but delivering 295).

Since Splunk index sizes are determined by data volume and not event age, I need to know the earliest event in my indexes that does not include frozen, as a sort of retention audit.
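A retention audit of that kind could be sketched with dbinspect as below. This is only a sketch under assumptions: the 90-day threshold comes from the retention policy stated above, the output field names are invented for illustration, and the state filter is written with OR so it does not depend on newer search syntax.

```spl
| dbinspect index=*
| search state=hot OR state=warm OR state=cold
| stats min(startEpoch) as OldestEpoch sum(sizeOnDiskMB) as SearchableSizeMB by index
| eval SearchableDays=round((now()-OldestEpoch)/86400,1)
| eval MeetsTarget=if(SearchableDays>=90,"yes","no")
| eval OldestEvent=strftime(OldestEpoch,"%Y/%m/%d %H:%M:%S")
| table index OldestEvent SearchableDays MeetsTarget SearchableSizeMB
| sort SearchableDays
```

Two caveats: an index that is simply younger than 90 days will show "no", and a handful of events with bad timestamps can make OldestEvent look much older than the bulk of the data really is.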

Now, however, I've seen a 42% increase in my index usage all in firewall traffic, so monitoring the earliest event in hot/warm/cold is even more of a concern while we nail down the cause of the spike.


Explorer

I should be a bit clearer. When looking at the footprint for hot/cold/frozen, I'd like to see the size of the data on each of hot, cold, and frozen separately, not an aggregate number.
