Splunk Search

Search to Provide Days in Hot/WarmDB

jodros
Builder

In our environment, we have a CIFS share that is used to store all cold buckets (colddb). Warm buckets are rolled to cold when the hot/warm volume hits 200 GB:

maxVolumeDataSizeMB = 204800
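
For context, the relevant parts of our indexes.conf look roughly like this (the volume name, CIFS mount point, and index stanza below are simplified placeholders, not our exact config):

# hypothetical volume capping local hot/warm storage at 200 GB
[volume:hotwarm]
path = /opt/splunk/var/lib/splunk
maxVolumeDataSizeMB = 204800

# hypothetical index: hot/warm on the local volume, cold on the CIFS share
[main]
homePath = volume:hotwarm/$_index_name/db
coldPath = /mnt/cifs_share/$_index_name/colddb
thawedPath = $SPLUNK_DB/$_index_name/thaweddb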

I know how much data is being stored locally on the indexers versus on the CIFS share, but I would like to know how many days' worth of data is stored locally on the indexers. I am about to do some basic calculations based on what we typically index in a day, but that would not factor in the compression that Splunk applies.

I was wondering if there is a search that could be constructed to show the oldest timestamp of the data in warmdb. That would be exactly what I need. The reason I need this is to satisfy our auditing department, which is not comfortable without knowing exactly how many days are stored locally versus on the CIFS share.

Thanks in advance.

1 Solution

jodros
Builder

So I got this idea from a Splunk support engineer. He told me that warm bucket directories are labeled with the start and end times, in epoch time, of the data they contain. He also told me that Splunk logs an event when it moves a warm bucket to cold. So with that in mind, if I search for the most recent roll from warm to cold and parse out the latest epoch time for that bucket, that should give me an indication of the earliest data in warm. I have constructed a search that breaks this out per index per host. Let me know if my logic is flawed. I know that there might be a gap between the last date in the newly formed cold bucket and the oldest warm bucket, but it at least gives me an estimate.

index=_internal sourcetype=splunkd bucketmover move completed 
| rex field=_raw "/opt/splunk/var/lib/splunk/(?<index>\w+)/" 
| rex field=_raw "/db_(?<time_cron>\d+)_\d+_" 
| dedup index host 
| eval time_human=strftime(tonumber(time_cron), "%m-%d-%Y %H:%M:%S") 
| table _time host index time_human
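
To sanity-check the extractions, you can exercise them against a made-up bucket path (the path and epoch values below are hypothetical, purely for illustration; the real splunkd event contains more text around the path):

| makeresults 
| eval _raw="/opt/splunk/var/lib/splunk/main/db/db_1331510400_1329868800_42" 
| rex field=_raw "/opt/splunk/var/lib/splunk/(?<index>\w+)/" 
| rex field=_raw "/db_(?<time_cron>\d+)_\d+_" 
| eval time_human=strftime(tonumber(time_cron), "%m-%d-%Y %H:%M:%S") 
| table index time_cron time_human

Since bucket directories are named db_<newest>_<oldest>_<id> in epoch time, the first number is the newest event in the bucket, so the value extracted from the most recently rolled cold bucket approximates where the warm data begins.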


Hdholariya
Explorer

Here is a search to find out approximately how many days of data are stored in the hot/warm buckets before they move to cold. It pairs each bucket's hot-to-warm roll with its later warm-to-cold move and averages the difference per index.

index=_internal sourcetype=splunkd component=HotBucketRoller "finished moving hot to warm" 
| rename to as bucket_name 
| append 
    [| search index=_internal sourcetype=splunkd "BucketMover" "Done move warm to cold" 
    | rex field=_raw "bucket=(?<bucket_name>[^\s]*)\s" ] 
| stats earliest(_time) as start latest(_time) as end by idx bucket_name 
| eval diff_in_days = (end - start) / (60*60*24) 
| where diff_in_days > 0  ```ignore buckets not yet rolled to cold``` 
| stats avg(diff_in_days) as avg_days_data_stored_in_hot_storage by idx
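
For example, a bucket that finished its hot-to-warm move at noon on March 1 and its warm-to-cold move at noon on March 11 would contribute diff_in_days = 10; the final stats then averages these durations per index, giving the approximate number of days data sits in hot/warm storage.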

lpolo
Motivator

I would recommend using | dbinspect. See the example presented in the next answer.


lpolo
Motivator

You can use:

|dbinspect index=name_of_your_index state=warm

Example for the main index:

|dbinspect index=main state=warm

Make sure to select "All time" as the time range for your query.

For more information, take a look at the documentation for the dbinspect command.

Once you are familiar with the dbinspect command, you will be able to use it to obtain the result set you are looking for. Here is an example:

| dbinspect index=main 
| convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(earliestTime) as earliestTime 
| convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(latestTime) as latestTime 
| stats min(earliestTime) as earliestTime max(latestTime) as latestTime sum(sizeOnDiskMB) as sizeOnDiskMB dc(path) as NumberOfBuckets by state 
| eval diff_hours=(latestTime-earliestTime)/3600 
| eval earliestTime=strftime(earliestTime,"%m/%d/%Y:%H:%M:%S") 
| eval latestTime=strftime(latestTime,"%m/%d/%Y:%H:%M:%S")

The result set for this query is something like this:

state   earliestTime          latestTime            sizeOnDiskMB     NumberOfBuckets
hot     09/15/2006:18:47:20   03/24/2012:12:00:00   10043.741711     10
warm    02/19/2012:00:00:01   03/21/2012:03:59:00   92646.695278     20
cold    01/03/2006:16:35:20   02/27/2012:17:00:00   204971.245710    586

I prefer to use the dbinspect command. It offers precise information about the state of the buckets found in each index.

Regards,
Lp

gjanders
SplunkTrust

For anyone using Splunk 6.4.x or newer in the 6.x series, the query changes slightly from lpolo's original answer. The updated query is:
| dbinspect index=main 
| convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(startEpoch) as earliestTime 
| convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(endEpoch) as latestTime 
| stats min(earliestTime) as earliestTime max(latestTime) as latestTime sum(sizeOnDiskMB) as sizeOnDiskMB dc(path) as NumberOfBuckets by state 
| eval diff_hours=(latestTime-earliestTime)/3600 
| eval earliestTime=strftime(earliestTime,"%m/%d/%Y:%H:%M:%S") 
| eval latestTime=strftime(latestTime,"%m/%d/%Y:%H:%M:%S")

The change is that startEpoch and endEpoch replace the earliestTime and latestTime fields.
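
Putting this together for the original question, here is a minimal sketch (assuming hot/warm buckets are the ones stored locally, and using the 6.x startEpoch field) that estimates how many days of data remain on local storage:

| dbinspect index=main 
| search state=hot OR state=warm 
| stats min(startEpoch) as oldest_local 
| eval days_local=round((now() - oldest_local) / 86400, 1) 
| eval oldest_local=strftime(oldest_local, "%m/%d/%Y:%H:%M:%S")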

David
Splunk Employee

At least as of 6.0, you can run dbinspect from a search head and it will hit every indexer.
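
Because dbinspect results also carry a splunk_server field, a sketch like the following should show the oldest warm data per indexer (again assuming the 6.x field names):

| dbinspect index=main state=warm 
| stats min(startEpoch) as earliest_warm by splunk_server 
| eval earliest_warm=strftime(earliest_warm, "%m/%d/%Y:%H:%M:%S")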

lpolo
Motivator

You have to run the command locally. You can create a script to collect this data, or log in to each indexer and run the command via its web interface.


jodros
Builder

We have a distributed environment with 4 indexers. Is dbinspect able to report in a distributed environment? I have read that it is not.
