Splunk Search

Search to Provide Days in Hot/WarmDB

jodros
Builder

In our environment, we have a CIFS share that is used to store all cold buckets (colddb). Warm buckets are rolled to cold when the hot/warm volume hits 200 GB:

maxVolumeDataSizeMB = 204800
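
For context, the relevant parts of our indexes.conf look roughly like this (the volume name, CIFS mount point, and index stanza below are simplified placeholders, not our exact config):

# hypothetical volume capping local hot/warm storage at 200 GB
[volume:hotwarm]
path = /opt/splunk/var/lib/splunk
maxVolumeDataSizeMB = 204800

# hypothetical index: hot/warm on the local volume, cold on the CIFS share
[main]
homePath = volume:hotwarm/$_index_name/db
coldPath = /mnt/cifs_share/$_index_name/colddb
thawedPath = $SPLUNK_DB/$_index_name/thaweddb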

I know how much data is being stored locally on the indexers versus on the CIFS share, but I would like to know how many days' worth of data is stored locally on the indexers. I am about to do some basic calculations based on what we typically index in a day, but that would not factor in the compression that Splunk applies.

I was wondering if there is a search that could be constructed to show the oldest timestamp of the data in warmdb. That would be exactly what I need. The reason I need this is to satisfy our auditing department, which is not comfortable without knowing exactly how many days are stored locally versus on the CIFS share.

Thanks in advance.

1 Solution

jodros
Builder

So I got this idea from a Splunk support engineer. He told me that warm bucket directories are labeled with the start and end times, in epoch time, of the data they contain. He also told me that Splunk logs an event when it moves a warm bucket to cold. So with that in mind, if I search for the most recent roll from warm to cold and parse out the latest epoch time for that bucket, that should give me an indication of the earliest data in warm. I have constructed a search that breaks this out per index per host. Let me know if my logic is flawed. I know that there might be a gap between the last date in the newly formed cold bucket and the oldest warm bucket, but it at least gives me an estimate.

index=_internal sourcetype=splunkd bucketmover move completed 
| rex field=_raw "/opt/splunk/var/lib/splunk/(?<index>\w+)/" 
| rex field=_raw "/db_(?<time_cron>\d+)_\d+_" 
| dedup index host 
| eval time_human=strftime(tonumber(time_cron), "%m-%d-%Y %H:%M:%S") 
| table _time host index time_human
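
To sanity-check the extractions, you can exercise them against a made-up bucket path (the path and epoch values below are hypothetical, purely for illustration; the real splunkd event contains more text around the path):

| makeresults 
| eval _raw="/opt/splunk/var/lib/splunk/main/db/db_1331510400_1329868800_42" 
| rex field=_raw "/opt/splunk/var/lib/splunk/(?<index>\w+)/" 
| rex field=_raw "/db_(?<time_cron>\d+)_\d+_" 
| eval time_human=strftime(tonumber(time_cron), "%m-%d-%Y %H:%M:%S") 
| table index time_cron time_human

Since bucket directories are named db_<newest>_<oldest>_<id> in epoch time, the first number is the newest event in the bucket, so the value extracted from the most recently rolled cold bucket approximates where the warm data begins.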


Hdholariya
Explorer

Here is a search to find out approximately how many days of data are stored in the hot/warm buckets before they move to cold. It pairs each bucket's hot-to-warm roll with its later warm-to-cold move and averages the difference per index.

index=_internal sourcetype=splunkd component=HotBucketRoller "finished moving hot to warm" 
| rename to as bucket_name 
| append 
    [| search index=_internal sourcetype=splunkd "BucketMover" "Done move warm to cold" 
    | rex field=_raw "bucket=(?<bucket_name>[^\s]*)\s" ] 
| stats earliest(_time) as start latest(_time) as end by idx bucket_name 
| eval diff_in_days = (end - start) / (60*60*24) 
| where diff_in_days > 0  ```ignore buckets not yet rolled to cold``` 
| stats avg(diff_in_days) as avg_days_data_stored_in_hot_storage by idx
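
For example, a bucket that finished its hot-to-warm move at noon on March 1 and its warm-to-cold move at noon on March 11 would contribute diff_in_days = 10; the final stats then averages these durations per index, giving the approximate number of days data sits in hot/warm storage.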

lpolo
Motivator

I would recommend using | dbinspect. See the example presented in the next answer.


lpolo
Motivator

You can use:

|dbinspect index=name_of_your_index state=warm

Example for the main index:

|dbinspect index=main state=warm

Make sure to select "All time" as the time range for your query.

For more information, take a look at the documentation for the dbinspect command.

Once you are familiar with the dbinspect command, you will be able to use it to obtain the result set you are looking for. Here is an example:

| dbinspect index=main 
| convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(earliestTime) as earliestTime 
| convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(latestTime) as latestTime 
| stats min(earliestTime) as earliestTime max(latestTime) as latestTime sum(sizeOnDiskMB) as sizeOnDiskMB dc(path) as NumberOfBuckets by state 
| eval diff_hours=(latestTime-earliestTime)/3600 
| eval earliestTime=strftime(earliestTime,"%m/%d/%Y:%H:%M:%S") 
| eval latestTime=strftime(latestTime,"%m/%d/%Y:%H:%M:%S")

The result set for this query is something like this:

state   earliestTime          latestTime            sizeOnDiskMB     NumberOfBuckets
hot     09/15/2006:18:47:20   03/24/2012:12:00:00   10043.741711     10
warm    02/19/2012:00:00:01   03/21/2012:03:59:00   92646.695278     20
cold    01/03/2006:16:35:20   02/27/2012:17:00:00   204971.245710    586

I prefer to use the dbinspect command. It offers precise information about the state of the buckets found in each index.

Regards,
Lp

gjanders
SplunkTrust

For anyone using Splunk 6.4.x or newer in the 6.x series, the query changes slightly from lpolo's original answer. The updated query is:
| dbinspect index=main 
| convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(startEpoch) as earliestTime 
| convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(endEpoch) as latestTime 
| stats min(earliestTime) as earliestTime max(latestTime) as latestTime sum(sizeOnDiskMB) as sizeOnDiskMB dc(path) as NumberOfBuckets by state 
| eval diff_hours=(latestTime-earliestTime)/3600 
| eval earliestTime=strftime(earliestTime,"%m/%d/%Y:%H:%M:%S") 
| eval latestTime=strftime(latestTime,"%m/%d/%Y:%H:%M:%S")

The change is that startEpoch and endEpoch replace the earliestTime and latestTime fields.
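
Putting this together for the original question, here is a minimal sketch (assuming hot/warm buckets are the ones stored locally, and using the 6.x startEpoch field) that estimates how many days of data remain on local storage:

| dbinspect index=main 
| search state=hot OR state=warm 
| stats min(startEpoch) as oldest_local 
| eval days_local=round((now() - oldest_local) / 86400, 1) 
| eval oldest_local=strftime(oldest_local, "%m/%d/%Y:%H:%M:%S")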

David
Splunk Employee

At least as of 6.0, you can run dbinspect from a search head and it will hit every indexer.
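
Because dbinspect results also carry a splunk_server field, a sketch like the following should show the oldest warm data per indexer (again assuming the 6.x field names):

| dbinspect index=main state=warm 
| stats min(startEpoch) as earliest_warm by splunk_server 
| eval earliest_warm=strftime(earliest_warm, "%m/%d/%Y:%H:%M:%S")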

lpolo
Motivator

You have to run the command locally. You can create a script to collect this data, or log in to each indexer and run the command via its web interface.


jodros
Builder

We have a distributed environment with 4 indexers. Is dbinspect able to report in a distributed environment? I have read that it is not.
