How can we get the oldest index time of an index?
Does the retention policy depend on _indextime or _time?
The retention policy depends on the _time values, not the index time. You can run the following query to find the min/max _time values for an index.
| rest /services/data/indexes | table title frozenTimePeriodInSecs minTime maxTime
Where,
maxTime - ISO8601 format timestamp of the newest event time in the index.
minTime - ISO8601 format timestamp of the oldest event time in the index.
I have an index that holds data from two years ago as well as the latest data. How is the retention policy applied to that data? The index has a retention of 60 days.
Data for each index is stored in buckets, which move through stages: hot -> warm -> cold (a bucket becomes read-only from warm onwards). Once a bucket has moved to the cold stage, the retention policy checks the timestamp of the newest event in the bucket. If that timestamp is older than the retention period (60 days in your case) counted back from now, the bucket is rolled to frozen. Otherwise it is kept as is. You can run the following query for an index to see which buckets exist and the timestamps of the oldest (startEpoch) and newest (endEpoch) events in each.
| dbinspect index=YourIndexNameHere earliest=0 | table index bucketId startEpoch endEpoch | convert ctime(*Epoch) as *HumanReadable
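If it helps, here is a rough sketch that builds on the query above and flags buckets whose newest event is already older than the retention period. The 60-day value is taken from this thread, and retentionSecs, newestEventAgeDays and pastRetention are just field names I made up for illustration. As far as I know dbinspect honors the search time range, so run it over "All time".

```spl
| dbinspect index=YourIndexNameHere
| eval retentionSecs = 60 * 86400
| eval newestEventAgeDays = round((now() - endEpoch) / 86400, 1)
| eval pastRetention = if(now() - endEpoch > retentionSecs, "yes", "no")
| table bucketId state startEpoch endEpoch newestEventAgeDays pastRetention
| convert ctime(startEpoch) ctime(endEpoch)
```

A bucket showing pastRetention=yes while its state is still hot or warm would be one that has not reached the stage where it can be frozen.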
Thanks for the information, that gives me a bit of clarity.
So the retention period applies only to the cold buckets? What if the warm buckets have data older than 60 days and have not rolled to cold because the warm bucket limit hasn't been reached?
That is correct, the retention policy only applies to cold buckets. If a warm bucket has data older than the retention period, the data will stay there. That also means that your warm to cold bucket rolling configuration (the maxWarmDBCount setting) may not be adequate for that index. You may think about lowering this number and/or setting up proper hot to warm bucket rolling so that a sufficient number of warm buckets is generated (a bucket rolls from hot to warm if its size reaches the maxDataSize limit or its time span exceeds maxHotSpanSecs).
Oh, that's bad. Is it recommended to take out all the extra settings for hot and warm buckets and just use homePath, coldPath and frozenTimePeriodInSecs, so that Splunk rolls the buckets itself and the retention policy works appropriately?
homePath = volume:hotwarm/XXXXXX/db
coldPath = volume:cold/XXXXXX/colddb
thawedPath = $SPLUNK_DB/XXXXXX/thaweddb
homePath.maxDataSizeMB = 10000
coldPath.maxDataSizeMB = 356640
frozenTimePeriodInSecs = 5005600
maxTotalDataSizeMB = 1766400
maxDataSize = auto_high_volume
Suppose I have used the above settings in indexes.conf, and the data is always less than maxTotalDataSizeMB. Is there a way to tell how long the data will stay searchable if it arrives today?
I believe you should set maxHotSpanSecs (default 90 days) to something smaller, say 7 days, and set maxWarmDBCount smaller so that, for your retention period of 60 days, enough warm buckets are rolled. (You'd need to look at the current number of warm buckets using the dbinspect command, where the state column gives the bucket state: hot/warm/cold.)
| dbinspect index=YourIndexNameHere earliest=0 | table index bucketId startEpoch endEpoch state | convert ctime(*Epoch) as *HumanReadable
You can go with the default Splunk settings, except for the maxDataSize attribute, which should be set based on the amount of data you'll be ingesting into that index (set it to auto for small ingestion, <750MB/day; otherwise set it to auto_high_volume). I think part of your problem is the timestamps in your data. You might be ingesting data whose timestamps vary a lot. Generally ingested data is close to the current time, but you might have old historic data, which is why you see such a big range of event timestamps in a bucket. Generally the difference is a few days, but in your case it is up to 2 years.
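To check whether a wide timestamp spread is what keeps buckets around, a rough sketch like the one below lists buckets by how many days of event time each one covers. spanDays is a field name I made up for illustration. The other fields come from dbinspect.

```spl
| dbinspect index=YourIndexNameHere
| eval spanDays = round((endEpoch - startEpoch) / 86400, 1)
| sort - spanDays
| table bucketId state spanDays startEpoch endEpoch
| convert ctime(startEpoch) ctime(endEpoch)
```

Buckets at the top of the list mix old and recent events, so they stay until their newest event ages past the retention period.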
Yeah, I'm having a hard time finding out how long the data will be retained in Splunk.
Is there a straightforward way to say how long the data will be available if it comes in today with the above settings, or any other settings, in use?
Are there any calculations that can be done to determine the last date of availability?
With the current settings you can't. The reason is that there is currently no fixed setting that determines when hot buckets roll to warm (and how many). If you apply some configuration, say maxHotSpanSecs = 86400 (1 day), you'll know that every day all hot buckets (default count of 3) will roll to warm for sure (they can roll earlier as well, based on size or when Splunk restarts), so you know roughly how many buckets roll: 3 per day. If your maxWarmDBCount = 180 (default is 300), then you'd know that a bucket written to today will roll to cold on the 61st day (so the data will be available for 61 days).
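The arithmetic behind that estimate can be sketched as a throwaway search. It assumes exactly 3 hot buckets roll to warm per day and nothing rolls early because of size or restarts, which real indexes will not follow precisely. hotBucketsRolledPerDay and daysInWarm are names I made up for illustration.

```spl
| makeresults
| eval hotBucketsRolledPerDay = 3, maxWarmDBCount = 180
| eval daysInWarm = maxWarmDBCount / hotBucketsRolledPerDay
| table hotBucketsRolledPerDay maxWarmDBCount daysInWarm
```

With these numbers the warm stage holds about 60 days' worth of buckets (180 / 3) before the oldest one rolls to cold. Any early rolls would shorten that.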
I am able to see both new data and old data if I do a search on the index. How does retention apply here?
Here's a quick way to see the earliest and latest events in the index main:
| metadata type=sources index=main
| stats min(firstTime) AS firstTime max(lastTime) AS lastTime
| convert ctime(firstTime) ctime(lastTime)
I believe retention policy depends on _time, but I'm having trouble finding documentation to back that up.
The best I can find regarding _time vs _indextime for this:
vix.output.buckets.older.than =
* Buckets must be this old before they will be archived.
* A bucket's age is determined by the the earliest _time field of any event in the bucket.
Source: http://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Indexesconf
I can't say for sure if that's limited to virtual indexes, though - since that setting certainly is.
It's a bit unclear. If a bucket contains both two-year-old data and today's data, how will the bucket roll to frozen?
Retention policy applies to the newest event in a bucket. A bucket will not be retired until every event in that bucket is older than the retention policy date. If a given index doesn't receive a very high volume of data, then it won't retire buckets very often.