How old is the data I am keeping?

robertlynch2020
Influencer

Hi

A team has told me that they need to keep 3 months of data.

I have told them that we have limited space on the disks and that retention is a function of data volume, not time. (I know that when an index gets full, the oldest data drops off the back.)

But how do I know how far back my data goes, so I can judge whether I need to get more disk space or increase the size of the index?

Below is a screenshot taken from one of the indexes; however, I don't believe the reading. I know that I keep up to 3-6 months of data, but not years.

robertlynch2020_0-1658420489804.png

If I try running a search like this, I don't think it will ever finish. Also, I don't believe the data from 2017; I think it somehow got backdated.

robertlynch2020_1-1658420839654.png

Any help would be great - cheers

PickleRick
SplunkTrust

In order to give a reasonable answer to this question, you need to understand how indexes work.

Splunk stores events grouped into so-called buckets, and an index contains one or more buckets. I won't bore you at this point with the bucket lifecycle (hot->warm->cold->frozen/deleted). What's important is that Splunk ages out whole buckets. If the index exceeds its size limit, the oldest bucket (the one whose latest event is the oldest) gets frozen/deleted. Otherwise, the housekeeping thread periodically checks the age of the _whole bucket_ (that is, the age of the latest event in that bucket) and freezes/deletes the bucket once that age exceeds the retention period.

So it's perfectly OK for an index to hold events much older than your retention period, as long as the bucket containing them also contains events younger than the retention period. In that case the whole bucket is not rolled to frozen/deleted until its latest events have aged sufficiently.

You can check buckets for your index using

| dbinspect index=my_index

Use "All time" in the time picker or "earliest=0" to list all buckets.
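
To make the raw epoch values easier to read, the search can be extended along these lines. This is only a sketch: my_index is the same placeholder as above, startEpoch, endEpoch, state, eventCount and sizeOnDiskMB are fields that dbinspect returns, and the names oldest_event, newest_event and newest_event_age_days are ones I made up for illustration.

```spl
| dbinspect index=my_index
| eval oldest_event = strftime(startEpoch, "%Y-%m-%d %H:%M:%S")
| eval newest_event = strftime(endEpoch, "%Y-%m-%d %H:%M:%S")
| eval newest_event_age_days = round((now() - endEpoch) / 86400, 1)
| sort 0 endEpoch
| table bucketId state oldest_event newest_event newest_event_age_days eventCount sizeOnDiskMB
```

The first rows are the buckets that should be aged out next. Since a bucket is judged by its newest event, newest_event_age_days is the value to compare against the retention period, while oldest_event shows how far back the events in that bucket actually reach. A bucket with a very old oldest_event but a recent newest_event would be consistent with backdated events, like the 2017 ones mentioned in the question.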
