Hi Splunkers, today I have a very strange case to deal with. I'll try to be as clear as possible.
The scenario is a fully on-prem Splunk Enterprise environment with many components.
For this customer, we are not the original provider; another company was in charge before us and developed a fully custom app. About this application:
So, in a nutshell, we have to understand why we are getting those errors and try to fix them.
Of course, I'm not here to ask "Hey magic guys, give me the magic solution!"; the purpose of this topic is to ask for your help in understanding the data we have (all we have is a small GUI dashboard with a short description of the app and how it works) and in figuring out how we can fix those errors.
The app analyzes the indexers and their indexes. Its purpose is to determine whether the indexes are retaining the correct amount of historical data; to achieve this, it investigates each index's retention status. So, how is this investigation done? The app compares the currentTimePeriodDay value against the frozenTimePeriodDay value. To decide whether there is an error, the app considers 2 possible cases:
For both cases, the suggested workaround is a generic tuning of the retention and disk space settings.
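Since the app's code isn't available to us, here is only a hypothetical sketch of what such a comparison could look like. It assumes that currentTimePeriodDay is the age, in days, of the oldest event still present in an index, and that frozenTimePeriodDay is the configured frozenTimePeriodInSecs converted to days. The class, the labels and the `tolerance_days` parameter are made up for illustration; they are not the app's actual cases.

```python
from dataclasses import dataclass


@dataclass
class IndexRetention:
    name: str
    # Hypothetical: age in days of the oldest event still in the index.
    current_time_period_day: float
    # Hypothetical: frozenTimePeriodInSecs / 86400 for the same index.
    frozen_time_period_day: float


def check_retention(idx: IndexRetention, tolerance_days: float = 0.0) -> str:
    """Classify an index by comparing actual vs configured retention (illustrative only)."""
    delta = idx.current_time_period_day - idx.frozen_time_period_day
    if delta > tolerance_days:
        # Events older than the configured limit are still on disk.
        return "over-retaining"
    if delta < -tolerance_days:
        # Events are gone before the configured limit is reached.
        return "under-retaining"
    return "ok"


print(check_retention(IndexRetention("web", 120, 90)))
```

Note that an "under-retaining" result could also simply mean the index has not been collecting data for as long as its retention period.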
Of course, there are more specific error messages for each index on every indexer (we have a menu to select specific indexers), but from my point of view this is a further analysis step; what is not clear to my team and me is the underlying logic of the app.
I mean: how should a comparison between currentTimePeriodDay and frozenTimePeriodDay help us verify that index retention is healthy? How are they related? Why could one of them being greater than the other be an unhealthy symptom?
Hi
I suppose you mean that currentTimePeriodDay is the age of the oldest data you have in your buckets before they are moved to the frozen state.
I suppose this app uses those two values to check how well data retention is working. Are you familiar with how data is stored in Splunk buckets and with all the parameters that must be taken into account to know when the real retention (removal of buckets and events) happens? If not, there are a couple of old posts where we have discussed this challenge. There is also a good .conf presentation about it: https://conf.splunk.com/files/2017/slides/splunk-data-life-cycle-determining-when-and-where-to-roll-...
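To make the bucket side concrete, here is a simplified model (a sketch, not Splunk's actual code) of the two rules that usually matter. First, a bucket is frozen only when its newest event is older than frozenTimePeriodInSecs, so a bucket that straddles the cutoff keeps older events searchable. Second, when the index exceeds maxTotalDataSizeMB, the oldest buckets are frozen regardless of age. The `Bucket` class and the function names are hypothetical, and ordering buckets by their newest event is a simplification.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Bucket:
    earliest: int   # epoch seconds of the oldest event in the bucket
    latest: int     # epoch seconds of the newest event in the bucket
    size_mb: float  # on-disk size of the bucket


def buckets_to_freeze(buckets, now, frozen_time_period_in_secs, max_total_data_size_mb):
    """Split buckets into (frozen, remaining) under a simplified retention model."""
    cutoff = now - frozen_time_period_in_secs
    # Age rule: freeze a bucket only once its *newest* event is older than the cutoff.
    frozen = [b for b in buckets if b.latest < cutoff]
    remaining = sorted((b for b in buckets if b.latest >= cutoff), key=lambda b: b.latest)
    # Size rule: while the index is over its cap, freeze the oldest remaining bucket,
    # even if its data is younger than the configured retention.
    total = sum(b.size_mb for b in remaining)
    while remaining and total > max_total_data_size_mb:
        oldest = remaining.pop(0)
        total -= oldest.size_mb
        frozen.append(oldest)
    return frozen, remaining


def oldest_event_age_days(buckets, now):
    """Age in days of the oldest event still searchable (roughly 'currentTimePeriodDay')."""
    if not buckets:
        return 0.0
    return (now - min(b.earliest for b in buckets)) / SECONDS_PER_DAY


now = 1_000 * SECONDS_PER_DAY


def days_ago(days):
    return now - days * SECONDS_PER_DAY


buckets = [
    Bucket(days_ago(130), days_ago(95), 100.0),
    Bucket(days_ago(110), days_ago(60), 100.0),
    Bucket(days_ago(60), days_ago(0), 100.0),
]

# Age rule only: oldest event is 110 days old, older than the 90-day setting.
_, kept = buckets_to_freeze(buckets, now, 90 * SECONDS_PER_DAY, 1_000)
print(oldest_event_age_days(kept, now))

# Tight size cap: oldest event is 60 days old, younger than the 90-day setting.
_, kept = buckets_to_freeze(buckets, now, 90 * SECONDS_PER_DAY, 150)
print(oldest_event_age_days(kept, now))
```

With the same 90-day setting, the oldest data can end up either older (bucket time span) or younger (size cap reached first) than the configured retention, which is presumably why an app would flag a difference in either direction.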
I suppose you could use this app and those limits to fine-tune the needed parameters in the indexes.conf file, so that your real event retention time is as close as possible to what you have defined in indexes.conf.
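As a rough starting point for that tuning, the sketch below turns a desired retention into frozenTimePeriodInSecs and a maxTotalDataSizeMB large enough that the age rule, not the size cap, drives freezing. The function name is hypothetical. The 0.5 on-disk/raw ratio is only a common rule of thumb, the 1.2 headroom factor is arbitrary, and the estimate ignores replicated copies in an indexer cluster, so check the numbers against your real bucket sizes.

```python
import math

SECONDS_PER_DAY = 86_400


def suggest_index_settings(retention_days, daily_ingest_mb, disk_ratio=0.5, headroom=1.2):
    """Rough per-index starting values for indexes.conf (rule-of-thumb estimate only).

    retention_days: how long events should stay searchable
    daily_ingest_mb: raw data indexed per day for this index on one indexer
    disk_ratio: assumed on-disk size divided by raw size
    headroom: margin so the size cap doesn't freeze data before the age limit
    """
    frozen_secs = retention_days * SECONDS_PER_DAY
    max_total_mb = math.ceil(daily_ingest_mb * disk_ratio * retention_days * headroom)
    return {"frozenTimePeriodInSecs": frozen_secs, "maxTotalDataSizeMB": max_total_mb}


print(suggest_index_settings(90, 1_000))
```

If maxTotalDataSizeMB ends up smaller than this estimate, expect buckets to be frozen early and the oldest data to be younger than frozenTimePeriodInSecs.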
r. Ismo