Hi Splunkers, today I have a very strange case to manage. I'll try to be as clear as possible.

The scenario is a fully on-prem Splunk Enterprise environment with many components. For this customer, we are not the original provider; another company was in charge before us and developed a fully custom app. About this application:

- No documentation has been shared by the previous provider.
- It now shows some error messages that are not completely clear.

So, in a nutshell, we have to understand why we get those errors and try to fix them. Of course I'm not here to ask you "Hey magic guys, give me the magic solution!"; the purpose of this topic is to ask for your help in understanding the data we have (all we have is a small GUI dashboard with a short description of the app and how it works) and in working out how we can fix those errors.

The app analyzes Indexers and their indexes. Its purpose is to check whether indexes are retaining the correct amount of historical data; to achieve this, it investigates the index retention status.

So, how is this investigation done? The app compares the currentTimePeriodDay value against frozenTimePeriodDay. To decide whether there is an error, the app considers 2 possible cases:

- currentTimePeriodDay > frozenTimePeriodDay + 45: this case is considered unhealthy because indexes are retaining more historical data than expected.
- currentTimePeriodDay < frozenTimePeriodDay: this case is considered unhealthy because indexes are retaining insufficient historical data.

For both cases, the suggested workaround is a generic tuning of retention and disk space settings.

Of course there are more specific error messages for each index on every Indexer (we have a menu to select a specific Indexer), but that, from my point of view, is a further analysis step; what is not clear to my team and me is the underlying logic of the app. I mean: how is comparing currentTimePeriodDay with frozenTimePeriodDay supposed to help us check that index retention is good?
How are the two values related? Why would one of them being greater than the other be a symptom of an unhealthy index?
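
To make the rule concrete, here is a minimal sketch of the two checks exactly as the dashboard describes them. It is written in Python only for illustration; we have not seen the app's actual implementation. The function name `retention_status`, the constant `GRACE_DAYS` and the returned strings are hypothetical, and I am assuming both values are expressed in days, as their names suggest.

```python
# Illustrative sketch only: this is NOT the app's code, which we have not seen.
# Field names come from the dashboard description; everything else is hypothetical.

GRACE_DAYS = 45  # margin quoted in the dashboard description


def retention_status(current_time_period_day: float,
                     frozen_time_period_day: float) -> str:
    """Classify an index using the two rules shown on the dashboard.

    Assumption: both arguments are expressed in days.
    """
    # Case 1: more history kept than expected (beyond the 45-day margin).
    if current_time_period_day > frozen_time_period_day + GRACE_DAYS:
        return "unhealthy: retaining more historical data than expected"
    # Case 2: less history kept than the configured period.
    if current_time_period_day < frozen_time_period_day:
        return "unhealthy: retaining insufficient historical data"
    # Anything in [frozen, frozen + 45] is treated as fine.
    return "healthy"


if __name__ == "__main__":
    # Example values are made up, not taken from the customer's environment.
    for current, frozen in [(100, 90), (140, 90), (60, 90)]:
        print(current, frozen, "->", retention_status(current, frozen))
```

Written this way, the rule treats an index as healthy only when currentTimePeriodDay falls between frozenTimePeriodDay and frozenTimePeriodDay + 45, boundaries included. What I still don't understand is why that window should be the right definition of healthy.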