I work at a utility, and we have an index containing SCADA events from the electric system, with data going back to 2015. There is a very large number of total events (around 1.8 billion). I had an engineer trying to trend some voltages over a long time period, and it was discovered that Splunk had removed all of the events before 8/1/2020. I cleaned the index and added enableTsidxReduction=false, then cleaned and reloaded the index, and this time it appears it has removed events prior to Jan 1, 2017. The total size of this index is only around 60GB; the SQL database we are loading it from is 100GB total, and these events are only two tables. We use DB Connect with a rising column for loading: MSSQL to a dedicated SCADA index, with two inputs, one for each table.
I would like size to be the only factor controlling when data leaves the index. I would also prefer for buckets to only be hot and warm; cold is on a much slower storage system, and we have plenty of hot/warm space. What are the conf file settings that achieve this?
I have found the spec for indexes.conf, and it is very daunting. I have scrolled through it, and it is hard for me to understand which settings are the right ones to use. Is there a guide somewhere that outlines the behavior and controls for index data management?
We run a distributed system with two indexers on 8.2.3.
Thanks for the help.
There are a couple of settings you must change for this. If you don't use Splunk volumes, then the following should do the trick.
Please check the setting names and explanations at https://docs.splunk.com/Documentation/Splunk/8.2.4/Admin/Indexesconf
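As a starting point, here is a minimal, hedged sketch of what such an indexes.conf stanza could look like. The index name, paths, and sizes are illustrative, not a definitive recommendation; verify each setting against the docs page above before deploying:

```ini
[escadahist]
homePath   = $SPLUNK_DB/escadahist/db
coldPath   = $SPLUNK_DB/escadahist/colddb   # required even if buckets rarely land here
thawedPath = $SPLUNK_DB/escadahist/thaweddb

# Make size the only retention trigger: set the age-based freeze
# horizon far beyond your data's age so it never fires.
frozenTimePeriodInSecs = 4294967295          # ~136 years; effectively disables age-based freezing
maxTotalDataSizeMB = 500000                  # hard size cap; oldest buckets freeze only past this

# Keep buckets in hot/warm rather than rolling them to cold:
# a very high warm bucket count prevents warm-to-cold rolls on count.
maxWarmDBCount = 4294967295
```

Note that if homePath lives on a Splunk volume, that volume's own maxVolumeDataSizeMB can still force warm buckets to cold regardless of these per-index settings.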
So I am still seeing the data loss. I was able to find it in _internal as well. Here is my indexes.conf stanza for this index:
# SCADA related indices
[escadahist]
homePath = volume:hot/escadahist/db
coldPath = volume:cold/escadahist/colddb
thawedPath = $SPLUNK_DB/escadahist/thaweddb
maxDataSize = auto_high_volume
maxHotBuckets = 10
maxWarmDBCount = 10000
maxTotalDataSizeMB = 500000
enableTsidxReduction = false
After setting the above, I clean the index and reset the rising column checkpoint in my DB Connect input to reload everything. Forty-five minutes into reloading, I see the following in _internal:
03-02-2022 23:27:01.088 -0500 INFO BucketMover [7660 IndexerService] - idx=escadahist Moving bucket='db_1613378370_1587632768_210', starting warm_to_cold: from='/splunk/var/lib/splunk/escadahist/db' to='/splunkcold/escadahist/colddb, caller='trimVolume', reason='volume size for warm storage exceeded'
And I see the size of the index and the age of the data being reduced; it is removing data again.
I have opened a support ticket, but they have not been helpful. Does "caller='trimVolume'" hold a piece of the puzzle???
Hold the phone. I just found where volume:hot was being defined. We used Splunk professional services to set up our implementation in 2016, and the consultant created a separate app for volume/index management that I had never opened. It has an indexes.conf with only two stanzas, one each for the hot and cold volumes, with only two parameters each. In it, volume:hot has maxVolumeDataSizeMB set to 650GB. I have adjusted this to match my free NVMe space on each indexer (7.5TB) and am going to reload. Hoping I have finally found it.
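For anyone hitting the same thing, the volume stanza in question would look roughly like this (the path is illustrative; the values reflect the before/after sizes described above, converted to MB as the setting requires):

```ini
# Volume-management app's indexes.conf (path is an assumption for illustration)
[volume:hot]
path = /splunk/var/lib/splunk
# Was 650000 (~650GB), which triggered trimVolume warm-to-cold moves;
# raised to match free NVMe space per indexer (~7.5TB).
maxVolumeDataSizeMB = 7500000
```

When a volume hits maxVolumeDataSizeMB, Splunk trims it by rolling the oldest warm buckets on that volume to cold, which is exactly the warm_to_cold / caller='trimVolume' behavior in the log line above.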
When you are hunting this kind of configuration issue, your best friend is btool! Just use "splunk btool indexes list --debug <your index OR volume name>". This shows all the settings that belong to that entity and where each one is defined!
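For example, using the index and volume names from this thread (run from the Splunk bin directory; output paths will vary by install):

```shell
# Show every effective setting for the index, annotated with the
# .conf file (and app) that defines each one
splunk btool indexes list escadahist --debug

# Do the same for the volume stanza to find a hidden maxVolumeDataSizeMB
splunk btool indexes list volume:hot --debug
```

The --debug flag is what prints the source file next to each line, which would have revealed the volume-management app immediately.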