Problem summary:
After updating Splunk from 4.2 to 4.2.1, one (and only one) of the indexes started a series of warm-to-cold and cold-to-frozen moves that wiped ALL the data in the index (as of the date this happened).
Since no coldToFrozenDir was configured, all the data in the index was lost. After that, new data was indexed correctly.
Context
Splunk installation with more than 1,000,000 events and about 130 GB of indexed data
Only ONE index was affected, and it had this configuration
HOT/WARM Database:
- maxDataSizeMB = 51200
- maxWarmDBCount = 150
- location: local disk with >50 GB free space
- Hot/warm path size went from 38 GB (100+ buckets) to 12 GB (34 buckets)
COLD Database:
- maxDataSizeMB = 46080
- location: network disk with >50 GB free space
- Cold path size went from 31 GB to 1 GB, with only 3 buckets left!!! 😞
The only change made to indexes.conf before the restart was lowering homePath.maxDataSizeMB from 61440 to 51200.
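As a rough sanity check, the sketch below compares the sizes reported above with the configured limits. It assumes 1 GB = 1024 MB and that the reported "before" sizes were measured just before the restart; it only does the arithmetic and does not model how Splunk actually decides when to roll buckets.

```python
# Sanity-check sketch: reported sizes vs. the indexes.conf limits in this post.
# Assumption: 1 GB = 1024 MB. Plain arithmetic, not Splunk's bucket-rolling logic.

MB_PER_GB = 1024

# Limits from indexes.conf (MB)
HOME_MAX_MB = 51200      # homePath.maxDataSizeMB after the change (was 61440)
COLD_MAX_MB = 46080      # coldPath.maxDataSizeMB
TOTAL_MAX_MB = 500000    # maxTotalDataSizeMB (from the btool output)

# Sizes reported before the data disappeared (GB)
HOT_WARM_GB = 38
COLD_GB = 31


def headroom_gb(used_gb: float, limit_mb: int) -> float:
    """GB left before used_gb reaches limit_mb (negative means over the limit)."""
    return limit_mb / MB_PER_GB - used_gb


checks = {
    "hot/warm vs homePath.maxDataSizeMB": headroom_gb(HOT_WARM_GB, HOME_MAX_MB),
    "cold vs coldPath.maxDataSizeMB": headroom_gb(COLD_GB, COLD_MAX_MB),
    "hot/warm + cold vs maxTotalDataSizeMB": headroom_gb(HOT_WARM_GB + COLD_GB, TOTAL_MAX_MB),
}

for name, gb in checks.items():
    print(f"{name}: {gb:.1f} GB of headroom")
```

With the reported numbers every check leaves positive headroom (12 GB, 14 GB and about 419 GB), which suggests the size limits alone should not have forced this much rolling.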
Diagnosis
Right after the restart into 4.2.1 (verified against the installation logs and folder dates), the _internal index shows a HUGE number of warm-to-cold and cold-to-frozen entries for this index:
- INFO databasePartitionPolicy
(many entries)
- INFO BucketMover - warm to cold move initiated: <warm path> to <cold path>
(many entries)
- INFO BucketMover - AsyncFreezer freeze succeeded for <cold bucket path>
- INFO databasePartitionPolicy - Adding <path> because of fullRebuild
- INFO databasePartitionPolicy - rebuildMetadata called: full=true path=<indexpath> reason=frozen buckets
- INFO databasePartitionPolicy - clearing existing internal aggregate metadata (<index path>)
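To put a number on "HUGE", one option is to export the matching _internal events to a text file and count them by message type. This is a minimal sketch assuming one event per line; the function name `count_bucket_events` is hypothetical, and the match strings are simply the messages quoted above (with the path placeholders kept as-is).

```python
from collections import Counter

# Substrings copied from the log messages quoted in this post; other lines are ignored.
PATTERNS = {
    "warm_to_cold": "warm to cold move initiated",
    "frozen": "AsyncFreezer freeze succeeded",
    "full_rebuild": "rebuildMetadata called: full=true",
}


def count_bucket_events(lines):
    """Count how many log lines contain each known message (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        for kind, needle in PATTERNS.items():
            if needle in line:
                counts[kind] += 1
    return counts


# Sample input built from the placeholders above, not real log data.
sample = [
    "INFO BucketMover - warm to cold move initiated: <warm path> to <cold path>",
    "INFO BucketMover - warm to cold move initiated: <warm path> to <cold path>",
    "INFO BucketMover - AsyncFreezer freeze succeeded for <cold bucket path>",
    "INFO databasePartitionPolicy - rebuildMetadata called: full=true path=<index path> reason=frozen buckets",
]
print(count_bucket_events(sample))
```

For a real export, pass an open file object instead of `sample`, since iterating a file yields one line at a time.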
NOTES
Here is the output of
splunk cmd btool indexes list <index stanza>
(the coldToFrozenDir line was added after I found the problem :-P)
[portal]
assureUTF8 = false
blockSignSize = 0
blockSignatureDatabase = _blocksignature
coldPath = D:\splunk\var\lib\splunk\portal\colddb
coldPath.maxDataSizeMB = 46080
coldToFrozenDir = D:\splunk\var\lib\splunk\portal\frozendb
coldToFrozenScript =
compressRawdata = true
defaultDatabase = main
enableRealtimeSearch = true
frozenTimePeriodInSecs = 188697600
homePath = $SPLUNK_DB\portal\db
homePath.maxDataSizeMB = 51200
indexThreads = auto
maxConcurrentOptimizes = 3
maxDataSize = auto
maxHotBuckets = 3
maxHotIdleSecs = 0
maxHotSpanSecs = 7776000
maxMemMB = 5
maxMetaEntries = 1000000
maxTotalDataSizeMB = 500000
maxWarmDBCount = 150
memPoolMB = auto
minRawFileSyncSecs = disable
partialServiceMetaPeriod = 0
quarantineFutureSecs = 2592000
quarantinePastSecs = 77760000
rawChunkSizeBytes = 131072
rotatePeriodInSecs = 60
serviceMetaPeriod = 25
suppressBannerList =
sync = 0
syncMeta = true
thawedPath = D:\splunk\var\lib\splunk\portal\thaweddb
throttleCheckPeriod = 15
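The time-based settings in this listing are easier to sanity-check in days. The sketch below parses "key = value" lines from btool output (skipping stanza headers) and converts a few of them; the `parse_btool` helper and the three-setting excerpt are illustrative assumptions, not part of Splunk.

```python
SECONDS_PER_DAY = 86400


def parse_btool(text):
    """Parse 'key = value' lines from btool output into a dict; skip [stanza] headers."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            settings[key.strip()] = value.strip()
    return settings


# Excerpt of the listing above; empty values (e.g. coldToFrozenScript) are kept as "".
excerpt = """[portal]
frozenTimePeriodInSecs = 188697600
maxHotSpanSecs = 7776000
quarantinePastSecs = 77760000
coldToFrozenScript =
"""

settings = parse_btool(excerpt)
for key in ("frozenTimePeriodInSecs", "maxHotSpanSecs", "quarantinePastSecs"):
    days = int(settings[key]) / SECONDS_PER_DAY
    print(f"{key}: {days:g} days")
```

This works out to 2184 days (about six years) for frozenTimePeriodInSecs, 90 days for maxHotSpanSecs and 900 days for quarantinePastSecs.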
This has been identified as a known issue in 4.2.1 and 4.2.2 (SPL-40220):
http://www.splunk.com/base/Documentation/4.2.1/ReleaseNotes/Knownissues
done!
The logs show something about a "fullRebuild", but I can't find anything about it in the documentation 😞
Could you update your post with the output of "splunk cmd btool indexes list my_index_stanza"?