We're currently running Splunk Enterprise 6.1.2.
A few times in the past few months, we've run into a problem where data we had in an index has disappeared. It's frustrating, and we need to prevent it from happening again. The first two occurrences were on the server indexing logs from our test environments. The third was on the server indexing logs from our production environments.
The first time we experienced this issue, we had over a year's worth of data in the index; the next day we couldn't search back any earlier than the previous weekend. The index was nowhere near its maximum size, and its size after the data went missing was smaller than it had been before.
The second time happened a couple of months later: we went from that couple of months' worth of data down to about two weeks of data.
Today's occurrence with missing production data has made this an immediate concern. We had an engineer resuming his research into events from Oct 27th, and the data he was looking at yesterday is missing today. The index was near capacity when I checked after he reported the missing data to me. However, the index still has events going back as far as Oct 6th.
For this particular index, we have a suite of applications on many different hosts forwarding their data. For a number of these hosts and applications we can still pull up indexed data going back to Oct 6th, but for one application, the one whose customers were reporting issues, we cannot pull up any data prior to late in the day on Oct 27th.
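For reference, this is roughly how I've been checking how far back each source still goes in that index (the index name bc is taken from the splunkd.log excerpt in the update below; the metadata command just reports the first/last event time per source, which is enough to see the cutoff):

| metadata type=sources index=bc
| sort firstTime
| convert ctime(firstTime) AS first_event ctime(lastTime) AS last_event
| table source first_event last_event totalCount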
Why/How is this happening?
I've tried searching for an answer to this issue and cannot find one. I've perused the _audit index for anything that looks like it would explain the missing data, but haven't found anything. What else can I look for to explain why this happened, and hopefully to prevent it from happening again, or at least control when it happens?
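In case it helps, the only other places I know to look are the _internal index and dbinspect. Something along these lines is what I plan to try next; the freeze/remove keywords are just my guesses at what a bucket being frozen or deleted would log, not messages I've confirmed:

index=_internal sourcetype=splunkd component=BucketMover idx=bc (freeze OR frozen OR remov*)

And to get a picture of which buckets are still on disk for the index and the time range each one covers:

| dbinspect index=bc
| sort startEpoch
| convert ctime(startEpoch) ctime(endEpoch)
| table bucketId state startEpoch endEpoch eventCount sizeOnDiskMB path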
Thanks in advance for any of your suggestions/answers to my plea for help!
Update:
The following was pulled from splunkd.log. I don't know whether it does or doesn't account for the issue I reported:
11-06-2014 10:24:01.992 -0600 WARN BucketMover - Unexpected failure to parse bucket='/opt/splunk/var/lib/splunk/bc/db/hot_v1_814'
11-06-2014 10:24:01.992 -0600 WARN BucketMover - Unexpected failure to parse bucket='/opt/splunk/var/lib/splunk/bc/db/hot_v1_815'
11-06-2014 10:24:01.992 -0600 WARN BucketMover - Unexpected failure to parse bucket='/opt/splunk/var/lib/splunk/bc/db/hot_v1_816'
11-06-2014 10:24:01.992 -0600 INFO DbMaxSizeManager - Moving up to 8 hot+warm buckets, start from oldest by LT, until achieve compliance (size: current=2097258496 (2000MB,1GB) max=2097152000 (2000MB,1GB))
11-06-2014 10:24:01.997 -0600 INFO DbMaxSizeManager - Will chill bucket=/opt/splunk/var/lib/splunk/bc/db/db_1415124594_1415087509_806 LT=1415124594 size=399175680 (380MB)
11-06-2014 10:24:02.015 -0600 INFO BucketMover - idx=bc Moving bucket='db_1415124594_1415087509_806' because maximum number of warm databases exceeded, starting warm_to_cold: from='/opt/splunk/var/lib/splunk/bc/db' to='/storage/splunk/bc/colddb'
11-06-2014 10:24:02.015 -0600 INFO BucketMover - idx=bc bucket=db_1415124594_1415087509_806 Firing async chiller: from='/opt/splunk/var/lib/splunk/bc/db' to='/storage/splunk/bc/colddb'
11-06-2014 10:24:02.015 -0600 INFO DbMaxSizeManager - Bucket moved successfully (size: cur=1698082816 (1619MB,1GB), max=2097152000 (2000MB,1GB))
11-06-2014 10:24:13.689 -0600 INFO DatabaseDirectoryManager - Writing a bucket manifest in hotWarmPath='/opt/splunk/var/lib/splunk/bc/db'. Reason='Updating bucket, bid=bc~806~5C35B09D-9D10-405D-B658-C20C93219352'
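Based on those lines, my next step is to double-check the index's size and retention settings with /opt/splunk/bin/splunk btool indexes list bc --debug. My understanding is that the settings below are the ones that decide when hot/warm buckets roll to cold and when the oldest cold buckets get frozen, which means deleted unless coldToFrozenDir or coldToFrozenScript is set. The stanza here is only a sketch of what I expect ours to look like: the coldPath and the 2000MB figure come from the log lines above, and the other values are Splunk defaults rather than anything I've confirmed on our server.

[bc]
homePath   = $SPLUNK_DB/bc/db
coldPath   = /storage/splunk/bc/colddb
thawedPath = $SPLUNK_DB/bc/thaweddb
# Size/retention controls (values below are defaults/guesses, not confirmed on our server):
homePath.maxDataSizeMB = 2000        # matches the 2000MB cap DbMaxSizeManager reports above
maxWarmDBCount         = 300         # "maximum number of warm databases exceeded" triggers warm_to_cold
maxTotalDataSizeMB     = 500000      # total hot+warm+cold cap; oldest buckets are frozen (deleted) past this
frozenTimePeriodInSecs = 188697600   # age-based freezing, roughly 6 years by default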