Hi Everyone! I hope this isn't a "frequently solved problem." I've searched and googled for answers but I ran into a wall.
First, I started getting this error in Splunk web:
[EventsViewer module] Error in 'databasePartitionPolicy': Failed to read 1 event(s) from rawdata in bucket 'main~35~073974E4-ED0F-432A-8DF5-3AB3DE83D4ED'. Rawdata may be corrupt, see search.log
Hmmmm. So I googled and found in the answers forum a link that told me how to run fsck against the bucket. And I did. Here is the result:
$ sudo /Applications/splunk/bin/splunk stop
$ sudo /Applications/splunk/bin/splunk fsck --all
bucket=/Applications/splunk/var/lib/splunk/audit/db/db_1360792166_1360340101_24 NEEDS REPAIR: count mismatch tsidx=0 slices.dat=6088
bucket=/Applications/splunk/var/lib/splunk/defaultdb/db/db_1360792158_1359732196_28 NEEDS REPAIR: count mismatch tsidx=36837 slices.dat=38544
SUMMARY: We have detected 2 buckets (877515 bytes of compressed rawdata) need rebuilding.
Depending on the speed of your server, this may take from 0 to 1 minutes. You can use the --repair option to fix
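For anyone who wants to act on that report bucket by bucket, here is a minimal sketch that pulls the flagged bucket paths out of fsck output. It assumes the lines look exactly like the ones quoted above (`bucket=<path> NEEDS REPAIR: ...`); the function name `list_buckets_needing_repair` is made up for illustration and is not a Splunk command.

```shell
#!/bin/sh
# list_buckets_needing_repair (hypothetical helper, not part of Splunk):
# reads fsck output on stdin and prints the path of every bucket that is
# flagged "NEEDS REPAIR", one path per line. Assumes the line format shown
# in the output quoted above and bucket paths without spaces.
list_buckets_needing_repair() {
    sed -n 's/^bucket=\([^ ]*\) NEEDS REPAIR:.*$/\1/p'
}

# Possible usage (assumption: fsck prints its report to stdout or stderr):
#   sudo /Applications/splunk/bin/splunk fsck --all 2>&1 | list_buckets_needing_repair
```

Lines that do not match, such as the SUMMARY line, are simply dropped.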
So I added the --repair switch. And this is that result:
$ sudo /Applications/splunk/bin/splunk fsck --all --repair
bucket=/Applications/splunk/var/lib/splunk/_internaldb/db/db_1364229909_1363960207_40 count mismatch tsidx=524223 source-metadata=524228, repairing...
bucket=/Applications/splunk/var/lib/splunk/_internaldb/db/db_1364229909_1363960207_40 rebuild failed: caught exception while rebuilding: Error reading compressed journal while streaming: bad gzip header, provider=/Applications/splunk/var/lib/splunk/_internaldb/db/db_1364229909_1363960207_40/rawdata/journal.gz
I searched the forum and google for the next steps but didn't find anything useful. Has anyone else seen something like this? Were you able to resolve it?
Any help, as always, is appreciated.
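Since the rebuild failed with "bad gzip header", one quick sanity check is whether the journal file still starts with the gzip magic bytes (0x1f 0x8b). This is only a sketch: the helper name `has_gzip_magic` is made up, and it looks at the first two bytes only, so a file that passes can still be damaged further in.

```shell
#!/bin/sh
# has_gzip_magic FILE (hypothetical helper): succeeds if FILE begins with the
# gzip magic bytes 1f 8b, fails if it does not or if FILE cannot be read.
has_gzip_magic() {
    magic=$(head -c 2 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Only runs when a path is given on the command line, e.g. the journal.gz
# path reported in the "rebuild failed" message.
if [ "$#" -gt 0 ]; then
    if has_gzip_magic "$1"; then
        echo "gzip header looks intact: $1"
    else
        echo "gzip header missing or unreadable: $1"
    fi
fi
```

This is a read-only check, so it should be safe to run against the journal file named in the error message.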
Disabling and re-enabling the index does not seem to work on clusters. The only way seems to be to "move" the index. I do not like this, as it means we lose an unknown amount of data. Possible exploit here?
In a cluster environment, you should already have a searchable copy of the bucket on another indexer, provided you have at least SF=2.
Note that this does not work on clusters; the only fix I found was to stop Splunk and move the file away. I do not like that, as it means you're losing data.
Hi, I would just like to confirm that MikaelSandquist's solution works 🙂
This is what you would want to do:
1. Download the search.log (via the job inspector) from the node that fails, i.e. the one that has the corrupted journal / rawdata.
2. Locate the bucket that is corrupt.
3. Stop Splunk on that node.
4. Run splunk cmd splunkd fsck --all --repair
5. Run splunk cmd splunkd rebuild /path/to/Your/failed/db/bucket (found in search.log)
6. splunk disable index "nameOfIndex"
7. splunk enable index "nameOfIndex"
In my case both the rebuild and the repair failed to correct the issue; however, disabling and enabling the index seems to have solved it.
It seems Splunk is re-creating the journal file? Or does it just roll it?
Hope this will help 🙂
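The steps above can be sketched as a dry run that only prints the commands in order, so they can be reviewed before anything is executed. The Splunk install path, index name and bucket path are placeholders passed in as arguments, and `recovery_plan` is a made-up function name. The list does not say at which point Splunk should be started again, so the sketch does not either.

```shell
#!/bin/sh
# recovery_plan SPLUNK_HOME INDEX BUCKET (hypothetical helper): prints the
# commands from the steps above without running any of them.
recovery_plan() {
    splunk="$1/bin/splunk"
    index=$2
    bucket=$3
    echo "$splunk stop"
    echo "$splunk cmd splunkd fsck --all --repair"
    echo "$splunk cmd splunkd rebuild $bucket"
    echo "$splunk disable index $index"
    echo "$splunk enable index $index"
}

# Placeholder values; substitute your own install path, index name and the
# bucket path found in search.log.
recovery_plan /opt/splunk name_of_your_index /path/to/your/failed/db/bucket
```

Printing first and running by hand keeps a failed step (for example a rebuild that throws the same gzip error) from being followed blindly by the next one.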
I solved it by disabling the index that had a damaged journal file from the CLI:
/opt/splunk/bin/splunk disable index name_of_your_index
I started Splunk and enabled the index from the web GUI, then restarted Splunk to see whether it starts OK without errors. It looks like Splunk removed the broken journal file during that process.
Another suggestion that I got from Splunk Support was to just move the broken journal file away to another place (while splunkd is turned off) and then start Splunk.
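A minimal sketch of that "move it away" approach, keeping the broken file instead of deleting it so it can still be inspected later. The helper name `quarantine_journal` and the quarantine directory are made up for illustration; it is meant to be run only while splunkd is stopped, and the sketch does not check that for you.

```shell
#!/bin/sh
# quarantine_journal JOURNAL DEST_DIR (hypothetical helper): moves a broken
# journal file out of its bucket into DEST_DIR. Run only while splunkd is
# stopped; this function does not verify that.
quarantine_journal() {
    journal=$1
    dest_dir=$2
    [ -f "$journal" ] || { echo "no such file: $journal" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    # Journals live in <bucket>/rawdata/, so two dirname calls give the bucket
    # directory; prefix with its name so journals from several buckets do not
    # overwrite each other.
    bucket=$(basename "$(dirname "$(dirname "$journal")")")
    mv "$journal" "$dest_dir/$bucket.journal.gz"
}

# Possible usage (paths are placeholders):
#   quarantine_journal /path/to/bucket/rawdata/journal.gz /tmp/splunk-quarantine
```

As noted elsewhere in this thread, the events in that journal are no longer searchable once it is moved, so this trades the error for missing data.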
The --repair routine runs behind the scenes automatically. It does not "repair everything at startup"; it works gradually over time. You can run the repair routine manually, but that never seems to work for me. I prefer to rebuild "bad" buckets. Also, if the journal is truly corrupt, then it cannot be repaired, because Splunk cannot manipulate the journal data. See the troubleshooting section at the bottom of this doc: http://docs.splunk.com/Documentation/Splunk/6.0.1/Indexer/HowSplunkstoresindexes
Is fsck supposed to run automatically when Splunk is restarted? I am guessing that the restart alone did not work for you?
I am having the same problem, but the service restart did not run fsck --repair.
I have encountered the same problem today.
Hi, I thought I'd give this a bump and see if anyone had any thoughts on this.
Thanks!
Craig