Re: Corrupted bucket journal?

clindseyssi · 03-25-2013

Hi everyone! I hope this isn't a "frequently solved problem." I've searched the forum and Googled for answers, but I've hit a wall.

First, I started getting this error in Splunk Web:

[EventsViewer module] Error in 'databasePartitionPolicy': Failed to read 1 event(s) from rawdata in bucket 'main~35~073974E4-ED0F-432A-8DF5-3AB3DE83D4ED'. Rawdata may be corrupt, see search.log

Hmmmm. So I Googled and found a link on the Answers forum that explained how to run fsck against the bucket, so I did. Here is the result:

$ sudo /Applications/splunk/bin/splunk stop
$ sudo /Applications/splunk/bin/splunk fsck --all
bucket=/Applications/splunk/var/lib/splunk/audit/db/db_1360792166_1360340101_24 NEEDS REPAIR: count mismatch tsidx=0 slices.dat=6088
bucket=/Applications/splunk/var/lib/splunk/defaultdb/db/db_1360792158_1359732196_28 NEEDS REPAIR: count mismatch tsidx=36837 slices.dat=38544

SUMMARY: We have detected 2 buckets (877515 bytes of compressed rawdata) need rebuilding.
    Depending on the speed of your server, this may take from 0 to 1 minutes.  You can use the --repair option to fix
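If fsck flags more than a couple of buckets, it can help to pull the bucket paths out of its output before deciding which ones to repair or rebuild. Below is a minimal sketch. It assumes the "bucket=<path> NEEDS REPAIR: ..." line format shown above, and other Splunk versions may word this differently. buckets_needing_repair is a hypothetical helper name, not a Splunk command.

```shell
# Hypothetical helper (not part of Splunk): read "splunk fsck" output on
# stdin and print the path of each bucket reported as NEEDS REPAIR.
# Assumes the "bucket=<path> NEEDS REPAIR: ..." format shown above and
# bucket paths without spaces.
buckets_needing_repair() {
  sed -n 's/^[[:space:]]*bucket=\([^ ]*\) NEEDS REPAIR:.*/\1/p'
}
```

For example: sudo /Applications/splunk/bin/splunk fsck --all 2>&1 | buckets_needing_repair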

So I added the --repair switch, and this is the result:

$ sudo /Applications/splunk/bin/splunk fsck --all --repair
bucket=/Applications/splunk/var/lib/splunk/_internaldb/db/db_1364229909_1363960207_40 count mismatch tsidx=524223 source-metadata=524228, repairing...
    bucket=/Applications/splunk/var/lib/splunk/_internaldb/db/db_1364229909_1363960207_40 rebuild failed: caught exception while rebuilding: Error reading compressed journal while streaming: bad gzip header, provider=/Applications/splunk/var/lib/splunk/_internaldb/db/db_1364229909_1363960207_40/rawdata/journal.gz
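The "bad gzip header" message suggests that the journal file itself is no longer valid gzip. One way to confirm that outside Splunk is gzip -t, which tests a compressed file's integrity without writing anything. Below is a minimal sketch. It assumes gzip-compressed journals named rawdata/journal.gz, as in the error above, although newer Splunk versions can be configured to compress journals differently. find_bad_journals is a hypothetical helper. Run it with Splunk stopped so that a hot bucket's journal isn't checked while it is being written.

```shell
# Hypothetical helper (not part of Splunk): run "gzip -t" on every
# rawdata/journal.gz under the given directory (for example an index's
# db/ directory) and print the journals that fail the integrity test.
# Assumes gzip-compressed journals and paths without newlines.
find_bad_journals() {
  local root="$1"
  find "$root" -type f -path '*/rawdata/journal.gz' |
    while IFS= read -r journal; do
      # gzip -t exits non-zero on a bad header or a truncated stream.
      if ! gzip -t "$journal" 2>/dev/null; then
        printf '%s\n' "$journal"
      fi
    done
}
```

For example: find_bad_journals /Applications/splunk/var/lib/splunk/_internaldb/db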

I searched the forum and Google for next steps but didn't find anything useful. Has anyone else seen something like this? Were you able to resolve it?

Any help, as always, is appreciated.

Sevjer13 · 03-27-2019

Disabling and re-enabling the index does not seem to work on clusters. The only way to do it seems to be to "move" the index. I do not like this, as it means we are losing unknown data. Possible exploit here?

anwarmian · 04-18-2020

In a cluster environment, you should already have a searchable copy of the bucket on another indexer, provided your search factor (SF) is at least 2.

1. Enable indexer cluster maintenance mode.
2. Stop the indexer in question.
3. Either:
   a. move the broken journal file away to another location (while splunkd is stopped), or
   b. delete the bucket.
4. Start the indexer in question.
5. Disable indexer cluster maintenance mode.
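Here is a sketch of the peer-side steps (2 to 4, option a) for anyone who wants to script them. It makes a few assumptions: it runs on the indexer that holds the broken bucket, SPLUNK_BIN and QUARANTINE_DIR are placeholder settings you would adjust, and you have already taken the bucket directory from search.log. Steps 1 and 5 run on the cluster manager with splunk enable maintenance-mode and splunk disable maintenance-mode, so they appear only as comments here. move_journal_on_peer is a hypothetical helper name.

```shell
# Hypothetical helper (not part of Splunk) for steps 2-4, option a, run on
# the affected indexer. Before calling it, on the cluster manager:
#   splunk enable maintenance-mode
# and afterwards, again on the manager:
#   splunk disable maintenance-mode
# SPLUNK_BIN and QUARANTINE_DIR are assumptions; adjust to your install and
# pick a quarantine location that is not cleaned up automatically.
move_journal_on_peer() {
  local bucket_dir="$1"   # the broken bucket's directory
  local splunk_bin="${SPLUNK_BIN:-/opt/splunk/bin/splunk}"
  local quarantine="${QUARANTINE_DIR:-/var/tmp/splunk-quarantine}"
  # Keep the bucket name in the destination so the file can be traced later.
  local dest="$quarantine/$(basename "$bucket_dir")"

  # Step 2: stop the indexer.
  "$splunk_bin" stop || return 1
  # Step 3a: move the broken journal away while splunkd is stopped.
  # On failure, return without restarting so the node can be inspected.
  mkdir -p "$dest" || return 1
  mv "$bucket_dir/rawdata/journal.gz" "$dest/" || return 1
  # Step 4: start the indexer again.
  "$splunk_bin" start
}
```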

Sevjer13 · 03-27-2019

Note that this does not work on clusters. The only fix I found was to stop Splunk and move the file away. I do not like that, as it means you're losing data.

lmyrefelt · 02-11-2014

Hi, I would just like to confirm that MikaelSandquist's solution works 🙂

This is what you want to do:
1. Download the search.log (via the Job Inspector) from the node that fails, i.e. the one that has the corrupted journal / rawdata.
2. Locate the bucket that is corrupt.
3. Stop Splunk on that node.
4. Run splunk cmd splunkd fsck --all --repair
5. Run splunk cmd splunkd rebuild /path/to/your/failed/db/bucket (found in search.log)
6. splunk disable index "nameOfIndex"
7. splunk enable index "nameOfIndex"
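For step 2, the bucket ID in the error (for example 'main~35~073974E4-ED0F-432A-8DF5-3AB3DE83D4ED' at the top of this thread) has the form <index>~<local id>~<guid>. Below is a minimal sketch that pulls those IDs out of a downloaded search.log and looks for the matching directory. It assumes the usual warm/cold naming db_<newest>_<oldest>_<local id>, with a _<guid> suffix on clustered indexers. Hot buckets use a different name and are not matched. Both helper names are hypothetical. You have to pass in the index's db directory because index names and directory names can differ (main lives in defaultdb, for example).

```shell
# Hypothetical helpers (not part of Splunk). Assumptions: the error line
# contains "in bucket '<index>~<localid>~<guid>'", and warm/cold bucket
# directories are named db_<newest>_<oldest>_<localid>[_<guid>].

# Print each distinct bucket ID mentioned in a search.log file.
bucket_ids_from_log() {
  sed -n "s/.*in bucket '\([^']*\)'.*/\1/p" "$1" | sort -u
}

# Print bucket directories in an index's db directory that match a bucket ID.
find_bucket_dir() {
  local db_path="$1"
  local bucket_id="$2"
  local local_id guid
  local_id="$(printf '%s\n' "$bucket_id" | cut -d'~' -f2)"
  guid="$(printf '%s\n' "$bucket_id" | cut -d'~' -f3)"
  find "$db_path" -maxdepth 1 -type d \
    \( -name "db_*_*_${local_id}" -o -name "db_*_*_${local_id}_${guid}" \)
}
```

For example: find_bucket_dir /Applications/splunk/var/lib/splunk/defaultdb/db 'main~35~073974E4-ED0F-432A-8DF5-3AB3DE83D4ED'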

In my case, both the rebuild and the repair failed to correct the issue; however, disabling and enabling the index seems to have solved it.

It seems Splunk is re-creating the journal file? Or does it just roll it?

Hope this will help 🙂

mikaelsandquist · 05-15-2013

I solved it by disabling the index that had a damaged journal file from the CLI:

/opt/splunk/bin/splunk disable index name_of_your_index

I started Splunk up, enabled the index from the web GUI, and restarted Splunk to check that it starts without errors. It looks like Splunk removed the broken journal file during that process.

Another suggestion I got from Splunk Support was to simply move the broken journal file away to another location (while splunkd is stopped) and then start Splunk.
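Here is the disable/enable sequence from this post written out as commands for a standalone instance. Other replies in this thread say it does not seem to work on clustered indexers. This is a sketch under a few assumptions: SPLUNK_BIN is a placeholder, the CLI "enable index" command stands in for the web GUI step, and the disable runs while splunkd is stopped because that is the order the post describes. Depending on the version, the CLI may prompt for credentials. reset_index is a hypothetical helper name.

```shell
# Hypothetical helper (not part of Splunk) that replays the steps in the
# post above for one index on a standalone instance.
# SPLUNK_BIN is an assumption; "enable index" replaces the web GUI step.
reset_index() {
  local index="$1"
  local splunk_bin="${SPLUNK_BIN:-/opt/splunk/bin/splunk}"
  # Stop at the first failing step instead of carrying on.
  "$splunk_bin" stop &&
    "$splunk_bin" disable index "$index" &&   # disabled while splunkd is down
    "$splunk_bin" start &&
    "$splunk_bin" enable index "$index" &&
    "$splunk_bin" restart                     # confirm it starts without errors
}
```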

lukejadamec · 01-29-2014

The --repair routine runs behind the scenes automatically. It does not "repair everything at startup"; it does so gradually over time. You can run the repair routine manually, but that never seems to work for me. I prefer to rebuild "bad" buckets. Also, if the journal is truly corrupt, it cannot be repaired, because Splunk cannot manipulate the journal data. See the troubleshooting section at the bottom of this doc: http://docs.splunk.com/Documentation/Splunk/6.0.1/Indexer/HowSplunkstoresindexes

campbellj1977 · 01-29-2014

Is fsck supposed to run automatically when Splunk is restarted? I'm guessing the restart alone did not work for you?

I am having the same problem, but restarting the service did not run fsck --repair.

mikaelsandquist · 05-15-2013

I have encountered the same problem today.

clindseyssi · 04-23-2013

Hi, I thought I'd give this a bump and see if anyone has any thoughts on this.

Thanks!

Craig
