Over the last few days I have been getting a lot of ERROR messages.
12-30-2015 09:26:37.537 +0100 ERROR ProcessTracker - (child_4707__Fsck) Fsck - idx=_internal bkt='[***]indizes/_internaldb/colddb/db_1450751023_1450751018_1407' Failed to write: (but will ignore per SPL-52537 hack) bloomfilter || size manifest || .finalized
12-30-2015 09:26:37.537 +0100 ERROR ProcessTracker - (child_4707__Fsck) BucketBuilder - process=recover-metadata failed with exit_code=214 (exited with code 214)
12-30-2015 09:26:29.663 +0100 ERROR ProcessTracker - (child_4706__Fsck) Fsck - idx=_internal bkt='[***]indizes/_internaldb/colddb/db_1450751025_1450750016_1183' Failed to write: (but will ignore per SPL-52537 hack) bloomfilter || size manifest || .finalized
12-30-2015 09:26:29.663 +0100 ERROR ProcessTracker - (child_4706__Fsck) BucketBuilder - process=recover-metadata failed with exit_code=214 (exited with code 214)
12-30-2015 09:26:24.514 +0100 ERROR ProcessTracker - (child_4705__Fsck) Fsck - idx=_internal bkt='[***]indizes/_internaldb/colddb/db_1450751068_1450750585_1186' Failed to write: (but will ignore per SPL-52537 hack) bloomfilter || size manifest || .finalized
12-30-2015 09:26:24.514 +0100 ERROR ProcessTracker - (child_4705__Fsck) BucketBuilder - process=recover-metadata failed with exit_code=214 (exited with code 214)
12-30-2015 09:26:21.364 +0100 ERROR ProcessTracker - (child_4704__Fsck) Fsck - idx=_internal bkt='[***]indizes/_internaldb/colddb/db_1450751355_1450751015_1187' Failed to write: (but will ignore per SPL-52537 hack) bloomfilter || size manifest || .finalized
12-30-2015 09:26:21.364 +0100 ERROR ProcessTracker - (child_4704__Fsck) BucketBuilder - process=recover-metadata failed with exit_code=214 (exited with code 214)
I checked permissions on those directories - they are ok.
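(For reference, a rough way to verify this, assuming splunkd runs as the user splunk, with $BUCKET standing in for one of the affected cold bucket directories:)
ls -ld "$BUCKET"                                  # ownership and mode of the bucket directory
sudo -u splunk touch "$BUCKET/.write_test" \
  && sudo -u splunk rm "$BUCKET/.write_test"      # confirm the splunkd user can actually create files there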
Does anybody know where exactly the problem is?
I found something new and strange. When I run df -h . on my filesystem:
Filesystem Size Used Avail Use% Mounted on
indizes 500G 164G 337G 33% [...]/indizes
Using the REST API:
| rest splunk_server=[...] /services/server/status/partitions-space | table available, capacity, fs_type, updated
It's really strange. The available cell should show about 345000...
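(For comparison, the same endpoint can also be queried directly over the management port; a rough sketch assuming the default port 8089 on the indexer and an admin login:)
curl -k -u admin "https://localhost:8089/services/server/status/partitions-space?output_mode=json"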
Did you just use xfs_growfs but not actually increase the disk capacity?
Sorry for asking, but is there any difference? I'm not responsible for the storage; I'm just the Splunk admin. What has to be done to make it work again, so that I can ask my colleague to correct it?
Thanks!
Typically with LVM there are two commands you run to increase the size of a volume: one grows the logical volume itself and one grows the filesystem on top of it.
Sometimes the storage crew forgets to grow the volume after they resize it, and you end up with a situation where df shows the disk has more capacity but the usable space is the same as it was before the expansion.
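As a rough sketch (assuming an XFS filesystem on an LVM logical volume; the VG/LV names and mount point below are placeholders):
# 1. grow the logical volume, e.g. by all remaining free space in the volume group
lvextend -l +100%FREE /dev/vg_data/lv_indizes
# 2. grow the filesystem so it can actually use the new space
xfs_growfs /srv/indizes                 # XFS: pass the mount point
# resize2fs /dev/vg_data/lv_indizes     # ext4 equivalent
If only one of the two steps was run, the sizes reported by lvs and df -h will disagree, which is worth checking before anything else.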
There is a Splunk bug tracking number in the error message (i.e. "but will ignore per SPL-52537 hack").
I would guess it's some kind of known issue - I know it can be slow, but have you raised it with Splunk Support? They might be able to shed more light on it.
Also, are you using some wacky file system like BTRFS?
Any ideas?
So the user account running splunkd has write access to the directory?
How about disk space? Do you have enough free disk space?
Yes, the user running splunkd has write access to the directory.
Disk space is also available.
[indizes]$ df -h .
Filesystem Size Used Avail Use% Mounted on
indizes 500G 152G 349G 31% [...]/indizes
But we did have a brief problem where the filesystem was only 150G and 100% used. We have since resized the volume and restarted Splunk.