How to repair the database?
After an unexpected server shutdown, splunkd keeps crashing.
When I backed up the /var/lib/splunk/ directory and did a fresh install, it worked. After copying /var/lib/splunk back from the backup, it keeps crashing with: ERROR WordPositionData - couldn't parse hash code:
It seems the data files are broken. How do I repair them?
The presence of "ERROR WordPositionData - couldn't parse hash code:" messages in splunkd.log often indicates an inconsistency in one of the metadata files (Hosts.data, Sources.data, SourceTypes.data) located in the hot/warm index repository (for the main index: $SPLUNK_DB/defaultdb/db/) or in one of the buckets (usually a hot one) contained in that index.
To fix this, first identify which metadata files are inconsistent.
To do that, run the following command for the affected index (check splunkd.log; it is the index that was being opened just before splunkd crashed) and for all of its hot/warm buckets:
$SPLUNK_HOME/bin/recover-metadata {path_to_index|path_to_bucket} --validate
For a given index, I like to run the two commands below to check the metadata files at the root of the hot/warm db first, and then each bucket using the list from .bucketManifest:
$SPLUNK_HOME/bin/recover-metadata $SPLUNK_DB/{index_name}/db/ --validate
for i in $(cut -f3 -d " " $SPLUNK_DB/{index_name}/db/.bucketManifest); do $SPLUNK_HOME/bin/recover-metadata $SPLUNK_DB/{index_name}/db/$i --validate; done
Each time an error is reported, delete the corresponding .data file. Once all corrupted metadata files have been removed, run the check again. It will report errors for those files because they can no longer be found, but Splunk should now be ready to start.
Repeat the operation for each index for which splunkd.log reports this type of error.
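If you would rather not delete the flagged files outright, here is a minimal sketch that moves them into a backup directory instead, so they can be restored if something goes wrong. Note that quarantine_metadata is a hypothetical helper I made up for illustration, not a Splunk command, and you still have to pick out the failing .data files yourself from the recover-metadata output.

```shell
#!/bin/sh
# Hypothetical helper (not part of Splunk): move the listed metadata files
# into a backup directory instead of deleting them.
# Usage: quarantine_metadata <backup_dir> <file> [<file> ...]
quarantine_metadata() {
    backup_dir=$1
    shift
    mkdir -p "$backup_dir" || return 1
    for f in "$@"; do
        if [ -f "$f" ]; then
            # Flatten the original path into the file name so that files
            # with the same name from different buckets do not collide.
            dest="$backup_dir/$(printf '%s' "$f" | tr '/' '_')"
            mv "$f" "$dest" || return 1
            echo "moved $f -> $dest"
        else
            echo "skipped $f (not found)" >&2
        fi
    done
}
```

With splunkd stopped, you would call it with a backup directory of your choice followed by the full paths of the .data files that failed validation.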
I have the same error message. The Splunk server bombed, and when I rebooted I got crash logs in the var-log-splunk folder.
[build 79191]
C++ exception: object@[0x02C1EB34], type@[0x00D0F58C]
Exception is Non-continuable
Exception address: [0x75B1FBAE]
Crashing thread: indexerPipe
ContextFlags: [0x0001003F]
Dr0: [0x00000000] Dr1: [0x00000000] Dr2: [0x00000000] Dr3: [0x00000000] Dr6: [0x00000000] Dr7: [0x00000000]
SegGs: [0x00000000] SegFs: [0x0000003B] SegEs: [0x00000023] SegDs: [0x00000023]
Edi: [0x02C1EC98] Esi: [0x02C1EBBC] Ebx: [0xFFFFFFFF] Edx: [0x00000000] Ecx: [0x00000003] Eax: [0x02C1EA80]
Ebp: [0x02C1EAD0] Eip: [0x75B1FBAE] RaiseException + 88/97
SegCs: [0x0000001B] EFlags: [0x00000212] Esp: [0x02C1EA80] SegSs: [0x00000023]
OS: Windows Arch: i386
Backtrace:
Frame 0 @[0x02C1EAD0]: [0x72DE8E89] CxxThrowException + 70/77
Frame 1 @[0x02C1EB08]: [0x006C64AE] ?
Frame 2 @[0x02C1EDCC]: [0x02C1F378] ?
Frame 3 @[0x00B70610]: (Frame below stack)
Crash dump written to: C:\Program Files\Splunk\var\log\splunk\C__Program Files_Splunk_bin_splunkd_exe_crash-2010-05-12-15-25-35.dmp
SPLUNK /6.0 Service Pack 2
C++ Exception type: WordPositionData::Exception -> std::exception
what(): couldn't parse hash code:
Threads running: 9
terminating...
This sounds like we borked support for a really old bucket format. I'm not in the active loop on such problems these days, but it would be useful to know if your index contains data from extremely old Splunk versions (e.g. 2.x).
Your indexes are automatically repaired every time Splunk is started, and that normally takes care of most crash issues. If you have been able to roll back to 4.0.x and splunkd starts up, then it doesn't seem like your issue is with your indexes.
In an index corruption scenario (one that isn't automatically recovered, of course), I would suggest that you open a support case with Splunk support, run the splunk diag utility, attach the generated diag file to your support case, and immediately follow up with Splunk's support via phone.
If splunkd literally crashed, then another approach to digging into what happened is to look in the var\log\splunk folder for *crash* log files. They may be able to shed some light on your issue. Of course, the splunkd.log and other logs in that folder can be very valuable as well.
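As a rough sketch of that log check from a POSIX shell (on Windows you would need something like Cygwin, or do the same by hand in Explorer), the two helpers below list the crash logs newest first and count how often the hash code error shows up in a log file. Both function names are made up for this example, and the log directory is passed in rather than assumed.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not Splunk commands.

# List files whose name contains "crash" in the given log directory,
# newest first. Prints nothing if there are none.
list_crash_logs() {
    ls -t "$1"/*crash* 2>/dev/null
}

# Count the lines in a log file (e.g. splunkd.log) that contain the
# WordPositionData hash code error discussed in this thread.
count_hash_errors() {
    grep -c "couldn't parse hash code" "$1"
}
```

To see which index was being opened right before the crash, plain grep with some leading context, such as grep -n -B 20 "couldn't parse hash code" on splunkd.log, shows the lines written just before each occurrence.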
I was given the same error messages after upgrading from 4.0.9 to 4.1. Splunkd crashes seconds after being started.
The splunkd.log mentioned that there were problems trying to move an "old style hot db", including an "Access Denied" for the db files. This seemed strange to me, as splunkd runs as localsystem (Windows).
My solution was to downgrade to 4.0.10. Now Splunk starts, but I get another error message in the status bar within Splunk:
"Misconfigured view 'splunkd_status' - Unknown parameter 'drilldown' is defined for module SimpleResultsTable. Make sure the parameter is specified in SimpleResultsTable.conf."
The drilldown feature wasn't added until 4.1. You may have this issue with any views (dashboards) that you changed in 4.1 before you went back to 4.0.x. You should be able to comment out (or remove) the drilldown option tags.
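To track down which views still mention drilldown, a quick sketch like this can help. The function name is hypothetical, and the directory to search is up to you; I am assuming customized views live as .xml files somewhere under $SPLUNK_HOME/etc/apps, so check where yours actually are.

```shell
#!/bin/sh
# Hypothetical helper: print the .xml files under a directory that
# contain the word "drilldown", so they can be edited by hand.
find_drilldown_views() {
    find "$1" -type f -name '*.xml' -exec grep -l 'drilldown' {} +
}
```

Running find_drilldown_views on your apps directory prints one file path per line; each listed view is a candidate for commenting out the drilldown option.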