abc@abc /opt/hunk/bin $ sudo ./splunk start
Splunk> Finding your faults, just like mom.
Checking prerequisites...
Checking http port [8000]: open
Checking mgmt port [8089]: open
Checking appserver port [127.0.0.1:8065]: open
Checking kvstore port [8191]: open
Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _blocksignature _internal _introspection _thefishbucket history main summary
Done
Checking filesystem compatibility... Done
Checking conf files for problems...
Done
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Done
Waiting for web server at http://127.0.0.1:8000 to be available
splunkd 5754 was not running.
Stopping splunk helpers...
Done.
Stopped helpers.
Removing stale pid file... done.
WARNING: web interface does not seem to be available!
The following crash report was generated in Splunk's log directory.
[build 237464] 2015-01-07 18:31:12
Received fatal signal 6 (Aborted).
Cause:
Signal sent by PID 5371 running under UID 0.
Crashing thread: IndexerTPoolWorker-1
Registers:
RIP: [0x00007FE2AE2ACBB9] gsignal + 57 (/lib/x86_64-linux-gnu/libc.so.6)
RDI: [0x00000000000014FB]
RSI: [0x000000000000150D]
RBP: [0x00007FE2A77FE9B0]
RSP: [0x00007FE2A77FE7C8]
RAX: [0x0000000000000000]
RBX: [0x00007FE2A683F1C0]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x00007FE2AE6379D0]
R9: [0x0000000001612EEA]
R10: [0x0000000000000008]
R11: [0x0000000000000202]
R12: [0x00007FE2A683F1E0]
R13: [0x00007FE2A683C0C0]
R14: [0x00007FE2A683C0F8]
R15: [0x00007FE2A683C0F0]
EFL: [0x0000000000000202]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x0000000000000033]
OLDMASK: [0x0000000000000000]
OS: Linux
Arch: x86-64
Backtrace:
[0x00007FE2AE2ACBB9] gsignal + 57 (/lib/x86_64-linux-gnu/libc.so.6)
[0x00007FE2AE2AFFC8] abort + 328 (/lib/x86_64-linux-gnu/libc.so.6)
[0x00000000015BA6C5] _ZN9__gnu_cxx27__verbose_terminate_handlerEv + 245 (splunkd)
[0x0000000001574BB6] _ZN10__cxxabiv111__terminateEPFvvE + 6 (splunkd)
[0x0000000001574BE3] ? (splunkd)
[0x0000000001575F4E] ? (splunkd)
[0x0000000000A29789] _ZN24DatabaseDirectoryManager20locked_scanDirectoryERKSt3mapI10CMBucketIdNS_6BucketESt4lessIS1_ESaISt4pairIKS1_S2_EEERK8Pathnameb + 1865 (splunkd)
[0x0000000000A2989D] _ZN24DatabaseDirectoryManager22locked_scanDirectoriesEv + 77 (splunkd)
[0x0000000000A2BE70] _ZN24DatabaseDirectoryManager29refreshBucketManifest_startupEv + 48 (splunkd)
[0x0000000000A2C098] _ZN24DatabaseDirectoryManagerC1ERK8PathnameS2_S2_S2_bRK3Str + 392 (splunkd)
[0x00000000009FF77B] _ZN23DatabasePartitionPolicy48openDatabases_ensureInitialized_directoryManagerEv + 107 (splunkd)
[0x0000000000A0DA78] _ZN23DatabasePartitionPolicy13openDatabasesEbb + 56 (splunkd)
[0x0000000000A0DDE5] _ZN23DatabasePartitionPolicy5startEbb + 181 (splunkd)
[0x0000000000D3F899] _ZN6Worker4mainEv + 57 (splunkd)
[0x0000000000F4FA7E] _ZN6Thread8callMainEPv + 62 (splunkd)
[0x00007FE2AE644182] ? (/lib/x86_64-linux-gnu/libpthread.so.0)
[0x00007FE2AE370EFD] clone + 109 (/lib/x86_64-linux-gnu/libc.so.6)
Linux / abacus-ThinkPad-W540 / 3.13.0-24-generic / #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 / x86_64
Last few lines of stderr (may contain info on assertion failure, but also could be old):
2014-12-18 12:22:10.227 +0530 Interrupt signal received
2015-01-07 18:30:50.058 +0530 splunkd started (build 237464)
terminate called after throwing an instance of 'DatabaseDirectoryManagerException'
what(): idx=_audit bucket=db_1418731414_1418730253_17 Detected directory manually copied into its database, causing id conflicts [path1='/opt/hunk/var/lib/splunk/audit/db/hot_v1_17' path2='/opt/hunk/var/lib/splunk/audit/db/db_1418731414_1418730253_17'].
terminate called recursively
terminate called recursively
terminate called recursively
2015-01-07 18:31:12.523 +0530 splunkd started (build 237464)
terminate called after throwing an instance of 'DatabaseDirectoryManagerException'
terminate called recursively
terminate called recursively
/etc/debian_version: jessie/sid
Last errno: 0
Threads running: 17
argv: [splunkd -p 8089 start]
Thread: "IndexerTPoolWorker-1", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7fe2a7830790:
00000000 00 f7 7f a7 e2 7f 00 00 |........|
00000008
TPool Worker: _shouldJoinAndDelete=N, _id=1
Running TJob: name=TJob
x86 CPUID registers:
0: 0000000D 756E6547 6C65746E 49656E69
1: 000306C3 05100800 7FDAFBBF BFEBFBFF
2: 76036301 00F0B5FF 00000000 00C10000
3: 00000000 00000000 00000000 00000000
4: 00000000 00000000 00000000 00000000
5: 00000040 00000040 00000003 00042120
6: 00000077 00000002 00000009 00000000
7: 00000000 00000000 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 07300403 00000000 00000000 00000603
B: 00000000 00000000 000000FF 00000005
C: 00000000 00000000 00000000 00000000
D: 00000000 00000000 00000000 00000000
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000021 2C100800
80000002: 65746E49 2952286C 726F4320 4D542865
80000003: 37692029 3037342D 20514D30 20555043
80000004: 2E322040 48473034 0000007A 00000000
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 00003027 00000000 00000000 00000000
terminating...
How do I repair these errors?
Yes, I copied the whole Hunk directory from another physical location.
I tried rebuilding the indexes with
splunk rebuild <bucket directory>
but had no luck; it produced the following output.
USAGE: splunk rebuild <bucketPath> [<indexName>] [--no-log]
The <indexName> parameter is ignored if provided.
Please see 'splunk fsck' for more options. This command is just a wrapper for 'splunk fsck'.
Redirecting to 'splunkd fsck' with args:
repair --one-bucket --include-hots --bucket-path=../var/lib/splunk/audit/db/hot_v1_19/ --log-to--splunkd-log
No bootstrap configuration available for: /etc
WARN Fsck - Not loading indexes.conf; will proceed with all defaults
WARN ServerConfig - No value found for listenOnIPv6 setting. Using the default value of "no"
WARN ServerConfig - No value found for connectUsingIpVersion setting. Using the default value of "auto"
ERROR ServerConfig - Found no server name in server.conf. Please set it. Will attempt to use default for now.
WARN ServerConfig - No web configuration present, assuming defaults.
WARN ServerConfig - No SSL configuration present, assuming SSL using defaults.
ERROR IndexConfig - stanza=default Required parameter=blockSignatureDatabase not configured
WARN BucketBuilder - Could not read indexes.conf, using bucketRebuildMemoryHint=33554432 (MB=32.000000)
INFO BucketBuilder - Could not parse server/[diskUsage]/minFreeSpace, defaulting to 2048
INFO Fsck - (entire bucket) Rebuild for bucket='/opt/hunk/var/lib/splunk/audit/db/hot_v1_19' took 135.8 milliseconds
It seems the audit index was somehow corrupted. If you don't care about its content, one quick solution is to move the entire directory away, i.e. mv /opt/hunk/var/lib/splunk/audit /tmp/, and restart. You might have to do this for other indexes if they're corrupt too. Since Hunk doesn't store any data locally in these indexes you should be OK moving them out, but if you do care about their content please let us know and we can dig a bit deeper.
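The move-aside step can be sketched as a small shell function. This is only a sketch: the function name move_index_aside and the backup location are hypothetical, not part of Splunk or Hunk, and it assumes splunkd has been stopped first. It keeps a timestamped copy instead of dropping the directory straight into /tmp, so the index can still be inspected or restored later.

```shell
#!/bin/sh
# Hypothetical helper (not a Splunk command): move one index directory out of
# the way, keeping a timestamped copy so it can be inspected or restored later.
# Usage: move_index_aside <index_dir> <backup_root>
move_index_aside() {
    index_dir=$1
    backup_root=$2
    if [ ! -d "$index_dir" ]; then
        echo "no such index directory: $index_dir" >&2
        return 1
    fi
    mkdir -p "$backup_root" || return 1
    # e.g. <backup_root>/audit.20150107183112
    dest="$backup_root/$(basename "$index_dir").$(date +%Y%m%d%H%M%S)"
    # Print the new location on success so the caller can record it.
    mv "$index_dir" "$dest" && echo "$dest"
}

# Example, with the path from the crash report (stop splunkd first):
# move_index_aside /opt/hunk/var/lib/splunk/audit /tmp/hunk-index-backup
```

The function prints where the directory ended up, which is the path to move back if the content turns out to be needed.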
I agree Hunk does not store data locally, but in my case I have some events in the "main" index. Is there any other way to repair this corrupt index?
In the failure you showed, it was only the audit index (i.e. index=_audit) that had a problem. Are you saying that the main index (defaultdb) also has the same issue?
Moving the audit index alone didn't help, but moving _introspection and _internaldb as well made it work. But I don't think this is an optimal fix; we must find a way to recover the corrupt index for inspection purposes. In some situations we can't afford to lose the _internal db.
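To see which buckets actually collide before moving a whole index away, the id conflict from the crash report (hot_v1_17 versus db_1418731414_1418730253_17) can be looked for directly. The following is a rough sketch, not a Splunk tool: the function name find_duplicate_bucket_ids is hypothetical, and it assumes bucket directories end in _<id> as the two paths in the exception do. It only reports conflicting ids; what to do with the conflicting directory is not settled in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not a Splunk command): print every bucket id that is
# used by more than one directory under an index's db directory.
# Assumption: bucket directory names end in _<id>, as in hot_v1_17 and
# db_1418731414_1418730253_17 from the crash report.
# Usage: find_duplicate_bucket_ids <db_dir>
find_duplicate_bucket_ids() {
    db_dir=$1
    for d in "$db_dir"/hot_v1_* "$db_dir"/db_*; do
        # Skip unmatched glob patterns and anything that is not a directory.
        [ -d "$d" ] || continue
        name=$(basename "$d")
        # Keep only the text after the last underscore, i.e. the id.
        echo "${name##*_}"
    done | sort -n | uniq -d
}

# Example, with the path from the crash report:
# find_duplicate_bucket_ids /opt/hunk/var/lib/splunk/audit/db
```

Running it over each index's db directory (audit, _introspection, _internaldb, defaultdb) would show whether the same kind of conflict exists outside _audit.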