Hi all,
Splunk crashes when I try to start the service. Here's the crash report.
Received fatal signal 6 (Aborted).
Cause:
Signal sent by PID 3263 running under UID 31204.
Crashing thread: SplunkdSpecificInitThread
Registers:
RIP: [0x00007F3194A775F7] gsignal + 55 (/lib64/libc.so.6 + 0x355F7)
RDI: [0x0000000000000CBF]
RSI: [0x0000000000000CCF]
RBP: [0x00007F3194BC0288]
RSP: [0x00007F318E5FE458]
RAX: [0x0000000000000000]
RBX: [0x00007F3194A41000]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x00007F3189E00000]
R9: [0x00007F318FFD3880]
R10: [0x0000000000000008]
R11: [0x0000000000000202]
R12: [0x00007F3197B82570]
R13: [0x00007F3197C36D60]
R14: [0x00007F318DE4A460]
R15: [0x00007F318E5FE950]
EFL: [0x0000000000000202]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0xFFFF000000000033]
OLDMASK: [0x0000000000000000]
OS: Linux
Arch: x86-64
Backtrace (PIC build):
[0x00007F3194A775F7] gsignal + 55 (/lib64/libc.so.6 + 0x355F7)
[0x00007F3194A78CE8] abort + 328 (/lib64/libc.so.6 + 0x36CE8)
[0x00007F3194A70566] ? (/lib64/libc.so.6 + 0x2E566)
[0x00007F3194A70612] ? (/lib64/libc.so.6 + 0x2E612)
[0x00007F3196A066CD] _ZN14IndexerService35disableIndexesAndReinitGlobalConfigERKN9__gnu_cxx17__normal_iteratorIPK3StrSt6vectorIS2_SaIS2_EEEESA_ + 1741 (splunkd + 0x9B76CD)
[0x00007F3196A076E7] _ZN14IndexerService18initPerIndexConfigEP9StrVectorb + 455 (splunkd + 0x9B86E7)
[0x00007F3196A09CB1] _ZN14IndexerService12reloadConfigERK14IndexConfigRef + 481 (splunkd + 0x9BACB1)
[0x00007F3196FE4050] _ZN9EventLoop20internal_runInThreadEP13InThreadActorb + 256 (splunkd + 0xF95050)
[0x00007F3196A05BA8] _ZN14IndexerService16loadLatestConfigEP14IndexConfigRef + 808 (splunkd + 0x9B6BA8)
[0x00007F3196A05D1B] _ZN14IndexerService16loadLatestConfigEv + 43 (splunkd + 0x9B6D1B)
[0x00007F3196A0A3AB] _ZN14IndexerServiceC2Ev + 859 (splunkd + 0x9BB3AB)
[0x00007F3196A0A847] _ZN14IndexerService14_new_singletonEv + 55 (splunkd + 0x9BB847)
[0x00007F31966AD84F] _ZN25SplunkdSpecificInitThread4mainEv + 159 (splunkd + 0x65E84F)
[0x00007F31970A1490] _ZN6Thread8callMainEPv + 64 (splunkd + 0x1052490)
[0x00007F3194E0ADC5] ? (/lib64/libpthread.so.0 + 0x7DC5)
[0x00007F3194B3828D] clone + 109 (/lib64/libc.so.6 + 0xF628D)
Linux / pcpnplsplidx01 / 3.10.0-327.el7.x86_64 / #1 SMP Thu Oct 29 17:29:29 EDT 2015 / x86_64
Last few lines of stderr (may contain info on assertion failure, but also could be old):
2016-05-21 21:37:57.820 -0500 splunkd started (build f2c836328108)
splunkd: /home/build/build-src/galaxy/src/pipeline/indexer/IndexerService.cpp:921: void IndexerService::disableIndexesAndReinitGlobalConfig(const const_iterator&, const const_iterator&): Assertion `0 && "Cannot disable indexes on a clustering slave."' failed.
2016-05-21 21:42:25.272 -0500 splunkd started (build f2c836328108)
splunkd: /home/build/build-src/galaxy/src/pipeline/indexer/IndexerService.cpp:921: void IndexerService::disableIndexesAndReinitGlobalConfig(const const_iterator&, const const_iterator&): Assertion `0 && "Cannot disable indexes on a clustering slave."' failed.
/etc/redhat-release: Red Hat Enterprise Linux Server release 7.2 (Maipo)
glibc version: 2.17
glibc release: stable
Last errno: 2
Threads running: 23
Runtime: 2.965932s
argv: [splunkd -p 8089 start]
Thread: "SplunkdSpecificInitThread", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7f3190276410:
00000000 00 f7 5f 8e 31 7f 00 00 |.._.1...|
00000008
InThreadActor @0x7f318e5feaa0: _queuedOn=(nil), ran=N, wantWake=Y, wantFailIfLoopDone=N
First 128 bytes of InThreadActor object @0x7f318e5feaa0:
00000000 f8 78 17 98 31 7f 00 00 01 00 00 8e 31 7f 00 00 |.x..1.......1...|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 a0 e4 8d 31 7f 00 00 00 f0 e4 8d 31 7f 00 00 |....1.......1...|
00000060 e0 eb 5f 8e 31 7f 00 00 95 9a e4 72 7f 83 d3 1c |.._.1......r....|
00000070 50 9b 04 96 31 7f 00 00 50 eb 5f 8e 31 7f 00 00 |P...1...P._.1...|
00000080
x86 CPUID registers:
0: 0000000F 756E6547 6C65746E 49656E69
1: 000306F2 03020800 9ED83203 1FABFBFF
2: 76036301 00F0B5FF 00000000 00C10000
3: 00000000 00000000 00000000 00000000
4: 00000000 00000000 00000000 00000000
5: 00000000 00000000 00000000 00000000
6: 00000075 00000002 00000009 00000000
7: 00000000 00000000 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 07300401 0000007F 00000000 00000000
B: 00000000 00000000 000000CD 00000003
C: 00000000 00000000 00000000 00000000
D: 00000000 00000000 00000000 00000000
E: 00000000 00000000 00000000 00000000
F: 00000000 00000000 00000000 00000000
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000001 28100800
80000002: 65746E49 2952286C 6F655820 2952286E
80000003: 55504320 2D354520 30333632 20337620
80000004: 2E322040 48473034 0000007A 00000000
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 00003028 00000000 00000000 00000000
terminating...
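As an aside for anyone reading the backtrace above: the mangled frame names can be decoded with `c++filt` (from binutils, assuming it is installed). The crashing frame demangles to the same `IndexerService::disableIndexesAndReinitGlobalConfig` signature that appears in the stderr assertion:

```shell
# Demangle the crashing frame's symbol, copied verbatim from the backtrace.
echo '_ZN14IndexerService35disableIndexesAndReinitGlobalConfigERKN9__gnu_cxx17__normal_iteratorIPK3StrSt6vectorIS2_SaIS2_EEEESA_' | c++filt
```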
It's funny getting the notification for this today. I actually just ran into the same crash myself recently. I have support case 440220, which resulted in enhancement request ENH-6091. If you have a support account and want to be notified about these, you can log a case and ask to be added to their CC lists.
In $SPLUNK_HOME/var/log/splunk/splunkd.log (or one of the rolled copies, if it's been a while), with a timestamp just before my crash, I saw messages like this:
01-10-2017 17:30:01.936 -0600 ERROR DatabaseDirectoryManager - idx=idxname bucket=db_1484082640_1483977006_1_{guid} Detected directory manually copied into its database, causing id conflicts [path1='{idx:homePath}/db_1484082715_1483977061_1_{guid}' path2='/{idx:homePath}/db_1484082640_1483977006_1_{guid}'].
01-10-2017 17:30:01.936 -0600 ERROR IndexerService - Error intializing IndexerService: idx=idxname bucket=db_1484082640_1483977006_1_{guid} Detected directory manually copied into its database, causing id conflicts [path1='/{idx:homePath}/db_1484082715_1483977061_1_{guid}' path2='/{idx:homePath}/db_1484082640_1483977006_1_{guid}'].
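In case it helps anyone else, the offending index and bucket names can be pulled straight out of those error lines. A minimal sketch, using one of the lines above as sample input (on a live system you would grep splunkd.log itself rather than a variable):

```shell
# Sample error line from splunkd.log (taken from this thread).
line='01-10-2017 17:30:01.936 -0600 ERROR DatabaseDirectoryManager - idx=idxname bucket=db_1484082640_1483977006_1_{guid} Detected directory manually copied into its database, causing id conflicts'

# Pull out the idx= and bucket= fields named in the message.
idx=$(printf '%s\n' "$line" | sed -n 's/.*idx=\([^ ]*\).*/\1/p')
bucket=$(printf '%s\n' "$line" | sed -n 's/.*bucket=\([^ ]*\).*/\1/p')
echo "index=$idx bucket=$bucket"
```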
After fixing the conflicting buckets (it took a couple of rounds, since each crash only reported a single pair of buckets), I was able to start successfully, as @kiran331 mentioned.
I did finally manage to find the offending bucket(s). After removing the ones that were manually copied in, startup works and we're back up and running. Thank you!
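For anyone hitting this later, here is the fix sketched against a scratch directory. The index name and bucket IDs are made up for illustration; on a real indexer the buckets live under the index's homePath, and splunkd must be stopped before moving anything:

```shell
# Simulate an index database with two conflicting bucket directories.
home=$(mktemp -d)/idxname/db
mkdir -p "$home/db_1484082640_1483977006_1_GUID" "$home/db_1484082715_1483977061_1_GUID"
quarantine=$(mktemp -d)

# Move the manually copied bucket out of the database rather than
# deleting it outright, so it can be inspected or restored later.
mv "$home/db_1484082640_1483977006_1_GUID" "$quarantine/"
ls "$home"
```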
If you have a support contract, I would definitely log a case for this. I should be able to disable indexes across an entire cluster without a crash. (Disabling on an individual slave should not happen, but ideally Splunk would detect that state and fail gracefully rather than crash.)
@kiran331, what is the solution for this issue?
In the crash log I saw that a replicated bucket was causing errors; I removed the bucket and the splunk service started.
I have the same problem here on one of my indexers, but I do not see a bucket name or ID. Where does the crash log show the bucket?
I did the same, but it is not coming up. I just don't know what else the problem might be.
Ok. Better to file a case with Support.
It appears you have a config in place that attempts to disable indexes on a clustered slave.
I would check what has changed in your configs between the last successful restart of Splunk and this most recent one. Reviewing those changes will probably point out where the index is being disabled.
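One way to track it down: on the affected peer, `$SPLUNK_HOME/bin/splunk btool indexes list --debug` prints every effective setting along with the file it came from. A self-contained sketch of what to look for, using a made-up indexes.conf fragment in place of a real install:

```shell
# Made-up indexes.conf fragment showing the kind of setting that trips
# the "Cannot disable indexes on a clustering slave" assertion.
# On a real host, run:  $SPLUNK_HOME/bin/splunk btool indexes list --debug
conf=$(mktemp)
cat > "$conf" <<'EOF'
[myindex]
homePath = $SPLUNK_DB/myindex/db
disabled = true
EOF

# Look for any stanza-level disabled settings.
grep -n 'disabled' "$conf"
```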