Hi all,
Splunk crashes when I try to start the service. Here's the crash report.
Received fatal signal 6 (Aborted).
Cause:
Signal sent by PID 3263 running under UID 31204.
Crashing thread: SplunkdSpecificInitThread
Registers:
RIP: [0x00007F3194A775F7] gsignal + 55 (/lib64/libc.so.6 + 0x355F7)
RDI: [0x0000000000000CBF]
RSI: [0x0000000000000CCF]
RBP: [0x00007F3194BC0288]
RSP: [0x00007F318E5FE458]
RAX: [0x0000000000000000]
RBX: [0x00007F3194A41000]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x00007F3189E00000]
R9: [0x00007F318FFD3880]
R10: [0x0000000000000008]
R11: [0x0000000000000202]
R12: [0x00007F3197B82570]
R13: [0x00007F3197C36D60]
R14: [0x00007F318DE4A460]
R15: [0x00007F318E5FE950]
EFL: [0x0000000000000202]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0xFFFF000000000033]
OLDMASK: [0x0000000000000000]
OS: Linux
Arch: x86-64
Backtrace (PIC build):
[0x00007F3194A775F7] gsignal + 55 (/lib64/libc.so.6 + 0x355F7)
[0x00007F3194A78CE8] abort + 328 (/lib64/libc.so.6 + 0x36CE8)
[0x00007F3194A70566] ? (/lib64/libc.so.6 + 0x2E566)
[0x00007F3194A70612] ? (/lib64/libc.so.6 + 0x2E612)
[0x00007F3196A066CD] _ZN14IndexerService35disableIndexesAndReinitGlobalConfigERKN9__gnu_cxx17__normal_iteratorIPK3StrSt6vectorIS2_SaIS2_EEEESA_ + 1741 (splunkd + 0x9B76CD)
[0x00007F3196A076E7] _ZN14IndexerService18initPerIndexConfigEP9StrVectorb + 455 (splunkd + 0x9B86E7)
[0x00007F3196A09CB1] _ZN14IndexerService12reloadConfigERK14IndexConfigRef + 481 (splunkd + 0x9BACB1)
[0x00007F3196FE4050] _ZN9EventLoop20internal_runInThreadEP13InThreadActorb + 256 (splunkd + 0xF95050)
[0x00007F3196A05BA8] _ZN14IndexerService16loadLatestConfigEP14IndexConfigRef + 808 (splunkd + 0x9B6BA8)
[0x00007F3196A05D1B] _ZN14IndexerService16loadLatestConfigEv + 43 (splunkd + 0x9B6D1B)
[0x00007F3196A0A3AB] _ZN14IndexerServiceC2Ev + 859 (splunkd + 0x9BB3AB)
[0x00007F3196A0A847] _ZN14IndexerService14_new_singletonEv + 55 (splunkd + 0x9BB847)
[0x00007F31966AD84F] _ZN25SplunkdSpecificInitThread4mainEv + 159 (splunkd + 0x65E84F)
[0x00007F31970A1490] _ZN6Thread8callMainEPv + 64 (splunkd + 0x1052490)
[0x00007F3194E0ADC5] ? (/lib64/libpthread.so.0 + 0x7DC5)
[0x00007F3194B3828D] clone + 109 (/lib64/libc.so.6 + 0xF628D)
Linux / pcpnplsplidx01 / 3.10.0-327.el7.x86_64 / #1 SMP Thu Oct 29 17:29:29 EDT 2015 / x86_64
Last few lines of stderr (may contain info on assertion failure, but also could be old):
2016-05-21 21:37:57.820 -0500 splunkd started (build f2c836328108)
splunkd: /home/build/build-src/galaxy/src/pipeline/indexer/IndexerService.cpp:921: void IndexerService::disableIndexesAndReinitGlobalConfig(const const_iterator&, const const_iterator&): Assertion `0 && "Cannot disable indexes on a clustering slave."' failed.
2016-05-21 21:42:25.272 -0500 splunkd started (build f2c836328108)
splunkd: /home/build/build-src/galaxy/src/pipeline/indexer/IndexerService.cpp:921: void IndexerService::disableIndexesAndReinitGlobalConfig(const const_iterator&, const const_iterator&): Assertion `0 && "Cannot disable indexes on a clustering slave."' failed.
/etc/redhat-release: Red Hat Enterprise Linux Server release 7.2 (Maipo)
glibc version: 2.17
glibc release: stable
Last errno: 2
Threads running: 23
Runtime: 2.965932s
argv: [splunkd -p 8089 start]
Thread: "SplunkdSpecificInitThread", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7f3190276410:
00000000 00 f7 5f 8e 31 7f 00 00 |.._.1...|
00000008
InThreadActor @0x7f318e5feaa0: _queuedOn=(nil), ran=N, wantWake=Y, wantFailIfLoopDone=N
First 128 bytes of InThreadActor object @0x7f318e5feaa0:
00000000 f8 78 17 98 31 7f 00 00 01 00 00 8e 31 7f 00 00 |.x..1.......1...|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 a0 e4 8d 31 7f 00 00 00 f0 e4 8d 31 7f 00 00 |....1.......1...|
00000060 e0 eb 5f 8e 31 7f 00 00 95 9a e4 72 7f 83 d3 1c |.._.1......r....|
00000070 50 9b 04 96 31 7f 00 00 50 eb 5f 8e 31 7f 00 00 |P...1...P._.1...|
00000080
x86 CPUID registers:
0: 0000000F 756E6547 6C65746E 49656E69
1: 000306F2 03020800 9ED83203 1FABFBFF
2: 76036301 00F0B5FF 00000000 00C10000
3: 00000000 00000000 00000000 00000000
4: 00000000 00000000 00000000 00000000
5: 00000000 00000000 00000000 00000000
6: 00000075 00000002 00000009 00000000
7: 00000000 00000000 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 07300401 0000007F 00000000 00000000
B: 00000000 00000000 000000CD 00000003
C: 00000000 00000000 00000000 00000000
D: 00000000 00000000 00000000 00000000
E: 00000000 00000000 00000000 00000000
F: 00000000 00000000 00000000 00000000
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000001 28100800
80000002: 65746E49 2952286C 6F655820 2952286E
80000003: 55504320 2D354520 30333632 20337620
80000004: 2E322040 48473034 0000007A 00000000
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 00003028 00000000 00000000 00000000
terminating...
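As an aside for anyone reading the backtrace above: the mangled frame names can be decoded with `c++filt` (from binutils, assuming it is installed). The crashing frame demangles to the same `IndexerService::disableIndexesAndReinitGlobalConfig` signature that appears in the stderr assertion:

```shell
# Demangle the crashing frame's symbol, copied verbatim from the backtrace.
echo '_ZN14IndexerService35disableIndexesAndReinitGlobalConfigERKN9__gnu_cxx17__normal_iteratorIPK3StrSt6vectorIS2_SaIS2_EEEESA_' | c++filt
```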
It's funny getting the notification for this today. I actually just ran into the same crash myself recently. I have support case 440220, which resulted in enhancement request ENH-6091. If you have a support account and want to be notified about these, you can log a case and ask to be added to their CC lists.
In $SPLUNK_HOME/var/log/splunk/splunkd.log (or one of the rolled copies, if it's been a while), with a timestamp just before my crash, I saw messages like this:
01-10-2017 17:30:01.936 -0600 ERROR DatabaseDirectoryManager - idx=idxname bucket=db_1484082640_1483977006_1_{guid} Detected directory manually copied into its database, causing id conflicts [path1='{idx:homePath}/db_1484082715_1483977061_1_{guid}' path2='/{idx:homePath}/db_1484082640_1483977006_1_{guid}'].
01-10-2017 17:30:01.936 -0600 ERROR IndexerService - Error intializing IndexerService: idx=idxname bucket=db_1484082640_1483977006_1_{guid} Detected directory manually copied into its database, causing id conflicts [path1='/{idx:homePath}/db_1484082715_1483977061_1_{guid}' path2='/{idx:homePath}/db_1484082640_1483977006_1_{guid}'].
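In case it helps anyone else, the offending index and bucket names can be pulled straight out of those error lines. A minimal sketch, using one of the lines above as sample input (on a live system you would grep splunkd.log itself rather than a variable):

```shell
# Sample error line from splunkd.log (taken from this thread).
line='01-10-2017 17:30:01.936 -0600 ERROR DatabaseDirectoryManager - idx=idxname bucket=db_1484082640_1483977006_1_{guid} Detected directory manually copied into its database, causing id conflicts'

# Pull out the idx= and bucket= fields named in the message.
idx=$(printf '%s\n' "$line" | sed -n 's/.*idx=\([^ ]*\).*/\1/p')
bucket=$(printf '%s\n' "$line" | sed -n 's/.*bucket=\([^ ]*\).*/\1/p')
echo "index=$idx bucket=$bucket"
```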
After fixing the conflicting buckets (it took a couple of rounds, since each crash only reported a single pair of buckets), I was able to start successfully, as @kiran331 mentioned.
I did finally manage to find the offending bucket(s). After removing the ones that were manually copied in, startup works and we're back up and running. Thank you!
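For anyone hitting this later, here is the fix sketched against a scratch directory. The index name and bucket IDs are made up for illustration; on a real indexer the buckets live under the index's homePath, and splunkd must be stopped before moving anything:

```shell
# Simulate an index database with two conflicting bucket directories.
home=$(mktemp -d)/idxname/db
mkdir -p "$home/db_1484082640_1483977006_1_GUID" "$home/db_1484082715_1483977061_1_GUID"
quarantine=$(mktemp -d)

# Move the manually copied bucket out of the database rather than
# deleting it outright, so it can be inspected or restored later.
mv "$home/db_1484082640_1483977006_1_GUID" "$quarantine/"
ls "$home"
```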
If you have a support contract, I would definitely log a case for this. I should be able to disable indexes across an entire cluster without a crash. (Disabling on an individual slave should not happen, but ideally Splunk would detect that state and fail gracefully rather than crash.)
@kiran331, what is the solution for this issue?
In the crash log I saw that a replicated bucket was causing errors; I removed the bucket and the splunk service started.
I have the same problem here on one of my indexers, but I do not see a bucket name or ID. Where does the crash log show the bucket?
I did the same, but it is not coming up. I just don't know what else the problem might be.
Ok. Better to file a case with Support.
It appears you have a config in place that attempts to disable indexes on a clustered slave.
I would check what has changed in your configs between the last successful restart of Splunk and this most recent one. Reviewing those changes will probably point out where the index is being disabled.
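One way to track it down: on the affected peer, `$SPLUNK_HOME/bin/splunk btool indexes list --debug` prints every effective setting along with the file it came from. A self-contained sketch of what to look for, using a made-up indexes.conf fragment in place of a real install:

```shell
# Made-up indexes.conf fragment showing the kind of setting that trips
# the "Cannot disable indexes on a clustering slave" assertion.
# On a real host, run:  $SPLUNK_HOME/bin/splunk btool indexes list --debug
conf=$(mktemp)
cat > "$conf" <<'EOF'
[myindex]
homePath = $SPLUNK_DB/myindex/db
disabled = true
EOF

# Look for any stanza-level disabled settings.
grep -n 'disabled' "$conf"
```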