Splunk 6.3.0 crashes on clustered indexers when some indexes are set to read-only

frankwayne
Path Finder

I deprecated certain indexes on my indexer cluster and wanted to set them to read-only (using isReadOnly = true) to avoid inadvertently receiving any new events. When the bundle is rolled out, each peer restarts as it should, but shortly after maintenance mode is disabled, splunkd crashes on all the peers except one.

The crash report indicates an assert failure in DatabaseDirectoryManager::addReplicatedBucket(), "addReplicatedBucket request for readonly instance" (see below). I can't find any documentation indicating that one may not set a replicated index to read-only. Can anyone clarify?

[build aa7d4b1ccb80] 2015-12-04 10:38:48
Received fatal signal 6 (Aborted).
 Cause:
   Signal sent by PID 63141 running under UID 0.
 Crashing thread: ReplicationDataReceiverThread
 Registers:
    RIP:  [0x00007F1BA20305D7] gsignal + 55 (/lib64/libc.so.6)
    RDI:  [0x000000000000F6A5]
    RSI:  [0x000000000000F706]
    RBP:  [0x00007F1BA2179128]
    RSP:  [0x00007F1B91BFC328]
    RAX:  [0x0000000000000000]
    RBX:  [0x00007F1BA357E000]
    RCX:  [0xFFFFFFFFFFFFFFFF]
    RDX:  [0x0000000000000006]
    R8:  [0xFEFEFEFEFEFEFEFF]
    R9:  [0x00007F1BA35C6F60]
    R10:  [0x0000000000000008]
    R11:  [0x0000000000000206]
    R12:  [0x0000000001891C20]
    R13:  [0x0000000001892C60]
    R14:  [0x0000000000000000]
    R15:  [0x00007F1B90A82458]
    EFL:  [0x0000000000000206]
    TRAPNO:  [0x0000000000000000]
    ERR:  [0x0000000000000000]
    CSGSFS:  [0x0000000000000033]
    OLDMASK:  [0x0000000000000000]

 OS: Linux
 Arch: x86-64

 Backtrace:
  [0x00007F1BA20305D7] gsignal + 55 (/lib64/libc.so.6)
  [0x00007F1BA2031CC8] abort + 328 (/lib64/libc.so.6)
  [0x00007F1BA2029546] ? (/lib64/libc.so.6)
  [0x00007F1BA20295F2] ? (/lib64/libc.so.6)
  [0x0000000000AD9864] _ZN24DatabaseDirectoryManager19addReplicatedBucketERK10CMBucketIdRK8PathnameR3Str + 388 (splunkd)
  [0x0000000000B132D9] _ZN7Indexer19addReplicatedBucketERK10CMBucketIdRK8PathnamebP3Str + 153 (splunkd)
  [0x0000000000A4135D] _ZN22S2SJournalFileReceiver12onFileClosedEPN21S2SBucketFileReceiver8FileInfoE + 973 (splunkd)
  [0x0000000000A3F5C9] _ZN21S2SBucketFileReceiver16processFileSliceERK15CowPipelineDataRK4GuidR3Str + 921 (splunkd)
  [0x0000000000A28E9A] _ZN22ReplicationDataChannel16s2sDataAvailableER15CowPipelineDataRK15S2SPerEventInfom + 1226 (splunkd)
  [0x0000000000F479A9] _ZN11S2SReceiver8gotEventER15CowPipelineDataRK15S2SPerEventInfo + 137 (splunkd)
  [0x0000000000E4FEC3] _ZN18StreamingS2SParser5parseEPKcS1_ + 8067 (splunkd)
  [0x0000000000A22E30] _ZN16CookedTcpChannel7consumeER18TcpAsyncDataBuffer + 144 (splunkd)
  [0x0000000000A23EB9] _ZN22ReplicationDataChannel7consumeER18TcpAsyncDataBuffer + 25 (splunkd)
  [0x0000000000A235F4] _ZN16CookedTcpChannel13dataAvailableER18TcpAsyncDataBuffer + 52 (splunkd)
  [0x000000000109A34E] _ZN10TcpChannel11when_eventsE18PollableDescriptor + 702 (splunkd)
  [0x000000000100B535] _ZN8PolledFd8do_eventEv + 197 (splunkd)
  [0x000000000100C2D7] _ZN9EventLoop3runEv + 1351 (splunkd)
  [0x00000000010955F4] _ZN19Base_TcpChannelLoop7_do_runEv + 36 (splunkd)
  [0x00000000010956A6] _ZN25SubordinateTcpChannelLoop3runEv + 134 (splunkd)
  [0x000000000109F0EE] _ZN6Thread8callMainEPv + 62 (splunkd)
  [0x00007F1BA23C3DF5] ? (/lib64/libpthread.so.0)
  [0x00007F1BA20F11AD] clone + 109 (/lib64/libc.so.6)
 Linux / <server-name> / 3.10.0-229.14.1.el7.x86_64 / #1 SMP Tue Aug 25 11:21:22 EDT 2015 / x86_64
 Last few lines of stderr (may contain info on assertion failure, but also could be old):
    Cannot open manifest file inside "<path>/db_1449246603_1449246602_17_88D6F8A1-D2CE-4CF0-8DF3-937D03D30A5E/rawdata": No such file or directory
    Cannot open manifest file inside "<path>/db_1334071435_1334070628_18_88D6F8A1-D2CE-4CF0-8DF3-937D03D30A5E/rawdata": No such file or directory
    Cannot open manifest file inside "<path>/db_1449247040_1449247027_88_88D6F8A1-D2CE-4CF0-8DF3-937D03D30A5E/rawdata": No such file or directory
    splunkd: /home/build/build-src/ember/src/pipeline/indexer/DatabaseDirectoryManager.cpp:2429: bool DatabaseDirectoryManager::addReplicatedBucket(const CMBucketId&, const Pathname&, Str&): Assertion `0 && "addReplicatedBucket request for readonly instance"' failed.
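
When comparing crash logs across several peers, it can help to pull out just the signal, the crashing thread, and the assertion text. Below is a minimal Python sketch, assuming each crash log has the same layout as the one above. The function name summarize_crash is made up for illustration, and the sample lines are copied from this crash report.

```python
import re

# Three lines copied from the crash report above.
SAMPLE_CRASH_LOG = """\
Received fatal signal 6 (Aborted).
 Crashing thread: ReplicationDataReceiverThread
    splunkd: /home/build/build-src/ember/src/pipeline/indexer/DatabaseDirectoryManager.cpp:2429: bool DatabaseDirectoryManager::addReplicatedBucket(const CMBucketId&, const Pathname&, Str&): Assertion `0 && "addReplicatedBucket request for readonly instance"' failed.
"""


def summarize_crash(text):
    """Return the signal number, crashing thread and assertion text, if present."""
    signal = re.search(r"Received fatal signal (\d+)", text)
    thread = re.search(r"Crashing thread: (\S+)", text)
    assertion = re.search(r"Assertion `(.*)' failed\.", text)
    return {
        "signal": int(signal.group(1)) if signal else None,
        "thread": thread.group(1) if thread else None,
        "assertion": assertion.group(1) if assertion else None,
    }


if __name__ == "__main__":
    for field, value in summarize_crash(SAMPLE_CRASH_LOG).items():
        print(f"{field}: {value}")
```

Any field that does not appear in a given log comes back as None, so logs with a different cause are easy to spot.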
1 Solution

lguinn2
Legend

Once a cluster exits maintenance mode, it begins to replicate buckets. (In fact, it may have a backlog of buckets to replicate.)
Clearly, bucket replication is failing for read-only indexes.

This should be reported to Splunk Support so it can be investigated. Based on this limited information, I would say:

"You can't use isReadOnly = true on a replicated index. Either
(1) the documentation needs to be updated to indicate that fact, or
(2) there is a bug that needs to be filed and fixed."
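
Until that is clarified, one way to see which indexes combine the two settings before pushing a bundle is a small check like the Python sketch below. It assumes replicated indexes are marked with repFactor = auto in indexes.conf. It reads the text of a single file and ignores inheritance from [default] and layering across apps. The stanza names, the TRUE_VALUES set, and the helper names parse_stanzas and risky_indexes are made up for illustration.

```python
import re

# Hypothetical indexes.conf content; the index names are placeholders.
SAMPLE_CONF = """\
[old_app_logs]
homePath = $SPLUNK_DB/old_app_logs/db
repFactor = auto
isReadOnly = true

[active_logs]
homePath = $SPLUNK_DB/active_logs/db
repFactor = auto

[local_archive]
homePath = $SPLUNK_DB/local_archive/db
isReadOnly = true
"""

# Assumption: only these spellings are treated as "true" in this sketch.
TRUE_VALUES = {"true", "1"}


def parse_stanzas(text):
    """Parse stanza headers and key = value lines into a dict of dicts."""
    stanzas = {}
    current = "default"  # settings before any header are filed under "default"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            current = header.group(1)
            stanzas.setdefault(current, {})
        elif "=" in line:
            key, value = line.split("=", 1)
            stanzas.setdefault(current, {})[key.strip()] = value.strip()
    return stanzas


def risky_indexes(stanzas):
    """Return names of stanzas that are both read-only and replicated."""
    risky = []
    for name, settings in stanzas.items():
        read_only = settings.get("isReadOnly", "false").lower() in TRUE_VALUES
        replicated = settings.get("repFactor", "0").lower() == "auto"
        if read_only and replicated:
            risky.append(name)
    return sorted(risky)


if __name__ == "__main__":
    print(risky_indexes(parse_stanzas(SAMPLE_CONF)))
```

For a merged view across apps, the output of splunk btool indexes list should be in a similar stanza format and could probably be fed to the same parser, though I have not verified that against every version.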

acharlieh
Influencer

We just ran into this on Splunk 6.2.5.

RishiMandal
Explorer

We ran into the same issue: one of the indexers in our cluster keeps crashing again and again. The crash log pointed to my UID, when all I did was log in and run splunk start to check what we were seeing in the crash logs. We have tried reinstalling Splunk, turning off other services on the box, and working with Splunk Support, but we have not found a resolution yet. If you find anything, please post it so that we can use it too.
