One of our Splunk forwarders has stopped forwarding anything to the indexer. The end of /opt/splunkforwarder/var/log/splunk/splunkd.log looks like this (after a restart):
...
08-17-2016 16:25:09.384 -0700 INFO TailingProcessor - Adding watch on path: /var/c3d/logs/prod/tunnelconnect.log.
08-17-2016 16:25:09.384 -0700 INFO BatchReader - State transitioning from 2 to 0 (initOrResume).
08-17-2016 16:25:09.386 -0700 ERROR BTree - unable to reader 4088 bytes: bytes=4080 Success
08-17-2016 16:25:09.386 -0700 ERROR TailingProcessor - Ignoring path="/opt/splunkforwarder/etc/splunk.version" due to: BTree::Exception: Node::readLE failure in Node::Node(1) node offset: 4112 order: 255 keys: { } children: { }
08-17-2016 16:25:09.389 -0700 ERROR BTree - unable to reader 4088 bytes: bytes=0 Success
08-17-2016 16:25:09.389 -0700 ERROR TailingProcessor - Ignoring path="/var/c3d/logs/dev3/install.log" due to: BTree::Exception: Node::readLE failure in Node::Node(1) node offset: 8200 order: 255 keys: { } children: { }
08-17-2016 16:25:09.436 -0700 ERROR BTree - unable to reader 4088 bytes: bytes=0 Success
08-17-2016 16:25:09.436 -0700 ERROR TailingProcessor - Ignoring path="/opt/splunkforwarder/var/log/splunk/metrics.log.3" due to: BTree::Exception: Node::readLE failure in Node::Node(1) node offset: 8200 order: 255 keys: { } children: { }
...
[many lines like this!]
...
08-17-2016 16:25:11.634 -0700 ERROR BTree - unable to reader 4088 bytes: bytes=0 Success
08-17-2016 16:25:11.634 -0700 ERROR TailingProcessor - Ignoring path="[one of our log file path here]" due to: BTree::Exception: Node::readLE failure in Node::Node(1) node offset: 8200 order: 255 keys: { } children: { }
08-17-2016 16:26:09.333 -0700 INFO TcpOutputProc - Connected to idx=[removed] using ACK.
This seems to be related to things like moving indexes around or hardware/filesystem issues. If this is a Linux-based forwarder, there is a good chance that you need to increase the open file descriptor limit from the default of 1024 to perhaps 4096, and also change the core file size (blocks) to unlimited.
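For reference, here is a minimal sketch of how to check and raise those limits on a typical Linux host. The 4096/unlimited values are just the ones suggested above, and it assumes the forwarder runs as a "splunk" user whose limits are managed via /etc/security/limits.conf:

$ ulimit -n                 # current open file descriptor limit
$ ulimit -c                 # current core file size (blocks)
$ ulimit -n 4096            # temporary, for the shell that starts the forwarder
$ ulimit -c unlimited

# persistent, in /etc/security/limits.conf:
splunk  soft  nofile  4096
splunk  hard  nofile  4096
splunk  soft  core    unlimited
splunk  hard  core    unlimited

Restart the forwarder from a session that has the new limits so splunkd actually picks them up.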
I had the same issue and resolved it by fixing the ownership of splunk_private_db.old: chown -R splunk:splunk /opt/splunk.........
This resolution worked for me. I noticed that splunk_private_db.old was owned by root, so I simply chown'd it. Fixed the problem!
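To make the ownership check above concrete, something like the following should reveal the problem; the fishbucket path shown is the usual default for a forwarder install and may differ on your system:

$ ls -l /opt/splunkforwarder/var/lib/splunk/fishbucket/
# if splunk_private_db or splunk_private_db.old is owned by root:root,
# hand it back to the user the forwarder runs as:
$ chown -R splunk:splunk /opt/splunkforwarder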
FWIW this solution also solved a similar problem we had with a different symptom. After doing some (extensive) crash testing on one of our systems with a Splunk forwarder, the forwarder started leaving crash logs (e.g. [splunkforwarder]/var/log/splunk/crash-2016-09-27-21:25:27.log) like this:
...
Backtrace:
[0x00007F1C68F88CC9] gsignal + 57 (/lib/x86_64-linux-gnu/libc.so.6)
[0x00007F1C68F8C0D8] abort + 328 (/lib/x86_64-linux-gnu/libc.so.6)
[0x00007F1C68F81B86] ? (/lib/x86_64-linux-gnu/libc.so.6)
[0x00007F1C68F81C32] ? (/lib/x86_64-linux-gnu/libc.so.6)
[0x0000000000E8440D] _ZN2bt5BTree4Node6readLEERK14FileDescriptor + 525 (splunkd)
[0x0000000000E848B1] _ZN2bt5BTree4NodeC2ERK14FileDescriptorPKNS0_3KeyEj + 97 (splunkd)
[0x0000000000E84EC4] _ZNK2bt5BTree4Node6acceptERK14FileDescriptorS4_RNS0_7VisitorEj + 148 (splunkd)
[0x0000000000E82BC2] _ZNK2bt5BTree10checkValidERNS0_6RecordE + 130 (splunkd)
[0x0000000000E8833B] _ZN2bt7BTreeCP10checkpointEv + 27 (splunkd)
[0x0000000000E8A07A] _ZN2bt7BTreeCP9addUpdateERKNS_5BTree3KeyERKNS1_6RecordE + 458 (splunkd)
[0x0000000000C3C4E6] _ZN11FileTracker9addUpdateERK5CRC_tRK10FishRecordb + 214 (splunkd)
[0x000000000099A7D8] _ZN3WTF11resetOffsetEv + 280 (splunkd)
[0x000000000099CF42] _ZN3WTF13loadFishStateEb + 1058 (splunkd)
[0x000000000098895F] _ZN10TailReader8readFileER15WatchedTailFileP11TailWatcherP11BatchReader + 191 (splunkd)
[0x0000000000988B7F] _ZN11TailWatcher8readFileER15WatchedTailFile + 127 (splunkd)
[0x000000000098ADFC] _ZN11TailWatcher11fileChangedEP16WatchedFileStateRK7Timeval + 700 (splunkd)
[0x0000000000EB8AA2] _ZN30FilesystemChangeInternalWorker15callFileChangedER7TimevalP16WatchedFileState + 114 (splunkd)
[0x0000000000EBA430] _ZN30FilesystemChangeInternalWorker12when_expiredERy + 464 (splunkd)
[0x0000000000F49B5D] _ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval + 301 (splunkd)
[0x0000000000EB3CB8] _ZN9EventLoop3runEv + 744 (splunkd)
[0x000000000098916D] _ZN11TailWatcher3runEv + 141 (splunkd)
[0x000000000098E91A] _ZN13TailingThread4mainEv + 154 (splunkd)
[0x0000000000F4768E] _ZN6Thread8callMainEPv + 62 (splunkd)
[0x00007F1C6931F182] ? (/lib/x86_64-linux-gnu/libpthread.so.0)
[0x00007F1C6904C47D] clone + 109 (/lib/x86_64-linux-gnu/libc.so.6)
...
The suggested btprobe cmd fixed this too.
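As an aside, the mangled frames in that backtrace can be decoded with c++filt, which makes it clearer that the crash happens inside the fishbucket BTree code; the output below is what GNU c++filt produces for two of the frames:

$ echo '_ZN2bt5BTree4Node6readLEERK14FileDescriptor' | c++filt
bt::BTree::Node::readLE(FileDescriptor const&)
$ echo '_ZN11FileTracker9addUpdateERK5CRC_tRK10FishRecordb' | c++filt
FileTracker::addUpdate(CRC_t const&, FishRecord const&, bool)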
This sounds reasonable, as we do currently have
$ ulimit -n
1024
eryder suggested in a direct email to us that we run:
$ ./bin/splunk cmd btprobe -d [your path to...]/fishbucket/splunk_private_db -r
This appears to have fixed the problem. The errors are no longer seen after a Splunk restart.
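For anyone else landing here, the rough sequence we used looked like this; stopping the forwarder before running btprobe and restarting it afterwards was our own precaution rather than part of the suggestion, and the fishbucket path shown is the default location (substitute your own):

$ cd /opt/splunkforwarder
$ ./bin/splunk stop
$ ./bin/splunk cmd btprobe -d /opt/splunkforwarder/var/lib/splunk/fishbucket/splunk_private_db -r
$ ./bin/splunk start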
Thanks!
Thanks, I saw that issue too, but it wasn't clear to me what their solution was. Also, we have different concerns, since this error is on one of our forwarders, not the indexer. We don't have backups of the forwarder filesystem.
I was hoping that since this is a forwarder the solution might be easier. For example, re-indexing some of the data from the forwarder would be acceptable...
Really interesting, because this is the only thread I found with any reference to a BTree error.