Getting Data In

How to resolve error "BTree::Exception: Node::readLE failure" on a Splunk forwarder?

rgsage
Path Finder

One of our Splunk forwarders has stopped forwarding anything to the indexer. The end of /opt/splunkforwarder/var/log/splunk/splunkd.log looks like this (after a restart):

...
08-17-2016 16:25:09.384 -0700 INFO  TailingProcessor - Adding watch on path: /var/c3d/logs/prod/tunnelconnect.log.
08-17-2016 16:25:09.384 -0700 INFO  BatchReader - State transitioning from 2 to 0 (initOrResume).
08-17-2016 16:25:09.386 -0700 ERROR BTree - unable to reader 4088 bytes: bytes=4080 Success
08-17-2016 16:25:09.386 -0700 ERROR TailingProcessor - Ignoring path="/opt/splunkforwarder/etc/splunk.version" due to: BTree::Exception: Node::readLE failure in Node::Node(1) node offset: 4112 order: 255 keys: { } children: { }
08-17-2016 16:25:09.389 -0700 ERROR BTree - unable to reader 4088 bytes: bytes=0 Success
08-17-2016 16:25:09.389 -0700 ERROR TailingProcessor - Ignoring path="/var/c3d/logs/dev3/install.log" due to: BTree::Exception: Node::readLE failure in Node::Node(1) node offset: 8200 order: 255 keys: { } children: { }
08-17-2016 16:25:09.436 -0700 ERROR BTree - unable to reader 4088 bytes: bytes=0 Success
08-17-2016 16:25:09.436 -0700 ERROR TailingProcessor - Ignoring path="/opt/splunkforwarder/var/log/splunk/metrics.log.3" due to: BTree::Exception: Node::readLE failure in Node::Node(1) node offset: 8200 order: 255 keys: { } children: { }
...
[many lines like this!]
...
08-17-2016 16:25:11.634 -0700 ERROR BTree - unable to reader 4088 bytes: bytes=0 Success
08-17-2016 16:25:11.634 -0700 ERROR TailingProcessor - Ignoring path="[one of our log file path here]" due to: BTree::Exception: Node::readLE failure in Node::Node(1) node offset: 8200 order: 255 keys: { } children: { }
08-17-2016 16:26:09.333 -0700 INFO  TcpOutputProc - Connected to idx=[removed] using ACK.
0 Karma
1 Solution

eryder_splunk
Splunk Employee
Splunk Employee

This seems to be related to things like moving indexes around or hardware/filesystem issues. If this is a Linux-based forwarder, there is a good chance that you need to increase the open file descriptor limit from the default of 1024 to perhaps 4096, and also change the core dump size (blocks) to unlimited.
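
For example (a rough sketch; the /opt/splunkforwarder path below is just the install path from the question, adjust for your own), you can check and raise the limits in the shell that starts splunkd:

$ ulimit -n              # current open file descriptor limit (often 1024)
$ ulimit -c              # current core file size in blocks
$ ulimit -n 4096         # raise the fd limit for this shell
$ ulimit -c unlimited    # allow full core dumps
$ /opt/splunkforwarder/bin/splunk restart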

dougsearcy
Splunk Employee
Splunk Employee

I had the same issue and resolved it by fixing the ownership of splunk_private_db.old: chown -R splunk:splunk /opt/splunk.........
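
For example, on a default install under /opt/splunkforwarder (a sketch; adjust the path to your own install, and this assumes splunkd runs as the splunk user):

$ ls -ld /opt/splunkforwarder/var/lib/splunk/fishbucket/splunk_private_db*
# if splunk_private_db.old (or the db itself) shows root as the owner, fix the
# ownership and restart
$ sudo chown -R splunk:splunk /opt/splunkforwarder
$ /opt/splunkforwarder/bin/splunk restart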

Branden
Builder

This resolution worked for me. I noticed that splunk_private_db.old was owned by root, so I simply chown'd it. Fixed the problem!

0 Karma

rgsage
Path Finder

FWIW this solution also solved a similar problem we had with a different symptom. After doing some (extensive) crash testing on one of our systems with a Splunk forwarder, the forwarder started leaving crash logs (e.g. [splunkforwarder]/var/log/splunk/crash-2016-09-27-21:25:27.log) like this:

...

 Backtrace:
  [0x00007F1C68F88CC9] gsignal + 57 (/lib/x86_64-linux-gnu/libc.so.6)
  [0x00007F1C68F8C0D8] abort + 328 (/lib/x86_64-linux-gnu/libc.so.6)
  [0x00007F1C68F81B86] ? (/lib/x86_64-linux-gnu/libc.so.6)
  [0x00007F1C68F81C32] ? (/lib/x86_64-linux-gnu/libc.so.6)
  [0x0000000000E8440D] _ZN2bt5BTree4Node6readLEERK14FileDescriptor + 525 (splunkd)
  [0x0000000000E848B1] _ZN2bt5BTree4NodeC2ERK14FileDescriptorPKNS0_3KeyEj + 97 (splunkd)
  [0x0000000000E84EC4] _ZNK2bt5BTree4Node6acceptERK14FileDescriptorS4_RNS0_7VisitorEj + 148 (splunkd)
  [0x0000000000E82BC2] _ZNK2bt5BTree10checkValidERNS0_6RecordE + 130 (splunkd)
  [0x0000000000E8833B] _ZN2bt7BTreeCP10checkpointEv + 27 (splunkd)
  [0x0000000000E8A07A] _ZN2bt7BTreeCP9addUpdateERKNS_5BTree3KeyERKNS1_6RecordE + 458 (splunkd)
  [0x0000000000C3C4E6] _ZN11FileTracker9addUpdateERK5CRC_tRK10FishRecordb + 214 (splunkd)
  [0x000000000099A7D8] _ZN3WTF11resetOffsetEv + 280 (splunkd)
  [0x000000000099CF42] _ZN3WTF13loadFishStateEb + 1058 (splunkd)
  [0x000000000098895F] _ZN10TailReader8readFileER15WatchedTailFileP11TailWatcherP11BatchReader + 191 (splunkd)
  [0x0000000000988B7F] _ZN11TailWatcher8readFileER15WatchedTailFile + 127 (splunkd)
  [0x000000000098ADFC] _ZN11TailWatcher11fileChangedEP16WatchedFileStateRK7Timeval + 700 (splunkd)
  [0x0000000000EB8AA2] _ZN30FilesystemChangeInternalWorker15callFileChangedER7TimevalP16WatchedFileState + 114 (splunkd)
  [0x0000000000EBA430] _ZN30FilesystemChangeInternalWorker12when_expiredERy + 464 (splunkd)
  [0x0000000000F49B5D] _ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval + 301 (splunkd)
  [0x0000000000EB3CB8] _ZN9EventLoop3runEv + 744 (splunkd)
  [0x000000000098916D] _ZN11TailWatcher3runEv + 141 (splunkd)
  [0x000000000098E91A] _ZN13TailingThread4mainEv + 154 (splunkd)
  [0x0000000000F4768E] _ZN6Thread8callMainEPv + 62 (splunkd)
  [0x00007F1C6931F182] ? (/lib/x86_64-linux-gnu/libpthread.so.0)
  [0x00007F1C6904C47D] clone + 109 (/lib/x86_64-linux-gnu/libc.so.6)
...

The suggested btprobe cmd fixed this too.

0 Karma

rgsage
Path Finder

This sounds reasonable, as we do currently have:

$ ulimit -n
1024
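
To make the higher limits persist across reboots (a sketch, assuming splunkd runs as a local user named splunk and is started in a way that picks up PAM limits), entries like these could go in /etc/security/limits.conf:

splunk soft nofile 4096
splunk hard nofile 4096
splunk soft core  unlimited
splunk hard core  unlimited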

eryder suggested in a direct email to us that we run:

$ ./bin/splunk cmd btprobe -d [your path to...]/fishbucket/splunk_private_db -r

This appears to have fixed the problem. The errors are no longer seen after a Splunk restart.
Thanks!
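
For anyone else landing here, the rough sequence, assuming a default universal forwarder install under /opt/splunkforwarder (the fishbucket path below is the default location; substitute your own):

$ cd /opt/splunkforwarder
$ ./bin/splunk stop
# resetting fishbucket state can cause previously indexed files to be re-indexed,
# which was acceptable in our case
$ ./bin/splunk cmd btprobe -d /opt/splunkforwarder/var/lib/splunk/fishbucket/splunk_private_db -r
$ ./bin/splunk start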

ddrillic
Ultra Champion
0 Karma

rgsage
Path Finder

Thanks, I saw that issue too, but it wasn't clear to me what their solution was. Also, we have different concerns, since this error is on one of our forwarders, not the indexer. We don't have backups of the forwarder filesystem.

I was hoping that, since this is a forwarder, the solution might be easier. For example, re-indexing some of the data from the forwarder would be acceptable...

0 Karma

ddrillic
Ultra Champion

Really interesting, because I only found this single thread with a reference to a BTree error.

0 Karma