This crash happens every time I try to start splunkd after a fresh install of Splunk 5.0.2 (build 149561) on SLES11 SP1 x86_64. However, the same build works fine on SLES11 SP2. I have all the latest kernel updates installed on SP1, by the way.
[build 149561] 2013-02-17 19:24:28 Received fatal signal 6 (Aborted).
 Cause: Signal sent by PID 41950 running under UID 0.
 Crashing thread: MainTailingThread
 Registers:
    RIP: [0x00007FBF56AD5945] gsignal + 53 (/lib64/libc.so.6)
    RDI: [0x000000000000A3DE]
    RSI: [0x000000000000A40B]
    RBP: [0x0000000001306DE0]
    RSP: [0x00007FBF4AFFE2B8]
    RAX: [0x0000000000000000]
    RBX: [0x00007FBF56BC55E0]
    RCX: [0xFFFFFFFFFFFFFFFF]
    RDX: [0x0000000000000006]
    R8:  [0x00000000FFFFFFFF]
    R9:  [0x00007FBF56DFCE20]
    R10: [0x0000000000000008]
    R11: [0x0000000000000206]
    R12: [0x00007FFFCC3AC69B]
    R13: [0x00007FBF56BC55E0]
    R14: [0x0000000001307930]
    R15: [0x00000000000000E5]
    EFL: [0x0000000000000206]
    TRAPNO: [0x0000000000000000]
    ERR: [0x0000000000000000]
    CSGSFS: [0x0000000000000033]
    OLDMASK: [0x0000000000000000]
 OS: Linux
 Arch: x86-64
 Backtrace:
    [0x00007FBF56AD5945] gsignal + 53 (/lib64/libc.so.6)
    [0x00007FBF56AD6F21] abort + 385 (/lib64/libc.so.6)
    [0x00007FBF56ACE810] __assert_fail + 240 (/lib64/libc.so.6)
    [0x00000000006FCD42] _ZN16FileInputTracker10computeCRCEPm14FileDescriptorRK3Strll + 1906 (splunkd)
    [0x00000000006FCE71] _ZN16FileInputTracker11fileHalfMd5EPm14FileDescriptorRK3Strll + 17 (splunkd)
    [0x000000000071B844] _ZN3WTF13loadFishStateEb + 644 (splunkd)
    [0x000000000070A6C5] _ZN10TailReader8readFileER15WatchedTailFileP11TailWatcher + 149 (splunkd)
    [0x000000000070A8E4] _ZN11TailWatcher8readFileER15WatchedTailFile + 260 (splunkd)
    [0x000000000070C9FB] _ZN11TailWatcher11fileChangedEP16WatchedFileStateRK7Timeval + 363 (splunkd)
    [0x0000000000D3F4E1] _ZN30FilesystemChangeInternalWorker15callFileChangedER7TimevalP16WatchedFileState + 113 (splunkd)
    [0x0000000000D40DCF] _ZN30FilesystemChangeInternalWorker12when_expiredERy + 479 (splunkd)
    [0x0000000000DA5553] _ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval + 227 (splunkd)
    [0x0000000000D3A318] _ZN9EventLoop3runEv + 216 (splunkd)
    [0x000000000071328F] _ZN11TailWatcher3runEv + 143 (splunkd)
    [0x00000000007133EB] _ZN13TailingThread4mainEv + 267 (splunkd)
    [0x0000000000DA2F32] _ZN6Thread8callMainEPv + 66 (splunkd)
    [0x00007FBF58269696] ? (/lib64/libpthread.so.0)
    [0x00007FBF56B77D7D] clone + 109 (/lib64/libc.so.6)
 Linux / dl380-ion1 / 22.214.171.124-0.7-default / #1 SMP Fri Dec 28 20:16:13 zzz 2012 / x86_64
 Last few lines of stderr (may contain info on assertion failure, but also could be old):
    2013-02-17 19:24:26.776 +0000 splunkd started (build 149561)
    splunkd: /opt/splunk/p4/splunk/branches/5.0.2/src/pipeline/input/FileInputTracker.cpp:229: static bool FileInputTracker::computeCRC(uint64_t*, FileDescriptor, const Str&, file_offset_t, file_offset_t): Assertion `bytesToHash < 1048576' failed.
 /etc/SuSE-release: SUSE Linux Enterprise Server 11 (x86_64)
 glibc version: 2.11.1
 glibc release: stable
 Threads running: 24
 argv: [splunkd -p 8089 start]
 terminating...
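For anyone reading the backtrace: the splunkd frame names are C++-mangled symbols and can be decoded with c++filt (part of GNU binutils), for example the frame that triggered the assertion:

```shell
# Demangle the asserting frame from the backtrace using c++filt (binutils).
echo '_ZN16FileInputTracker10computeCRCEPm14FileDescriptorRK3Strll' | c++filt
# -> FileInputTracker::computeCRC(unsigned long*, FileDescriptor, Str const&, long, long)
```

That matches the function named in the stderr assertion message, which confirms the crash comes from the file-input CRC computation in the tailing code.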
This is a known issue, tracked as bug SPL-58292:
"MainTailingThread crashes splunkd with a message that says 'Assertion failed: bytesToHash < 1048576'" (SPL-58292)
Whenever I hit a new problem, I always check the known issues list first, to be safe.
In the meantime, you should contact Splunk support for more help
Hi Drainy, thanks for your reply. Yes, I know there's a similar issue (and I should have stated that in my posting), but sometimes it helps the engineers debug when they have more than one data point, hence my report here on SLES11 SP1. Also, I don't have a support contract (yet), so there's no way for me to contact Splunk support. This is my first day using the evaluation software. Glad it works on SP2, though.
Even without a support contract you can still submit a support request via https://www.splunk.com/index.php/submit_issue. The only difference is that you have no guaranteed SLA, but someone at Splunk will eventually read it. Your best bet is to include the bug number (SPL-58292) in the subject line to grab their attention.
Possible workaround for this bug:
Check the props.conf files on the instance that is crashing due to this bug. If there are any 'CHECK_METHOD = modtime' or 'CHECK_METHOD = entire_md5' settings, comment them out and restart the instance. Be sure to check under the app contexts as well, not just the default: one customer found a single occurrence under the *nix app, and after commenting it out, splunkd started up as expected.
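To find every occurrence across the system and app contexts in one pass, you can search the whole Splunk configuration tree (a quick sketch; /opt/splunk is an assumed install path, so substitute your own $SPLUNK_HOME):

```shell
# Recursively search every props.conf under the Splunk config tree
# (covers etc/system and all etc/apps/*/{default,local} contexts)
# for CHECK_METHOD settings; adjust /opt/splunk to your install path.
grep -R -n --include='props.conf' 'CHECK_METHOD' /opt/splunk/etc
```

Each matching line is printed with its file path and line number, so you know exactly which stanza to comment out before restarting.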
(The fix for SPL-58292 is expected in an upcoming maintenance release.)