Hello,
We have a Splunk instance that ran fine for a month or two, then suddenly hit a MongoDB issue (see below) and shut down on its own a few days later.
What could cause this type of issue? The error says MongoDB ran out of system memory, but the server has 12 GB of RAM and MongoDB is currently using only around 5% of it, so I am not pushing the VM to its limits on a regular basis.
2017-06-25T12:31:24.954Z F STORAGE [conn15] MongoDB has exhausted the system memory capacity.
2017-06-25T12:31:24.954Z F STORAGE [conn15] Current Memory Status: { page_faults: 100939274, usagePageFileMB: 143, totalPageFileMB: 30719, availPageFileMB: 33, ramMB: 12287 }
2017-06-25T12:31:24.954Z F STORAGE [conn15] VirtualProtect for E:/SplunkData/kvstore/mongo/local.1 chunk 4101 failed with errno:1455 The paging file is too small for this operation to complete. (chunk size is 67108864, address is 4014000000) in mongo::makeChunkWritable, terminating
2017-06-25T12:31:24.954Z I - [conn15] Fatal Assertion 16362
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe index_collator_extension+0x146b13
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe index_collator_extension+0xfe14f
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe index_collator_extension+0xf0847
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe ???
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe index_collator_extension+0x450e38
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe index_collator_extension+0x10a6f3
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe index_collator_extension+0x1670f1
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe index_collator_extension+0x47fa0b
2017-06-25T12:31:25.126Z I CONTROL [conn15] mongod.exe index_collator_extension+0x47fbb2
2017-06-25T12:31:25.126Z I CONTROL [conn15] KERNEL32.DLL BaseThreadInitThunk+0x22
2017-06-25T12:31:25.126Z I CONTROL [conn15]
2017-06-25T12:31:25.126Z I - [conn15]
***aborting after fassert() failure
Here is the crash file that is associated with the MongoDB issue:
[build aeae3fe0c5af] 2017-06-25 19:12:01
C++ exception: object@[0x0000009320BAD2A0], type@[0x00007FF6E3D4C908]
Exception is Non-continuable
Exception address: [0x00007FFF975E8A5C]
Crashing thread: ResourceUsage
MxCsr: [0x0000000000001FA0]
SegDs: [0x000000000000002B]
SegEs: [0x000000000000002B]
SegFs: [0x0000000000000053]
SegGs: [0x000000000000002B]
SegSs: [0x000000000000002B]
SegCs: [0x0000000000000033]
EFlags: [0x0000000000000202]
Rsp: [0x0000009320BAD120]
Rip: [0x00007FFF975E8A5C] ?
Dr0: [0x3120303038343232]
Dr1: [0x313031353631403B]
Dr2: [0x3B30203030383431]
Dr3: [0x3736303335363140]
Dr6: [0x403B312030303434]
Dr7: [0x3936313234353631]
Rax: [0x31403B3020303032]
Rcx: [0x3838323732363536]
Rdx: [0x3631403B31203030]
Rbx: [0x000000000000C807]
Rbp: [0x0000009320BAD260]
Rsi: [0x00007FF6E3D4C908]
Rdi: [0x0000009320BAD440]
R8: [0x3532363631403B30]
R9: [0x3120303030383236]
R10: [0x353633363631403B]
R11: [0x3B30203030303831]
R12: [0x0000009320E73860]
R13: [0x0000009320BAD8C0]
R14: [0x0000009320BAD2A0]
R15: [0x0000000000000000]
DebugControl: [0x0000000000000000]
LastBranchToRip: [0x000000001B08DDCE]
LastBranchFromRip: [0x00007FFF9A2E0D07]
LastExceptionToRip: [0x0000000000000001]
LastExceptionFromRip: [0x000000931F394A30]
OS: Windows
Arch: x86-64
Backtrace:
[0x00007FFF975E8A5C] ?
Args: [0x000000000000C807] [0x0000009320BAD280] [0x0000000000000001]
[0x00007FFF94394462] ?
Args: [0x00007FF6E1E10000] [0x000000000000C807] [0x000000000000063F]
[0x00007FF6E31EEBB3] ?
Args: [0x000000000000C807] [0x0000000000000000] [0x0000000000000000]
[0x00007FF6E31ED56D] ?
Args: [0x000000931F9E893C] [0x0000009320BAD8C0] [0x0000009320E73860]
[0x00007FF6E1F2D4D8] ?
Args: [0x000000000000000E] [0x000000931F9E8920] [0x0000009320F7CF00]
[0x00007FF6E1F2BE5B] ?
Args: [0x0000009320BAD3F8] [0x000000000000063F] [0x0000009320BAD660]
[0x00007FF6E1F43650] ?
Args: [0x0000009320BAD660] [0x000000931F9DAD30] [0x0000009320BAD660]
[0x00007FF6E27A7B8D] ?
Args: [0x0000009320BAD5D8] [0x0000009320BAE088] [0x000000931F9DAD30]
[0x00007FF6E27A7A8B] ?
Args: [0x0000009320BAE088] [0x000000931F9DAD30] [0x0000009320D3E840]
[0x00007FF6E2523C8C] ?
Args: [0x0000009320BAD8C0] [0x0000009320BADAC0] [0x0000009320BADAC0]
[0x00007FF6E2521A5B] ?
Args: [0x0000009320BADAE0] [0x0000009320BAE370] [0x0000009320BADA80]
[0x00007FF6E2B7B0F7] ?
Args: [0x0000009320BAE490] [0x00007FF6E3765CF8] [0x00000000000017B0]
[0x00007FF6E2B7A993] ?
Args: [0x0000009320BAE920] [0x0000000000000000] [0x0000009320E0DAF0]
[0x00007FF6E2B7242B] ?
Args: [0x0000009320E730E0] [0x0000009320059340] [0x000000000000001C]
[0x00007FF6E2B70F72] ?
Args: [0x000000931FA37FE0] [0x0000009320059210] [0x0000000000030F5E]
[0x00007FF6E2B7BAC6] ?
Args: [0x0000009320059210] [0x0000009320059210] [0x0000000000000000]
[0x00007FF6E2B6E579] ?
Args: [0x00000000000003F8] [0x01D2EE0873D81DC8] [0x01D2EE086E7AD27A]
[0x00007FF6E277B76A] ?
Args: [0x0000009320059210] [0x00007FFF7E9EBDC0] [0x00007FFF7E9EBDC0]
[0x00007FFF7E9EBE1D] ?
Args: [0x00007FFF7E9EBDC0] [0x00007FFF7E9EBDC0] [0x00000093200267B0]
[0x0000009320059210] ?
Args: [0x00007FFF7E9EBDC0] [0x00000093200267B0] [0x00000093200267B0]
[0x00007FFF7E9EBDC0] ?
Args: [0x00000093200267B0] [0x00000093200267B0] [0x00007FFF985113D2]
[0x00007FFF7E9EBDC0] ?
Args: [0x00000093200267B0] [0x00007FFF985113D2] [0x00007FFF7E9EBDC0]
[0x00000093200267B0] ?
Args: [0x00007FFF985113D2] [0x00007FFF7E9EBDC0] [0x00000093200267B0]
[0x00000093200267B0] ?
Args: [0x00007FFF7E9EBDC0] [0x00000093200267B0] [0x0000000000000000]
[0x00007FFF985113D2] ?
Args: [0x00007FFF985113B0] [0x0000000000000000] [0x0000000000000000]
[0x00007FFF9A2C54E4] ?
Args: [0x0000000000000000] [0x0000000000000000] [0x0000000000000000]
Splunk ran as local administrator
xxxxxxxxxx /Windows Server 2012 R2
C++ Exception type: .?AVbad_alloc@std@@ ->
what(): bad allocation
GetLastError(): 8
Threads running: 7
Executable module base: 0x00007FF6E1E10000
Runtime: 2005482.564115s
argv: [C:\Program Files\Splunk/bin/splunkd instrument-resource-usage -p 8089 --with-kvstore]
Thread: "ResourceUsage", did_join=0, ready_to_run=Y, main_thread=N
First 4 bytes of Thread token @0x9320059224:
00000000 a0 15 00 00 |....|
00000004
x86 CPUID registers:
0: 0000000D 756E6547 6C65746E 49656E69
1: 000306F0 00010800 FEFA3203 0FABFBFF
2: 76036301 00F0B5FF 00000000 00C10000
3: 00000000 00000000 00000000 00000000
4: 00000121 01C0003F 0000003F 00000000
5: 00000000 00000000 00000000 00000000
6: 00000077 00000002 00000009 00000000
7: 00000000 00000281 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 07300401 0000007F 00000000 00000000
B: 00000000 00000001 00000100 00000000
C: 00000000 00000000 00000000 00000000
D: 00000007 00000340 00000340 00000000
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000001 2C100800
80000002: 65746E49 2952286C 6F655820 2952286E
80000003: 55504320 2D354520 30363632 20337620
80000004: 2E322040 48473036 0000007A 00000000
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 00003028 00000000 00000000 00000000
terminating...
Your KV store might be full. Try increasing oplogSize in server.conf under the [kvstore] stanza. The default is about 1 GB; mine is set to 2 GB.
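For anyone unsure what that change looks like, here is a minimal sketch of the stanza. It assumes the override goes in the usual local config file ($SPLUNK_HOME/etc/system/local/server.conf) and that the value is given in megabytes, so 2000 stands in for the "2 GB" mentioned above. Treat the exact number as an example, not a recommendation.

```ini
# Assumed location: $SPLUNK_HOME/etc/system/local/server.conf
[kvstore]
# KV store oplog size in megabytes (default is 1000).
# 2000 is an illustrative value matching the "2 GB" above.
oplogSize = 2000
```

The setting is read at startup, so expect to restart Splunk before it takes effect.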
@coltwanger @lacrosse1991 @Upas02 - Is changing the size the right approach? Is there a way to archive or set retention on the KV store/MongoDB data?
Thanks! Going to give this a try.
I am facing the same issue.
@lacrosse1991
Can you please tell me how your problem was solved?
@coltwanger
Thanks, changing oplogSize in server.conf under [kvstore] solved the problem.