What could cause this type of MongoDB error

lacrosse1991
Explorer

Hello,

We have a Splunk instance that ran fine for a month or two, then suddenly hit a MongoDB error (see below) and shut down on its own a few days later.

What could cause this type of issue? The error says MongoDB exhausted system memory, but the server has 12GB of RAM and MongoDB is currently using only around 5% of it, so I'm not pushing the VM to its limits on a regular basis.

2017-06-25T12:31:24.954Z F STORAGE  [conn15] MongoDB has exhausted the system memory capacity.
     2017-06-25T12:31:24.954Z F STORAGE  [conn15] Current Memory Status: { page_faults: 100939274, usagePageFileMB: 143, totalPageFileMB: 30719, availPageFileMB: 33, ramMB: 12287 }
     2017-06-25T12:31:24.954Z F STORAGE  [conn15] VirtualProtect for E:/SplunkData/kvstore/mongo/local.1 chunk 4101 failed with errno:1455 The paging file is too small for this operation to complete. (chunk size is 67108864, address is 4014000000) in mongo::makeChunkWritable, terminating
     2017-06-25T12:31:24.954Z I -        [conn15] Fatal Assertion 16362
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      index_collator_extension+0x146b13
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      index_collator_extension+0xfe14f
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      index_collator_extension+0xf0847
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      ???
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      index_collator_extension+0x450e38
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      index_collator_extension+0x10a6f3
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      index_collator_extension+0x1670f1
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      index_collator_extension+0x47fa0b
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] mongod.exe      index_collator_extension+0x47fbb2
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] KERNEL32.DLL    BaseThreadInitThunk+0x22
     2017-06-25T12:31:25.126Z I CONTROL  [conn15] 
     2017-06-25T12:31:25.126Z I -        [conn15] 
     ***aborting after fassert() failure
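
The "Current Memory Status" line in the log above is easier to read once the numbers are pulled out. Below is a minimal Python sketch that does this. It assumes that `totalPageFileMB` and `availPageFileMB` describe the system-wide commit limit (RAM plus page file) rather than MongoDB's own usage; that reading is inferred from the field names and is not confirmed anywhere in this thread. Windows error 1455 (`ERROR_COMMITMENT_LIMIT`) refers to that commit limit, which may be why the failure appeared even though RAM usage looked low.

```python
import re

# Memory status line copied from the mongod log above.
LOG_LINE = (
    "2017-06-25T12:31:24.954Z F STORAGE  [conn15] Current Memory Status: "
    "{ page_faults: 100939274, usagePageFileMB: 143, totalPageFileMB: 30719, "
    "availPageFileMB: 33, ramMB: 12287 }"
)


def parse_memory_status(line):
    """Return the key/value pairs inside the braces as a dict of ints."""
    body = re.search(r"Current Memory Status: \{(.*)\}", line)
    if body is None:
        raise ValueError("no memory status found in line")
    return {key: int(value) for key, value in re.findall(r"(\w+): (\d+)", body.group(1))}


status = parse_memory_status(LOG_LINE)

# Share of the (assumed) commit limit that was still free at crash time.
avail_pct = 100.0 * status["availPageFileMB"] / status["totalPageFileMB"]
print(
    f"available: {status['availPageFileMB']} MB "
    f"of {status['totalPageFileMB']} MB ({avail_pct:.2f}%)"
)
```

If that interpretation holds, almost none of the commit limit was free when mongod tried to make the 64MB chunk writable, regardless of how much physical RAM MongoDB itself was holding.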

Here is the crash file that is associated with the MongoDB issue:

[build aeae3fe0c5af] 2017-06-25 19:12:01
 C++ exception: object@[0x0000009320BAD2A0], type@[0x00007FF6E3D4C908]
 Exception is Non-continuable
 Exception address: [0x00007FFF975E8A5C]
 Crashing thread: ResourceUsage
    MxCsr:  [0x0000000000001FA0]
    SegDs:  [0x000000000000002B]
    SegEs:  [0x000000000000002B]
    SegFs:  [0x0000000000000053]
    SegGs:  [0x000000000000002B]
    SegSs:  [0x000000000000002B]
    SegCs:  [0x0000000000000033]
    EFlags:  [0x0000000000000202]
    Rsp:  [0x0000009320BAD120]
    Rip:  [0x00007FFF975E8A5C] ?
    Dr0:  [0x3120303038343232]
    Dr1:  [0x313031353631403B]
    Dr2:  [0x3B30203030383431]
    Dr3:  [0x3736303335363140]
    Dr6:  [0x403B312030303434]
    Dr7:  [0x3936313234353631]
    Rax:  [0x31403B3020303032]
    Rcx:  [0x3838323732363536]
    Rdx:  [0x3631403B31203030]
    Rbx:  [0x000000000000C807]
    Rbp:  [0x0000009320BAD260]
    Rsi:  [0x00007FF6E3D4C908]
    Rdi:  [0x0000009320BAD440]
    R8:  [0x3532363631403B30]
    R9:  [0x3120303030383236]
    R10:  [0x353633363631403B]
    R11:  [0x3B30203030303831]
    R12:  [0x0000009320E73860]
    R13:  [0x0000009320BAD8C0]
    R14:  [0x0000009320BAD2A0]
    R15:  [0x0000000000000000]
    DebugControl:  [0x0000000000000000]
    LastBranchToRip:  [0x000000001B08DDCE]
    LastBranchFromRip:  [0x00007FFF9A2E0D07]
    LastExceptionToRip:  [0x0000000000000001]
    LastExceptionFromRip:  [0x000000931F394A30]

 OS: Windows
 Arch: x86-64

 Backtrace:
  [0x00007FFF975E8A5C] ?
Args:  [0x000000000000C807]  [0x0000009320BAD280]  [0x0000000000000001]
  [0x00007FFF94394462] ?
Args:  [0x00007FF6E1E10000]  [0x000000000000C807]  [0x000000000000063F]
  [0x00007FF6E31EEBB3] ?
Args:  [0x000000000000C807]  [0x0000000000000000]  [0x0000000000000000]
  [0x00007FF6E31ED56D] ?
Args:  [0x000000931F9E893C]  [0x0000009320BAD8C0]  [0x0000009320E73860]
  [0x00007FF6E1F2D4D8] ?
Args:  [0x000000000000000E]  [0x000000931F9E8920]  [0x0000009320F7CF00]
  [0x00007FF6E1F2BE5B] ?
Args:  [0x0000009320BAD3F8]  [0x000000000000063F]  [0x0000009320BAD660]
  [0x00007FF6E1F43650] ?
Args:  [0x0000009320BAD660]  [0x000000931F9DAD30]  [0x0000009320BAD660]
  [0x00007FF6E27A7B8D] ?
Args:  [0x0000009320BAD5D8]  [0x0000009320BAE088]  [0x000000931F9DAD30]
  [0x00007FF6E27A7A8B] ?
Args:  [0x0000009320BAE088]  [0x000000931F9DAD30]  [0x0000009320D3E840]
  [0x00007FF6E2523C8C] ?
Args:  [0x0000009320BAD8C0]  [0x0000009320BADAC0]  [0x0000009320BADAC0]
  [0x00007FF6E2521A5B] ?
Args:  [0x0000009320BADAE0]  [0x0000009320BAE370]  [0x0000009320BADA80]
  [0x00007FF6E2B7B0F7] ?
Args:  [0x0000009320BAE490]  [0x00007FF6E3765CF8]  [0x00000000000017B0]
  [0x00007FF6E2B7A993] ?
Args:  [0x0000009320BAE920]  [0x0000000000000000]  [0x0000009320E0DAF0]
  [0x00007FF6E2B7242B] ?
Args:  [0x0000009320E730E0]  [0x0000009320059340]  [0x000000000000001C]
  [0x00007FF6E2B70F72] ?
Args:  [0x000000931FA37FE0]  [0x0000009320059210]  [0x0000000000030F5E]
  [0x00007FF6E2B7BAC6] ?
Args:  [0x0000009320059210]  [0x0000009320059210]  [0x0000000000000000]
  [0x00007FF6E2B6E579] ?
Args:  [0x00000000000003F8]  [0x01D2EE0873D81DC8]  [0x01D2EE086E7AD27A]
  [0x00007FF6E277B76A] ?
Args:  [0x0000009320059210]  [0x00007FFF7E9EBDC0]  [0x00007FFF7E9EBDC0]
  [0x00007FFF7E9EBE1D] ?
Args:  [0x00007FFF7E9EBDC0]  [0x00007FFF7E9EBDC0]  [0x00000093200267B0]
  [0x0000009320059210] ?
Args:  [0x00007FFF7E9EBDC0]  [0x00000093200267B0]  [0x00000093200267B0]
  [0x00007FFF7E9EBDC0] ?
Args:  [0x00000093200267B0]  [0x00000093200267B0]  [0x00007FFF985113D2]
  [0x00007FFF7E9EBDC0] ?
Args:  [0x00000093200267B0]  [0x00007FFF985113D2]  [0x00007FFF7E9EBDC0]
  [0x00000093200267B0] ?
Args:  [0x00007FFF985113D2]  [0x00007FFF7E9EBDC0]  [0x00000093200267B0]
  [0x00000093200267B0] ?
Args:  [0x00007FFF7E9EBDC0]  [0x00000093200267B0]  [0x0000000000000000]
  [0x00007FFF985113D2] ?
Args:  [0x00007FFF985113B0]  [0x0000000000000000]  [0x0000000000000000]
  [0x00007FFF9A2C54E4] ?
Args:  [0x0000000000000000]  [0x0000000000000000]  [0x0000000000000000]
Splunk ran as local administrator
xxxxxxxxxx /Windows Server 2012 R2
 C++ Exception type: .?AVbad_alloc@std@@ -> 
 what(): bad allocation
GetLastError(): 8
Threads running: 7
Executable module base: 0x00007FF6E1E10000
Runtime: 2005482.564115s
argv: [C:\Program Files\Splunk/bin/splunkd instrument-resource-usage -p 8089 --with-kvstore]
Thread: "ResourceUsage", did_join=0, ready_to_run=Y, main_thread=N
First 4 bytes of Thread token @0x9320059224:
00000000  a0 15 00 00                                       |....|
00000004


x86 CPUID registers:
         0: 0000000D 756E6547 6C65746E 49656E69
         1: 000306F0 00010800 FEFA3203 0FABFBFF
         2: 76036301 00F0B5FF 00000000 00C10000
         3: 00000000 00000000 00000000 00000000
         4: 00000121 01C0003F 0000003F 00000000
         5: 00000000 00000000 00000000 00000000
         6: 00000077 00000002 00000009 00000000
         7: 00000000 00000281 00000000 00000000
         8: 00000000 00000000 00000000 00000000
         9: 00000000 00000000 00000000 00000000
         A: 07300401 0000007F 00000000 00000000
         B: 00000000 00000001 00000100 00000000
         C: 00000000 00000000 00000000 00000000
         D: 00000007 00000340 00000340 00000000
  80000000: 80000008 00000000 00000000 00000000
  80000001: 00000000 00000000 00000001 2C100800
  80000002: 65746E49 2952286C 6F655820 2952286E
  80000003: 55504320 2D354520 30363632 20337620
  80000004: 2E322040 48473036 0000007A 00000000
  80000005: 00000000 00000000 00000000 00000000
  80000006: 00000000 00000000 01006040 00000000
  80000007: 00000000 00000000 00000000 00000100
  80000008: 00003028 00000000 00000000 00000000
terminating...

coltwanger
Contributor

Your KV store might be full. Try increasing oplogSize in server.conf under [kvstore]. The default is 1GB; mine is set to 2GB.
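
As a rough sketch of what that change could look like, the fragment below sets the oplog size in a local server.conf. It assumes the setting is given in megabytes and that the file lives in the usual `etc\system\local` directory under the Splunk install; both are worth confirming against the server.conf spec for your version. The value 2000 is only an illustration of the 2GB mentioned above, not a recommendation.

```ini
# %SPLUNK_HOME%\etc\system\local\server.conf  (assumed location)
[kvstore]
# Size of the KV store operations log, assumed to be in megabytes.
# 2000 is an illustrative value (about 2GB), up from the 1GB default.
oplogSize = 2000
```

A restart of Splunk is typically needed before a server.conf change like this takes effect.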

cabauah
Explorer

@coltwanger @lacrosse1991 @Upas02 - Is changing the size the right approach? Is there a way to archive or apply a retention policy to the KV store/MongoDB data?

lacrosse1991
Explorer

Thanks! Going to give this a try.

Upas02
Path Finder

I am facing the same issue.
@lacrosse1991
Can you please tell me how your problem was solved?

Upas02
Path Finder

@coltwanger
Thanks, changing oplogSize in server.conf under [kvstore] solved the problem.