Knowledge Management

KV Store: Fatal Assertion - Write to OpLog failed

Contributor

We are running Splunk 6.5.1, and on one of our standalone search heads, upon every restart of splunkd we get the following message:

KV Store changed status to failed. KVStore process terminated.
KV Store process terminated abnormally (exit code 14, status exited with code 14). See mongod.log and splunkd.log for details.

And further, in mongod.log:

 2017-01-27T06:57:06.836Z I NETWORK  FIPS 140-2 mode activated
 2017-01-27T06:57:06.894Z I CONTROL  [initandlisten] MongoDB starting : pid=1852 port=8191 dbpath=C:\Program Files\Splunk\var\lib\splunk\kvstore\mongo 64-bit host=FS300
 2017-01-27T06:57:06.894Z I CONTROL  [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
 2017-01-27T06:57:06.894Z I CONTROL  [initandlisten] db version v3.0.8-splunk
 2017-01-27T06:57:06.894Z I CONTROL  [initandlisten] git version: 83d8cc25e00e42856924d84e220fbe4a839e605d
 2017-01-27T06:57:06.894Z I CONTROL  [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
 2017-01-27T06:57:06.894Z I CONTROL  [initandlisten] allocator: tcmalloc
 2017-01-27T06:57:06.894Z I CONTROL  [initandlisten] options: { net: { port: 8191, ssl: { CAFile: "C:\Program Files\Splunk\etc\auth\mycerts\cacert.cer", FIPSMode: true, PEMKeyFile: "C:\Program Files\Splunk\etc\auth\mycerts\server.pem", PEMKeyPassword: "<password>", allowInvalidHostnames: true, mode: "preferSSL" } }, replication: { oplogSizeMB: 200, replSet: "E3DEAFEB-A8FE-4C69-8C24-6BD8AE669663" }, security: { clusterAuthMode: "sendX509", javascriptEnabled: false, keyFile: "C:\Program Files\Splunk\var\lib\splunk\kvstore\mongo\splunk.key" }, setParameter: { enableLocalhostAuthBypass: "0" }, storage: { dbPath: "C:\Program Files\Splunk\var\lib\splunk\kvstore\mongo", mmapv1: { smallFiles: true } }, systemLog: { timeStampFormat: "iso8601-utc" } }
 2017-01-27T06:57:06.920Z I JOURNAL  [initandlisten] journal dir=C:\Program Files\Splunk\var\lib\splunk\kvstore\mongo\journal
 2017-01-27T06:57:06.920Z I JOURNAL  [initandlisten] recover begin
 2017-01-27T06:57:06.920Z I JOURNAL  [initandlisten] recover lsn: 0
 2017-01-27T06:57:06.920Z I JOURNAL  [initandlisten] recover C:\Program Files\Splunk\var\lib\splunk\kvstore\mongo\journal\j._0
 2017-01-27T06:57:06.934Z I JOURNAL  [initandlisten] recover cleaning up
 2017-01-27T06:57:06.934Z I JOURNAL  [initandlisten] removeJournalFiles
 2017-01-27T06:57:06.934Z I JOURNAL  [initandlisten] recover done
 2017-01-27T06:57:06.998Z I JOURNAL  [durability] Durability thread started
 2017-01-27T06:57:06.999Z I JOURNAL  [journal writer] Journal writer thread started
 2017-01-27T06:57:07.507Z I NETWORK  [initandlisten] waiting for connections on port 8191 ssl
 2017-01-27T06:57:07.509Z I REPL     [ReplicationExecutor] New replica set config in use: { _id: "E3DEAFEB-A8FE-4C69-8C24-6BD8AE669663", version: 1, members: [ { _id: 0, host: "127.0.0.1:8191", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { instance: "E3DEAFEB-A8FE-4C69-8C24-6BD8AE669663", all: "all" }, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatTimeoutSecs: 10, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
 2017-01-27T06:57:07.509Z I REPL     [ReplicationExecutor] This node is 127.0.0.1:8191 in the config
 2017-01-27T06:57:07.509Z I REPL     [ReplicationExecutor] transition to STARTUP2
 2017-01-27T06:57:07.509Z I REPL     [ReplicationExecutor] Starting replication applier threads
 2017-01-27T06:57:07.510Z I REPL     [ReplicationExecutor] transition to RECOVERING
 2017-01-27T06:57:07.511Z I REPL     [ReplicationExecutor] transition to SECONDARY
 2017-01-27T06:57:07.512Z I REPL     [ReplicationExecutor] transition to PRIMARY
 2017-01-27T06:57:07.567Z I NETWORK  [initandlisten] connection accepted from 127.0.0.1:50653 #1 (1 connection now open)
 2017-01-27T06:57:07.784Z I ACCESS   [conn1]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "emailAddress=email@email.com,CN=searchhead.abc.com,OU=OU,O=OrgName,L=City,ST=state,C=US" }
 2017-01-27T06:57:07.784Z I NETWORK  [conn1] end connection 127.0.0.1:50653 (0 connections now open)
 2017-01-27T06:57:08.557Z I REPL     [rsSync] transition to primary complete; database writes are now permitted
 2017-01-27T06:57:08.786Z I NETWORK  [initandlisten] connection accepted from 127.0.0.1:50679 #2 (1 connection now open)
 2017-01-27T06:57:09.027Z I ACCESS   [conn2]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "emailAddress=email@email.com,CN=searchhead.abc.com,OU=OU,O=OrgName,L=City,ST=state,C=US" }
 2017-01-27T06:57:09.027Z I NETWORK  [conn2] end connection 127.0.0.1:50679 (0 connections now open)
 2017-01-27T06:57:09.028Z I NETWORK  [initandlisten] connection accepted from 127.0.0.1:50680 #3 (1 connection now open)
 2017-01-27T06:57:09.252Z I ACCESS   [conn3]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "emailAddress=email@email.com,CN=searchhead.abc.com,OU=OU,O=OrgName,L=City,ST=state,C=US" }
 2017-01-27T06:57:09.254Z I NETWORK  [conn3] end connection 127.0.0.1:50680 (0 connections now open)
 2017-01-27T06:57:09.255Z I NETWORK  [initandlisten] connection accepted from 127.0.0.1:50681 #4 (1 connection now open)
 2017-01-27T06:57:09.471Z I ACCESS   [conn4]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "emailAddress=email@email.com,CN=searchhead.abc.com,OU=OU,O=OrgName,L=City,ST=state,C=US" }
 2017-01-27T06:57:09.472Z I NETWORK  [conn4] end connection 127.0.0.1:50681 (0 connections now open)
 2017-01-27T06:57:09.473Z I NETWORK  [initandlisten] connection accepted from 127.0.0.1:50683 #5 (1 connection now open)
 2017-01-27T06:57:09.693Z I NETWORK  [initandlisten] connection accepted from 127.0.0.1:50684 #6 (2 connections now open)
 2017-01-27T06:57:09.916Z I ACCESS   [conn6]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "emailAddress=email@email.com,CN=searchhead.abc.com,OU=OU,O=OrgName,L=City,ST=state,C=US" }
 2017-01-27T06:57:10.759Z I NETWORK  [initandlisten] connection accepted from 127.0.0.1:50689 #7 (3 connections now open)
 2017-01-27T06:57:10.974Z I NETWORK  [initandlisten] connection accepted from 127.0.0.1:50691 #8 (4 connections now open)
 2017-01-27T06:57:11.189Z I ACCESS   [conn8]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "emailAddress=email@email.com,CN=searchhead.abc.com,OU=OU,O=OrgName,L=City,ST=state,C=US" }
 2017-01-27T06:57:11.204Z I -        [conn6] Assertion: 17322:write to oplog failed: DocTooLargeForCapp
 2017-01-27T06:59:34.918Z I -        [conn6] Assertion: 17322:write to oplog failed: DocTooLargeForCapped document doesn't fit in capped collection. size: 160 storageSize:209715200 @ 28575
 2017-01-27T06:59:34.972Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0x1465d3
 2017-01-27T06:59:34.972Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0xfd72f
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0xf05ae
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0xf04d3
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0x44fb88
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0x10a343
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0x166c01
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0x47e72b
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0x47e8d2
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] KERNEL32.DLL    BaseThreadInitThunk+0x22
 2017-01-27T06:59:34.973Z I CONTROL  [conn6] 
 2017-01-27T06:59:36.029Z W STORAGE  [conn6] couldn't make room for record len: 160 in capped ns local.oplog.rs
 numRecords: 566262
   Extent 0 (capExtent) 1:2000
     magic: 41424344 extent->ns: local.oplog.rs
     fr: null lr: 1:c3fbd58 extent->len: 209715200
 2017-01-27T06:59:36.029Z I -        [conn6] Fatal Assertion 17438
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0x1465d3
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0xfd72f
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0xefec7
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      ???
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0x44fb88
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0x10a343
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0x166c01
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0x47e72b
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] mongod.exe      index_collator_extension+0x47e8d2
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] KERNEL32.DLL    BaseThreadInitThunk+0x22
 2017-01-27T06:59:36.080Z I CONTROL  [conn6] 
 2017-01-27T06:59:36.080Z I -        [conn6] 
 ***aborting after fassert() failure

The log has been condensed a bit, but essentially there is a lot of repetition of the ??? lines, the index_collator_extension stack frames, and the assertion errors before mongod finally gives up and shuts down. This appears to have started after we replaced our SSL certificates, but we have an ES search head with the same configuration (and a different certificate) whose KV Store is functioning normally.

We aren't actually using the KV Store for anything on this search head yet, and we don't have much experience with it, but I'm not afraid to do a complete re-initialization if that's what it takes to get it working again.

Any ideas on what's causing this and how to resolve it?


Contributor

Increasing [kvstore]/oplogSize in server.conf to 2000MB (the default is 1000MB), and cleaning the KV Store while I was at it, resolved this issue.
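For reference, the change amounts to a small edit in $SPLUNK_HOME/etc/system/local/server.conf (2000 is just the value that worked here; pick whatever comfortably exceeds your KV Store data size):

```ini
# $SPLUNK_HOME/etc/system/local/server.conf
[kvstore]
# Size of the KV Store replication operations log, in MB (default: 1000).
# Raise it above the combined size of the KV Store data files.
oplogSize = 2000
```

A splunkd restart is required for the change to take effect. Cleaning the KV Store (run while Splunk is stopped, as described in the resync documentation linked below) wipes the local data, so the oplog gets rebuilt at the new size.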

Once I read that the default was 1000MB, I took another look at my kvstore directory and saw all of the ".0" files, which I'm assuming make up the primary KV Store mongo database. The sizes of all of these files put together came to about 1037MB -- just over the default cap set in server.conf.
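A quick way to sanity-check whether the data files have outgrown the cap is to total up the numeric-extension MMAPv1 data files (the ".0", ".1", ... files) under the kvstore/mongo directory. A minimal sketch, assuming the directory is the dbpath shown in mongod.log (the helper name is mine, not a Splunk API):

```python
import os

def kvstore_data_size_mb(mongo_dir):
    """Sum the sizes of MMAPv1 data files (*.0, *.1, *.2, ...)
    in the KV Store's mongo directory, returned in megabytes."""
    total = 0
    for name in os.listdir(mongo_dir):
        path = os.path.join(mongo_dir, name)
        # MMAPv1 data files have a purely numeric extension;
        # this skips .ns files, the journal subdirectory, etc.
        ext = os.path.splitext(name)[1].lstrip(".")
        if os.path.isfile(path) and ext.isdigit():
            total += os.path.getsize(path)
    return total / (1024 * 1024)
```

If the result is near or above your configured oplogSize, raising the cap (or cleaning the KV Store) is the likely fix.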

I'm pretty confident just increasing the size would have resolved it on its own, but I wanted to be sure it was done 🙂

More info:

"Prevent stale members by increasing operations log size"
https://docs.splunk.com/Documentation/Splunk/6.5.1/Admin/ResyncKVstore



Engager

Resyncing the KV Store resolved our issue. Thank you for including that link.
