I have a Splunk 6.4 environment with a 3-member SH Cluster running kvstore without replication to the indexer tier.
The kvstore is not particularly heavily utilised, with only three user-defined collections. The biggest of these is a table with ~130,000 rows, while the other two are both <30,000 rows.
(The cluster also runs Enterprise Security with some vendor apps installed for good measure. Between them, these also define some collections, but their contribution is negligible - fewer than 10,000 rows in total.)
All three lookups operate as state tables - they are frequently updated, with new data being written and old data deleted from them. I suspect this could be a cause of the problem I'm seeing, which is the total size of the MongoDB files on each SHC member:
SH1 - 13GB
SH2 - 2.8GB
SH3 - 12GB
The 2.8GB on SH2 looks almost plausible for the amount of data I have in my lookups, but the >10GB sizes on the other two SH's.. no way.
Checking operation of kvstore on each SHC member using,
curl -k -u <user>:<password> https://localhost:8089/services/server/introspection/kvstore/serverstatus
returns (albeit fairly incomprehensible) introspection data, so the kvstore on the two bloated SHC Members shouldn't be stale and wouldn't benefit from a resync... or would it?
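For anyone else squinting at that output: a minimal sketch of requesting the same endpoint as JSON and pretty-printing it. The `kvstore_status_url` helper and the `SPLUNK_USER` variable are hypothetical names used only for illustration, and the pretty-printing step assumes a Python interpreter is available on the host.

```shell
#!/bin/sh
# Hypothetical helper: build the kvstore introspection URL for a given host,
# asking the REST API for JSON instead of the default XML.
kvstore_status_url() {
  host="${1:-localhost}"
  printf 'https://%s:8089/services/server/introspection/kvstore/serverstatus?output_mode=json\n' "$host"
}

# Example invocation (not run here; curl prompts for the password when only
# a user name is given to -u, and SPLUNK_USER is an assumed variable):
#   curl -k -u "$SPLUNK_USER" "$(kvstore_status_url localhost)" | python -m json.tool
```

Running it against each SHC member in turn makes it easier to compare the responses side by side.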
Does DMC's KVStore view show no error or warnings?
Probably SH2 is out-of-sync.
Hope you're using v6.5.0, which added an easier procedure to check status and resync the kvstore.
There's no DMC visibility of this SHC because of reasons, and the environment is still on 6.4, so no
./splunk show kvstore-status
Doing a kvstore clean on SH2 on advice of Splunk Support did... something. As of right now, the sizes of /var/lib/splunk/kvstore/mongo are,
SH1 - 8.2GB
SH2 - 6.0GB
SH3 - 6.6GB
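For reference, the spread between the three figures above, measured relative to the smallest member, can be worked out with a small helper. This is only a sketch; `spread_pct` is a hypothetical name, and measuring against the smallest value (rather than the mean or the largest) is an assumption about how the spread should be expressed.

```shell
#!/bin/sh
# Hypothetical helper: percentage spread of a set of sizes,
# computed as (max - min) / min * 100.
spread_pct() {
  printf '%s\n' "$@" | awk '
    { v = $1 + 0 }                      # force numeric interpretation
    NR == 1 { min = v; max = v }
    { if (v < min) min = v; if (v > max) max = v }
    END { printf "%.1f\n", (max - min) / min * 100 }'
}

# Sizes in GB after the clean on SH2:
#   spread_pct 8.2 6.0 6.6    -> 36.7
```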
The 35% range in sizes is one thing, but the sheer size of the kvstore is probably a bigger worry. I'm starting to suspect that when SPL is used to modify kvstore lookups via the lookup/inputlookup/outputlookup commands, MongoDB doesn't get the memo on delete operations, and that's why the kvstore keeps bloating.
The three kvstore collections I use for analytics operate as state tables - high-frequency scheduled searches add new rows and update existing rows in the tables, while a lower-frequency search comes along every so often and prunes old rows so the table sizes remain manageable. This means that the size of a table as it appears to Splunk (i.e. what you'll get if you just run | inputlookup ) remains more or less constant as new data replaces expired old data, but the underlying kvstore collection might just keep growing in size because nothing tells MongoDB to actually delete the data which the lookup table no longer cares about.
To check this, a couple of days ago - after the kvstore clean on SH2 - I manually deleted a big portion (50%) of the biggest of my lookups, and the size of /kvstore/mongo didn't twitch.
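To make a test like this repeatable, the directory size could be sampled before and after a deletion and appended to a log. A rough sketch follows; `log_size` is a hypothetical helper, and note that `du` reports space allocated on disk, which is not necessarily the space occupied by live documents.

```shell
#!/bin/sh
# Hypothetical helper: append "UTC timestamp,size in KB" for a directory
# to a CSV log, so samples taken before and after a deletion can be compared.
log_size() {
  dir="$1"
  log="$2"
  kb=$(du -sk "$dir" | cut -f1)
  printf '%s,%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$kb" >> "$log"
}

# Example (not run here; the log path is an assumption):
#   log_size /var/lib/splunk/kvstore/mongo /tmp/kvstore_size.csv
```

Calling it from cron on each member would show whether the size ever moves after the pruning search runs.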
At this point, I'd be happy with just a straight answer as to whether MongoDB actually deletes data from collections when it is removed from lookups via SPL commands.
Update: A sneaky kvstore clean on SH1 piggybacked onto a bundle deployment has resulted in the kvstore sizes now being,
which is almost bearable - although even a 10% difference in size is a bit discomfort-inducing - and one wonders what would happen if a clean was carried out on SH3.
This really just leaves the question of why it's so big to be answered...