Hello Splunkers!!
We are experiencing frequent KV Store crashes, which are causing all reports to stop functioning. The error message observed is:
"[ReplBatcher] out of memory."
This issue is significantly impacting our operations, as many critical reports rely on KV Store for data retrieval and processing. Please help me get this fixed.
Thanks in advance!!
To check how many records are currently in a given collection, you can run:
| rest /servicesNS/-/-/storage/collections/data/<collection_name> | stats count
If the splunk user's ulimits are capped, increase them (e.g. edit /etc/security/limits.conf to set memlock to unlimited for the splunk user, then reboot or reapply the limits). MongoDB (used by KV Store) typically uses up to 50% of system RAM minus 1GB for its working set by default, so with 64GB it should have roughly 31GB available; make sure it is not artificially limited.
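As a rough sketch (assuming Splunk runs under a local "splunk" account on Linux; adjust the user name to your environment), the limits.conf entries would look something like:
# /etc/security/limits.conf - illustrative values only
splunk  soft  memlock  unlimited
splunk  hard  memlock  unlimited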
Also check the current oplogSize under the [kvstore] stanza in server.conf. The server.conf.spec entry reads:
oplogSize = <integer>
* The size of the replication operation log, in megabytes, for environments
with search head clustering or search head pooling.
In a standalone environment, 20% of this size is used.
* After the KV Store has created the oplog for the first time, changing this
setting does NOT affect the size of the oplog. A full backup and restart
of the KV Store is required.
* Do not change this setting without first consulting with Splunk Support.
* Default: 1000 (1GB)
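To confirm the value your instance is actually using, a quick check (a sketch, assuming a default $SPLUNK_HOME) is:
# Show the effective [kvstore] settings and which .conf file each comes from
$SPLUNK_HOME/bin/splunk btool server list kvstore --debug
If Splunk Support does advise resizing, the change would go under the [kvstore] stanza of server.conf, for example (the value below is only an illustration):
# server.conf - example value only; consult Splunk Support before changing
[kvstore]
oplogSize = 2000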
Hi @uagraw01
It sounds like your Splunk server is running out of RAM.
Please could you confirm how much RAM your server has? You could run the following and let us know what is returned:
index=_introspection host=YourHostname component=HostWide earliest=-60m
| dedup data.instance_guid
| table data.mem*
and
| rest /services/server/info splunk_server=local
| table guid host physicalMemoryMB
Also, have you recently added a large number of KV Store objects which might have caused the memory usage to grow quickly?
I think the below query should show how big the KV Store is; please let us know what you get back:
| rest /services/server/introspection/kvstore/collectionstats
| mvexpand data
| spath input=data
| rex field=ns "(?<App>.*)\.(?<Collection>.*)"
| eval dbsize=round(size/1024/1024, 2)
| eval indexsize=round(totalIndexSize/1024/1024, 2)
| stats first(count) AS "Number of Objects" first(nindexes) AS Accelerations first(indexsize) AS "Acceleration Size (MB)" first(dbsize) AS "Collection Size (MB)" by App, Collection
It could be that you need to increase RAM to accommodate the demand on the server.
Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards
Will
Hey Will, @livehybrid, you’re even faster than GPT! 😄
We've already upgraded our RAM from 32GB to 64GB.
I see that you are running Splunk on Windows?
I don't have much experience with how Windows internals work in current versions, but are you sure that Splunk can use all of that added memory without additional configuration? E.g. on Linux you must at least run disable boot-start and then enable it again; otherwise systemd doesn't know that Splunk is allowed to use the additional memory.
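For the Linux case, the re-registration is roughly this (a sketch assuming a systemd-managed install, a default $SPLUNK_HOME and a "splunk" service account):
# Re-create the systemd unit so it picks up the host's current resources/limits
$SPLUNK_HOME/bin/splunk disable boot-start
$SPLUNK_HOME/bin/splunk enable boot-start -systemd-managed 1 -user splunk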
Ha @uagraw01 you caught me at a good time 😉
Sounds like RAM shouldn't really be an issue then, although it is possible to adjust how much memory mongo can use with server.conf/[kvstore]/percRAMForCache (see https://docs.splunk.com/Documentation/Splunk/latest/Admin/Serverconf).
You could adjust this and see if it resolves the issue. It's 15% by default.
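As an illustration only (the value below is an arbitrary example, not a recommendation), lowering it would look like:
# server.conf - cap the KV Store (mongod) cache at 10% of system RAM
[kvstore]
percRAMForCache = 10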
The other thing I was wondering is whether any high-memory operations are being run against the KV Store when it crashes, which might be causing more-than-usual memory usage. Are you using DB Connect on the server, or are any particular modular inputs executing at the time of the issue?
Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards
Will