Splunk Enterprise

kvstore issue

uagraw01
Motivator

Hello Splunkers!!

We are experiencing frequent KV Store crashes, which are causing all reports to stop functioning. The error message observed is:

"[ReplBatcher] out of memory."

This issue is significantly impacting our operations, as many critical reports rely on the KV Store for data retrieval and processing. Please help me get this fixed.

 

[screenshot of the KV Store error attached]

Thanks in advance!!

 


kiran_panchavat
Influencer

@uagraw01 

Even with 64GB of RAM, an excessively large or poorly managed KV Store dataset can still overwhelm mongod. A few things to check:
 
  • Check the KV Store data size: du -sh /opt/splunk/var/lib/splunk/kvstore/mongo/
  • Look in collections.conf across apps ($SPLUNK_HOME/etc/apps/*/local/) to identify what’s stored.
  • Query a collection’s record count via the Splunk REST API:

| rest /servicesNS/-/-/storage/collections/data/<collection_name> | stats count

  • Is the KV Store directory getting too large (e.g., 20GB+)? Is any single collection holding millions of records or huge documents?
  • If a collection is oversized, archive or purge old data (e.g., ./splunk clean kvstore -app <app> -collection <name> after backing up) - see the sketch below.
  • Optimize apps to store less in KV Store (e.g., reduce field counts or batch updates).
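
A minimal sketch of the archive-then-purge step on a *nix host, assuming $SPLUNK_HOME is /opt/splunk and using hypothetical app/collection names (my_app / my_collection); check the CLI help on your version before running anything destructive:

cd /opt/splunk/bin
# Archive the whole KV Store first (written under $SPLUNK_HOME/var/lib/splunk/kvstorebackup by default)
./splunk backup kvstore -archiveName kvstore_backup_pre_cleanup
# Then purge only the oversized collection
./splunk clean kvstore -app my_app -collection my_collection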
 
 
Did this help? If yes, please consider giving kudos, marking it as the solution, or commenting for clarification — your feedback keeps the community going!

kiran_panchavat
Influencer

@uagraw01 

  • Is mongod using a small fraction of the 64GB (e.g., stuck at 4GB or 8GB) before crashing?
  • Any ulimit restrictions?  

If capped, increase the ulimit (e.g., edit /etc/security/limits.conf to set splunk - memlock unlimited and reboot or reapply). MongoDB (used by the KV Store) typically uses up to 50% of system RAM minus 1GB for its working set by default. With 64GB, it should have ~31GB available; ensure it’s not artificially limited.
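
If the limits do turn out to be capped, here is a minimal /etc/security/limits.conf sketch for a Linux host, assuming the service runs as the splunk user (the numeric values are illustrative, not Splunk-mandated; use your own sizing guidance):

splunk    soft    nofile    64000
splunk    hard    nofile    64000
splunk    soft    nproc     16000
splunk    hard    nproc     16000
splunk    -       memlock   unlimited

Note that if Splunk is started by systemd, the unit file’s LimitNOFILE/LimitNPROC/LimitMEMLOCK settings take precedence over limits.conf, so adjust those as well.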

  1. Open $SPLUNK_HOME/var/log/splunk/mongod.log and look for the [ReplBatcher] out of memory error. Note the timestamp and surrounding lines. 
  2. Cross-check $SPLUNK_HOME/var/log/splunk/splunkd.log for KV Store restart attempts or related errors.
The [ReplBatcher] component handles replication in KV Store, and an "out of memory" error here suggests it’s choking on the replication workload. With 64GB, it shouldn’t be a hardware limit, so tune the configuration.
 
Check server.conf ($SPLUNK_HOME/etc/system/local/server.conf):

[kvstore]
oplogSize = <integer>
* The size of the replication operation log, in megabytes, for environments
  with search head clustering or search head pooling.
  In a standalone environment, 20% of this size is used.
* After the KV Store has created the oplog for the first time, changing this
  setting does NOT affect the size of the oplog. A full backup and restart
  of the KV Store is required.
* Do not change this setting without first consulting with Splunk Support.
* Default: 1000 (1GB)

The default is 1000 MB (1GB). Post-RAM upgrade, this might be too small for your data throughput.
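
For example, a sketch of a larger oplog in $SPLUNK_HOME/etc/system/local/server.conf (2000 MB is purely illustrative; as the spec text above says, the oplog is only resized on a freshly created KV Store, so back up first and involve Splunk Support before changing it):

[kvstore]
oplogSize = 2000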
 
 
Run ./splunk show kvstore-status to see replication lag or errors.
 
Restart Splunk (./splunk restart) and monitor if crashes decrease. A larger oplog gives replication more buffer space, reducing memory pressure.
Did this help? If yes, please consider giving kudos, marking it as the solution, or commenting for clarification — your feedback keeps the community going!

livehybrid
Champion

Ha @uagraw01 you caught me at a good time 😉 

Sounds like RAM shouldn't really be an issue then, although it is possible to adjust how much memory mongo can use with server.conf/[kvstore]/percRAMForCache (see https://docs.splunk.com/Documentation/Splunk/latest/Admin/Serverconf).

You could adjust this and see if it resolves the issue; it's 15% by default.
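
A sketch of what that stanza looks like in $SPLUNK_HOME/etc/system/local/server.conf (the value 10 here is purely illustrative; you could lower it to cap mongod's cache or raise it to let the KV Store use more of the 64GB, and a Splunk restart is needed either way):

[kvstore]
percRAMForCache = 10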

The other thing I was wondering is whether any high-memory operations are being run against the KV Store when it crashes that might be causing more-than-usual memory usage. Are you using DB Connect on the server, or are any particular modular inputs executing at the time of the issue?

Please let me know how you get on and consider adding karma to this or any other answer if it has helped.

Will


kiran_panchavat
Influencer

@uagraw01 

Upgrading from 32GB to 64GB RAM suggests that total memory capacity is probably not the main issue. But since the [ReplBatcher] out of memory error is still happening, the problem is likely elsewhere.
 
  • Check mongod memory usage during a crash: on Linux, run top or htop and sort by memory (the RES column) to see how much mongod is consuming.
  • Confirm no OS-level limits are capping it: check ulimit -v (virtual memory) for the Splunk user. It should be unlimited or very high (see the sketch below).
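
A quick sketch of those two checks on a Linux host, assuming the processes run as a splunk user that has a login shell (adjust the account name if yours differs):

# Resident (RSS) and virtual (VSZ) memory of the KV Store's mongod process, in KB
ps -o pid,user,rss,vsz,cmd -C mongod
# Effective limits for the Splunk user; look at "virtual memory" and "max locked memory"
su - splunk -c 'ulimit -a'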
Did this help? If yes, please consider giving kudos, marking it as the solution, or commenting for clarification — your feedback keeps the community going!

livehybrid
Champion

Hi @uagraw01 

It sounds like your Splunk server is running out of RAM.

Please could you confirm how much RAM your server has? You could run the following and let us know what is returned:

 

index=_introspection host=YourHostname component=HostWide earliest=-60m
| dedup data.instance_guid
| table data.mem*

 

and

 

| rest /services/server/info splunk_server=local
| table guid host physicalMemoryMB

 

Also, have you recently added a large number of KV Store objects which might have caused the memory usage to grow quickly? 

I think the query below should show how big the KV Store is; please let us know what you get back:

| rest /services/server/introspection/kvstore/collectionstats
| mvexpand data
| spath input=data
| rex field=ns "(?<App>.*)\.(?<Collection>.*)"
| eval dbsize=round(size/1024/1024, 2)
| eval indexsize=round(totalIndexSize/1024/1024, 2)
| stats first(count) AS "Number of Objects" first(nindexes) AS Accelerations first(indexsize) AS "Acceleration Size (MB)" first(dbsize) AS "Collection Size (MB)" by App, Collection

It could be that you need to increase RAM to accommodate the demand on the server.

Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards

Will


uagraw01
Motivator

Hey Will, @livehybrid, you’re even faster than GPT! 😄

We've already upgraded our RAM from 32GB to 64GB.


isoutamo
SplunkTrust

I see that you are running Splunk on Windows?

I don't have much experience with how Windows internals work in current versions, but are you sure that Splunk can use all that added memory without additional configuration? For example, on Linux you must at least disable boot-start and re-enable it again; otherwise systemd doesn't know that Splunk is allowed to use the additional memory.
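
On a systemd-managed Linux install, that re-registration looks roughly like this, run as root (a sketch only; the install path and -user value are assumptions, and none of it applies to a Windows host):

cd /opt/splunk/bin
./splunk stop
./splunk disable boot-start
# Recreate the unit file so systemd picks up the host's current settings
./splunk enable boot-start -systemd-managed 1 -user splunk
./splunk start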

