I currently have the following setup.
3 x search heads (8 CPU, 16 GB memory)
2 x indexers (8 CPU, 16 GB memory)
Currently I'm only indexing around 10 GB of data per day, about 80% of which comes from the NetApp application "Splunk App for NetApp Data ONTAP". I have datamodel acceleration enabled with a summary range of 1 month, on a cron schedule of every 5 minutes.
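For context, the acceleration settings described above correspond roughly to a datamodels.conf stanza like the one below (the stanza name is a placeholder; the real model names come from the NetApp app):

    [NetApp_Model]
    # placeholder stanza name; use the model names shipped by the app
    acceleration = 1
    # keep one month of summaries
    acceleration.earliest_time = -1mon
    # rebuild summaries every 5 minutes
    acceleration.cron_schedule = */5 * * * *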
Currently the datamodel acceleration runs for about 2-3 minutes, and during that time the memory usage of the splunkd process reaches 16 GB, causing OOM kernel errors that kill the process and crash Splunk on the indexer. I've tried the suggestion of implementing cgconfig rules that limit the splunk user to a maximum of 12 GB of memory, but I find this to be a workaround at best; having the kernel kill Splunk child processes shouldn't be necessary.
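For anyone trying the same workaround, here is a minimal sketch of that kind of memory cap, assuming cgroups v1 with libcgroup (the group name and the 12 GB value are just examples):

    # /etc/cgconfig.conf -- example memory cap for the splunk user's processes
    group splunk {
        memory {
            # 12 GB hard limit
            memory.limit_in_bytes = 12884901888;
        }
    }

    # /etc/cgrules.conf -- route processes owned by the splunk user into that group
    splunk    memory    splunk/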
To see how much memory it could use, I created a 3rd indexer with double the resources of the original two (16 CPU and 32 GB memory). In that case the datamodel acceleration job used all 32 GB while running and caused OOM errors to appear in /var/log/messages.
My questions:
The only thing I can think of right now is creating a custom datamodel with the fields that I need. If anyone has any solutions to try other than a new datamodel, I'm all ears.
To answer your questions: I'm running v6.3.2 on Amazon Linux instances, and I had already prefixed the datamodel searches with index=ontap.
However, I believe I found the issue (my own fault).
I'm using Ansible to push out the apps to the SHC and master-apps folders. Due to a mistake in the Ansible deployment, I pushed the same splunk_app_netapp app to both the SHC and the indexing cluster. This meant the datamodels.conf files were being executed on both the indexers and the SHC, resulting in duplicate acceleration searches. Removing the app from the indexing cluster machines solved the issue.
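If anyone hits something similar, one quick way to confirm where datamodel configuration is actually active is to run btool on each tier and compare which files the settings are loaded from:

    # shows every datamodels.conf setting and the file it comes from
    $SPLUNK_HOME/bin/splunk btool datamodels list --debug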
The first time the datamodel was built, the memory usage did hit 20 GB, but now it's quite small and manageable.
We've seen memory issues on the Windows platform, but the app is not supported on Windows, so we haven't investigated the cause of that issue so far.
Here is my brainstorming 🙂
Assuming the system is not Windows, reducing the size of each bucket might help, although that won't fix already-indexed data. If a subsearch is using up the memory and the main splunkd process is running the search, upgrading to v6.3.2 might help resolve the issue. If the cause of the issue is not the ontap buckets, restricting the search to index=ontap may help; see the sketch below.
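A rough sketch of those two suggestions (the index name is taken from this thread; the bucket size value is only an example, and maxDataSize is specified in MB):

    # indexes.conf on the indexers -- smaller buckets for the ontap index
    [ontap]
    # "auto" is roughly 750 MB; a smaller fixed value shrinks each new bucket
    maxDataSize = 500

For the search constraint, the idea is simply to lead the base search with the index, for example:

    index=ontap | stats count by sourcetype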
Can you please file a Support case for further troubleshooting?