We're seeing massive memory use (20GB+) from the Network_Traffic data model acceleration searches.
In limits.conf, max_mem_usage_mb is at its default of 200, but the tstats search doesn't seem to respect it. The searches run for about 62 minutes even though max_time is set to 3600 (60 minutes), and Linux often kills the processes for running out of memory (OOM killer).
We're on Splunk 7.2.6 with a cluster of search peers, and I don't see any non-default max_mem_usage_mb settings on the indexers or the search head. How do we ensure that the acceleration searches run fine but don't take 20GB+ of memory?
max_mem_usage_mb only caps the in-memory cache of events and result sets for a particular search; it does not limit the memory usage of the 'splunkd' process. Any result set larger than 200MB (the default) spills to disk -- you must also be seeing high disk I/O activity and a ton of files that look like /statstmp_partition0_1555718872.35.srs.zst. There is a good chance that a big portion of those 62 minutes is spent on disk I/O. Your Network_Traffic data model must be covering 1) a huge volume of events that 2) have high cardinality -- this, by nature, makes acceleration expensive (high memory and CPU cost).
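The spill mechanism can be illustrated with a toy sketch (this is not Splunk's actual implementation; the function, file prefix, and row-count cap are all stand-ins for the real byte-based max_mem_usage_mb logic):

```python
import os
import pickle
import tempfile

def collect_results(rows, max_mem_rows=3):
    """Toy model of max_mem_usage_mb: hold at most max_mem_rows results
    in memory, spilling each full chunk to a temp file (analogous to the
    statstmp_*.srs.zst files a real search leaves behind)."""
    in_memory, spill_files = [], []
    for row in rows:
        in_memory.append(row)
        if len(in_memory) >= max_mem_rows:
            fd, path = tempfile.mkstemp(prefix="statstmp_")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(in_memory, f)   # chunk goes to disk
            spill_files.append(path)
            in_memory = []                  # memory stays bounded
    return in_memory, spill_files

# A small cap over many rows forces repeated spills -> heavy disk I/O,
# which is where much of a long search's wall time can go.
mem, spills = collect_results(range(10), max_mem_rows=3)
print(len(mem), len(spills))  # -> 1 3
for p in spills:
    os.remove(p)
```

The point of the sketch: a low cap keeps memory flat but trades it for more temp files and more disk I/O, which matches the symptom of long-running searches with heavy disk activity.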
There are a few things you can try to speed things up:
Use index scoping. By default, the data model looks at all indexes that contain a matching tag (e.g., pci). If other indexes (that you are not really interested in accelerating as part of Network_Traffic) happen to carry the same tag, they will naturally be scanned for acceleration too. If you scope the data model to just the indexes you care about, you regain control, and chances are acceleration will be much faster. See the attached screenshot showing where to define index scoping.
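If you are using the Splunk Common Information Model add-on, index scoping can also be done through the per-model indexes macro. A sketch of what that looks like (the macro name follows the CIM add-on's convention, and the index names here are placeholders for your firewall indexes):

```
# macros.conf in the Splunk_SA_CIM app's local directory
[cim_Network_Traffic_indexes]
definition = (index=firewall OR index=netfw)
```

With that in place, the Network_Traffic acceleration search only scans the listed indexes instead of every index that happens to carry the tag.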
Yes, you can try increasing max_mem_usage_mb. On a host with 64GB of RAM, 1GB is only about 1.5%, which is a reasonable starting point. By spilling fewer temp results to disk, the spill-management overhead is also reduced, which can lower the total memory usage.
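A sketch of that change (apply it on the search head and the indexers, since the acceleration searches run on the peers; the 1000MB value is a starting point to tune, not a fixed recommendation):

```
# limits.conf ($SPLUNK_HOME/etc/system/local or an appropriate app)
[default]
max_mem_usage_mb = 1000
```

A restart (or rolling restart of the peers) is needed for the setting to take effect.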