Lately I'm facing a big problem with my Splunk searcher. A lot of my dashboard queries fail almost immediately after they start running, with a "Bad allocation" error. I found one question in this forum about the same problem, and the answer was to add more RAM. I've done that, but it didn't seem to help.
So here are some details:
I'm using 1 indexer server, which is also used as the searcher. The server indexes about 150 GB per day.
Windows Server 2012
500 GB storage (350 GB used, about 10 indexes)
Whenever I run any dashboard (even Splunk's default dashboards) I get the "bad allocation" error.
When I read search.log I found that the searches failed in less than a second, and that there are at least four different sets of error logs. The most common one is this (I took only the part that contains the errors):
[Info] Database directory manager::bucket - use bloomfilter = true
[Error] stmgr -dir='D:.....\db\hot_v1_5761' st_query failed rc=-2 warm_rc=[0,0] query [1510686487,1510688868,[ AND myfield sourcetype::maingws ]] is_exact=false
Info batchsearch - recategorizing myapp~5760~xxxxx~xxxxx~xxxxx~xxxxx~xxxxx as non-restartable for responsiveness.
These log lines are then printed several times, each time with a different db directory; then the search shuts down, and the final log line is the "bad allocation".
I checked the server resources while running the dashboard: the RAM looked fine (about 60% used), the CPU spiked from 30% to 100% for a few moments and then returned to normal, but the network and disk I/O looked really bad (both were at about 100% for a few minutes).
Do you have any idea how to overcome this problem? What more can I check?
Would you recommend splitting my data onto another disk to reduce disk I/O usage?
If I used a separate search head instead of searching on my indexer, would that solve this?
Do you recall what your paging (page) file or virtual memory settings are set to? I'm having a similar issue to what you are describing, and I'm wondering if this could be an issue with the available RAM and virtual memory for memory-intensive searches.
I found out that about 80% of the memory Splunk consumed on my machine was used for REST input. When I disabled all my REST requests, the "bad allocation" error was gone, meaning it was after all a memory problem. The odd part was that even though my server is pretty strong, Splunk was only using about 50% of its memory, 60% tops. I modified some values in the limits.conf file, and now it looks like my Splunk server is using much more of the memory, so even with the REST input enabled no bad allocation errors are returned.
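For anyone trying to track down a similar memory culprit: assuming the _introspection index is collecting resource usage data (it does by default on recent Splunk versions), a search along these lines shows which Splunk processes are using the most memory. The field names here are the standard resource-usage introspection fields; adjust them if your version reports them differently:

```
index=_introspection sourcetype=splunk_resource_usage component=PerProcess
| stats max(data.mem_used) AS max_mem_used BY data.process, data.args
| sort - max_mem_used
```

As far as I know, data.mem_used is reported in MB, so the top rows point straight at the heaviest processes (in my case, the ones handling the REST input).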
OK. For most of my dashboards the settings I changed did the trick. But for some of the dashboards that contain lots of panels the error is still there.
These are the changes I made in limits.conf:
search_process_mode = auto
enable_memory_tracker = true
search_process_memory_usage_percentage_threshold = 80
batch_search_max_pipeline = 2
The changes made most of the dashboards' searches better. I've tried tweaking the values to get all of my dashboards working, but so far it hasn't worked.
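For reference, these settings all belong in the [search] stanza of limits.conf (typically $SPLUNK_HOME\etc\system\local\limits.conf on the machine doing the searching). Here is a sketch of the stanza with the values above; the comments describe the settings as I understand them, so check the limits.conf spec for your exact version, and note that a splunkd restart is generally needed for the changes to take effect:

```ini
# $SPLUNK_HOME\etc\system\local\limits.conf
[search]
# Let splunkd decide how search processes are launched.
search_process_mode = auto
# Track per-search-process memory usage so the threshold below can be enforced.
enable_memory_tracker = true
# Terminate a search process that exceeds this percentage of available memory.
search_process_memory_usage_percentage_threshold = 80
# Cap the number of batch-search pipelines per search process.
batch_search_max_pipeline = 2
```

These values are the ones that worked for most of my dashboards, not general recommendations; heavier dashboards may need further tuning.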