Hello Team,
Pre-staging environment (not production): a single server with 12 CPUs, 24 GB of memory, and RAID 0 NVMe storage (2.5 GB/s write, 5 GB/s read). All-in-one deployment (SH + indexer). The CPU cores are on a dedicated server with HT enabled (6 cores with HT = 12 CPUs, not shared with any other VM).
Splunk 9.1.1 and ES 7.1.1. Fresh install. NO data ingested (0 events in most of the indexes, including main, notable, risk, etc.), so basically there is no data to process yet.
Default ES configuration; I have not yet tuned any correlation searches. Everything is at defaults, and there are already performance problems:
1. MC Scheduler Activity: Instance shows 22% skipped.
2. ESX reports minimal CPU usage (the same for memory).
3. MC shows more detail: many different accelerated DM tasks are skipped, all the time.
Questions:
1. Obviously, the first recommendation would be to disable many of the correlation searches and accelerated DMs, but that is not what I would like to do, because the aim is to test complete ES functionality (by generating a small number of different types of events). Why do I have these problems in the first place?
I can see that all the tasks are very short: most finish in 1 second, and just a few take several seconds. That is expected, since I have 0 events everywhere and I always expect to have a small number of events on this test deployment. What should I do to tune it and make sure no jobs are skipped?
Shall I increase
max_searches_per_cpu
base_max_searches
Any other ideas? Overall, this seems weird.
Hello @MichalG1, ES requires a minimum of 16 CPU cores and 32 GB of memory (https://docs.splunk.com/Documentation/ES/7.2.0/Install/DeploymentPlanning). However, if the ask is to update max_searches_per_cpu and base_max_searches on a pre-prod environment (and not prod), you can go ahead and try that.
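To see why searches can be skipped while CPU usage stays minimal, it may help to work out the concurrency limits those two settings produce. The Python sketch below is only an estimate: it assumes the limits.conf defaults are max_searches_per_cpu = 1, base_max_searches = 6, max_searches_perc = 50 and auto_summary_perc = 50, that Splunk counts the 12 logical CPUs, and that fractions are rounded down. The function name is hypothetical; please verify the values against the limits.conf spec for your version.

```python
def search_concurrency(cpus, max_searches_per_cpu=1, base_max_searches=6,
                       max_searches_perc=50, auto_summary_perc=50):
    """Estimate search concurrency limits from limits.conf-style values.

    The default arguments are assumed limits.conf defaults, and rounding
    down is an assumption; check both against your Splunk version.
    """
    # Total concurrent historical searches allowed on the instance.
    total = max_searches_per_cpu * cpus + base_max_searches
    # Portion of the total that the scheduler may use.
    scheduled = total * max_searches_perc // 100
    # Portion of the scheduler's slots usable by auto-summarization
    # searches such as data model acceleration.
    summaries = scheduled * auto_summary_perc // 100
    return {"total": total, "scheduled": scheduled, "summaries": summaries}


# 12 logical CPUs with the assumed defaults.
print(search_concurrency(12))
# Same server with both settings raised (example values only).
print(search_concurrency(12, max_searches_per_cpu=2, base_max_searches=10))
```

Under these assumptions, the 12-CPU server gets roughly 18 concurrent searches in total, 9 for the scheduler and 4 for acceleration searches, so a default ES install could hit the scheduler ceiling even with idle hardware.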
I would also suggest disabling the Data Model Accelerations, as well as reviewing the correlation searches that are enabled by default, because the issue seems to be the scheduler getting a lot of searches to execute at any given time (and not a resource issue). You can also review the alert actions and cron schedules through this search (and stagger the cron schedules if needed):
| rest splunk_server=local count=0 /servicesNS/-/SplunkEnterpriseSecuritySuite/saved/searches
| where match('action.correlationsearch.enabled', "1|[Tt]|[Tt][Rr][Uu][Ee]")
| where disabled=0
| eval actions=split(actions, ",")
| rename title as "Correlation Search", cron_schedule as "Cron Schedule" "dispatch.earliest_time" as "Earliest Time" dispatch.latest_time as "Latest Time" actions as "Actions"
| table "Correlation Search" "Cron Schedule" "Earliest Time" "Latest Time" "Actions"
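Once the results are exported, a quick way to spot candidates for staggering is to count how many searches share the same cron expression. The Python sketch below uses made-up rows and a hypothetical helper name; it only compares cron strings literally, so two different expressions that fire at the same minute would not be caught.

```python
from collections import Counter

# Hypothetical rows shaped like the "Correlation Search" and
# "Cron Schedule" columns of the search output (not real ES rules).
rows = [
    {"search": "Example Rule A", "cron": "*/5 * * * *"},
    {"search": "Example Rule B", "cron": "*/5 * * * *"},
    {"search": "Example Rule C", "cron": "*/5 * * * *"},
    {"search": "Example Rule D", "cron": "5 * * * *"},
]


def crowded_schedules(rows, limit):
    """Return cron expressions used by more than `limit` searches."""
    counts = Counter(row["cron"] for row in rows)
    return {cron: n for cron, n in counts.items() if n > limit}


# Flag any cron expression shared by more than two searches.
print(crowded_schedules(rows, limit=2))
```

Expressions flagged this way are the ones worth spreading across different minutes, for example by changing some of them from */5 to 1-59/5 or 2-59/5.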
Please accept the solution and hit Karma, if this helps!