Reporting

Can anyone help me with my sizing and performance considerations?

andrewtrobec
Motivator

Hello!

I have a small distributed deployment consisting of 2 search heads (16 cores each) and 2 indexers (24 cores each). There are about 900 saved searches that govern critical alerting, plus dashboards containing 50 indicators that refresh every 5 minutes while users are connected. The indexers on the south side need to index near-real-time data, while up north they're serving the alerts and the end users. I also have an accelerated data model and a "master" saved search that updates every 5 minutes. My questions:

  • How many searches can the deployment handle in parallel? My assumption is 48 since the indexer is responsible for running the searches.
  • Since hundreds of the alerts call scripts to carry out actions and these scripts generate logs which themselves are indexed by Splunk, would it be better for those scripts and logs to be located on the indexers or search heads? I know it would be better to have a separate box for that, but at this point it's not possible...
  • Is the snapshot for the "master" saved search stored on the search heads? Assuming yes, when the alerts and dashboards that are based on it run, does this in some way affect the indexers?

Thanks in advance for any input!

Regards,

Andrew

1 Solution

ololdach
Builder

Hi Andrew,
we are trespassing into the realm of dark arts here. Performance tuning and sizing is never straightforward, and at this point I can only offer my personal advice and experience, which may or may not apply to your installation. I usually do this as a paid exercise, and for an installation like yours it usually takes a couple of days to tune. Please note that I cannot take any responsibility for the outcome.

  1. The number of concurrent searches is the number of CPU cores in the search head plus a base value (default: 6). In your environment that is 16 + 6 = 22 parallel searches per search head. The search head is the limiting factor for searching, not the indexer. At first glance I would actually swap the machines: the indexers are I/O bound, and 16 cores are quite sufficient to index a lot of data. Upgrading the search heads to 24 cores + 6 base, on the other hand, would give you 30 parallel searches instead of the 22 you have now. If you rely heavily on real-time alerts (each one blocks a CPU core), you should strongly consider upgrading the cores on the search heads. See the limits.conf sketch after this list.
  2. Alerts are scheduled and executed on the search head, and the scripts are called on the search head as well. In a standard deployment your search heads are configured to send all their "indexing data" to the indexing layer, so the most efficient approach is a local input stanza on the search head that reaps the script logs and forwards them to your indexers, with the search head acting as a forwarder (see the inputs/outputs sketch after this list). Every triggered alert script starts another external process on the machine, and if you have hundreds of those... I would really like to stress point 1: get as many cores into your search heads as possible to cope with the process load at the OS level.
  3. Since you have only two search heads, you do not have a search head cluster. In that case the "master data" stored on one search head will not get replicated to the other. I would balance the load between the two so that some master searches run on A and others run on B. Every search initially affects the indexers, because they have to come up with the data. Once you access a report with loadjob (https://docs.splunk.com/Documentation/Splunk/7.3.2/SearchReference/Loadjob) you retrieve the results of the last run of that report, and the search head will not bother asking the indexers for new data; there is a short loadjob example below.
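
To make point 1 concrete, here is a minimal limits.conf sketch showing where the formula comes from. These are the stock defaults, not values tuned for your deployment:

    # limits.conf on each search head
    [search]
    # historical search concurrency = max_searches_per_cpu * number_of_cpus + base_max_searches
    max_searches_per_cpu = 1
    base_max_searches = 6

With 16 cores that gives 1 * 16 + 6 = 22 concurrent searches; with 24 cores it would be 30.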
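
For point 2, a hedged sketch of the local input plus forwarding setup; the log path, index name and indexer host names are assumptions you would replace with your own:

    # inputs.conf on the search head: reap the alert-script logs
    [monitor:///opt/alert_scripts/logs/*.log]
    index = alert_script_logs
    sourcetype = alert_script

    # outputs.conf on the search head: forward everything to the indexing layer
    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = indexer1.example.com:9997, indexer2.example.com:9997

The search head then behaves like any other forwarder for those logs instead of indexing them locally.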
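
And for point 3, a small loadjob example; the user, app and saved search names are placeholders:

    | loadjob savedsearch="admin:search:Master Saved Search"

Dashboards and alerts built on that command read the artifact already stored on the search head rather than dispatching a fresh search to the indexers.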

Hope it helps
Oliver

andrewtrobec
Motivator

Thanks so much Oliver! I'll be sure to hold you responsible! Just kidding, really appreciate you taking the time, and I hope this response can get some good views.
