Deployment Architecture

Discrepancies between DMC server view and actual server view

Abass42
Communicator

I have a few questions about how the DMC gathers its server specification information and how to extract it.

I am trying to add resources to our slower indexers, following the official resource docs here.

To begin, I am viewing my infrastructure through the DMC view:

Monitoring Console -> Settings -> General Setup

 

And from here, I get a view that looks like:

Abass42_0-1747751154751.png

In this view, my server is reported as having 4 cores and 15884 MB of memory. These are Azure VMs, so they use vCPUs; this one specifically is an Azure indexer. We have latency issues with these, and I believe it's because they are under-resourced.

 

Looking at my server specifically, I get the following CPU specs:

Abass42_1-1747751436488.png

 

From looking at the other hosts in both sources, I have concluded that the DMC's core count = cores per socket x sockets (which makes sense).

From some references I was looking at, it seemed like I needed to be looking at the CPU(s) value from the lscpu command. For this example, the server has 8 vCPUs, and it's recommended we have at least 12.
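
For comparison, I also looked at what Splunk itself reports over the management port. A search like the one below, run locally on the box, seems to show both counts side by side (I'm going from memory on the exact field names, so treat them as assumptions):

    | rest /services/server/info
    | table serverName numberOfCores numberOfVirtualCores physicalMemoryMB

On the hosts I checked, numberOfCores appears to line up with sockets x cores-per-socket from lscpu, while numberOfVirtualCores appears to match the lscpu CPU(s) value, i.e. including hyper-threads.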

When upgrading, do I need to focus on the number of cores, or do I just need to specify how many vCPUs I need?

I also wanted to know how I could extract this view from the DMC. I want to table all of the resource metrics so I can get a full picture of what I have in my Splunk environment.
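
Something along these lines, run from the Monitoring Console's search bar, is roughly what I'm after (assuming the instances show up as search peers of the MC, and again guessing at the exact field names):

    | rest /services/server/info splunk_server=*
    | table splunk_server serverName numberOfCores numberOfVirtualCores physicalMemoryMB os_name cpu_arch

If the DMC pulls its values from a different endpoint, I'd love to know which one.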

Thank you for any guidance or insight. 


PickleRick
SplunkTrust

No. You have 8 vCPUs, whereas the minimum indexer specs call for 24 vCPUs.

Actually, it's a bit more complicated than that. Since most if not all modern CPUs include hyper-threading, the OS shows more "CPUs" than the die actually has cores. However, it's not easy to calculate the performance of such a setup, since hyper-threading works pretty well with multithreaded applications but doesn't give much of a boost to many single-threaded ones.

Anyway, you most probably have a virtualized 4-core, 2-threads-per-core CPU, which is really low-end for a production indexer. Yes, you can run a Splunk lab at home on 4 cores, but if you want to put any reasonable load on the box, you'll have a lot of problems. As a rule of thumb, Splunk uses 4-6 vCPUs just for a single indexing pipeline, so there's not much juice left in your hardware for searching. This box is really undersized.

Abass42
Communicator

Yes, it is. I wasn't around when these were spun up, but now that it's up to me to fix it, I want to make sure we won't run into these issues again for a few years. If we already have 10 physical indexers that handle most of the data, would I need the 96 vCPUs for the other 6 in Azure? I have to consider costs as well.

 

Thanks 


isoutamo
SplunkTrust

You said that the indexers are slow, but how did you reach that conclusion? Does it mean there is a lack of CPU, memory, or I/O resources? Before you start to increase the size of the servers, you must understand exactly what your situation is! For example (a couple of starter searches are sketched after this list):

  • How much are you indexing daily?
  • How many searches do you run?
  • How many indexers do you have?
  • What kind of topology do you have?
  • Do you have SmartStore in use?
  • Which kind of nodes do you have?
  • What node types do you have?
  • What storage do you have?
  • What are your metrics for CPU, memory, and I/O?
  • How many indexes?
  • etc.
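
For the first and last points, here are a couple of quick starting points, sketched from memory, so adjust the field names as needed:

    index=_internal source=*license_usage.log* type=Usage earliest=-30d
    | eval GB=b/1024/1024/1024
    | timechart span=1d sum(GB) AS daily_GB

    index=_introspection sourcetype=splunk_resource_usage component=Hostwide earliest=-7d
    | eval cpu_pct='data.cpu_system_pct'+'data.cpu_user_pct'
    | stats avg(cpu_pct) AS avg_cpu_pct max(data.mem_used) AS max_mem_used_MB by host

The first shows your daily ingest over the last month; the second gives a rough per-host CPU and memory picture from the introspection data the MC already collects.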

Abass42
Communicator

Hey, 

Thanks for your response. With this Azure cluster, we receive latency alerts all the time. Comparing the installed and available resources to the recommendations, we are under-resourced. We have a clustered environment: two clusters at a different location for both the search heads and the indexers. We have 10 physical indexers, each with about 60 TB of storage, 48 cores, and 96 logical CPUs. Compare those to the Azure indexers, which have 4 cores and 8 logical CPUs.

 

Our Azure cluster has been giving us latency warnings for a few years now, even after a few upgrades. Now that I am more comfortable with our environment, I want to finally upgrade the CPU and memory to the recommended values.

 

We have 6 Azure indexers, all of which have latency issues at some point or another. We have about 100 indexes; our top 3 sources ingest 600 GB daily, and we average about 1.7 TB a day.

 

To sum up, these are under-resourced, and they need more CPU.

 

Thanks


kearaspoor
SplunkTrust

Hi! @isoutamo looped me in because he knows I'm currently in an Azure environment that's doing ~1 PB/day.

First, are you using SmartStore to offload older events to blob storage? If so, at around 1 TB/day you're going to want to start thinking about splitting up your cluster, because Azure throttles blob upload/download. That WILL cause latency problems. There's also a whole bunch of SmartStore tuning you'll need to consider to minimize cache thrashing. If you're not using SmartStore, then the math goes a completely different way.
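
If you're not sure whether SmartStore is in play, a quick check from the MC is something like the search below (I'm going from memory on the field name, but SmartStore-enabled indexes should show a remote path and non-SmartStore ones won't):

    | rest /services/data/indexes splunk_server=*
    | search remotePath=*
    | table splunk_server title remotePath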

Generally, what instance types are you using? We've evaluated the following and found them to be more than capable at our scale:
  • Dasv5-series
  • Dasv6-series
  • Lsv3-series
  • Ebsv5-series
  • Edsv5-series
  • Edsv6-series
If you ARE using SmartStore, keep in mind that there's no concept of hot/cold, just local disk/remote store, so some of the faster local NVMe may not scale up to what you need for your local cache. In that case, going for instance types that don't have local NVMe but can instead scale attached disk for your local cache is the way to go. That was our situation, which is why we chose instances that don't have local disk but allow lots of disks to be attached.

If you AREN'T using SmartStore, then you'll want to look at the other instance types and leverage the local NVMe disk for hot/warm and the attached disk as cold.

Beyond that, it's just a matter of picking the right size for your instance types to meet your SF/RF needs and your data ingest/search load. SmartStore/blob storage is really the piece that makes Azure unique. Let me know if you are using it and we can discuss how to go about splitting your storage account(s) and possibly splitting your cluster.

Abass42
Communicator

Hey, 

Thank you for assisting. 1 PB is incredible; I thought 1.7 TB was a lot. I am not too sure about the instance types; I am reaching out to find out. As far as SmartStore goes, I don't believe we are using anything of the sort. We just have retention policies to roll data to cold/frozen. I'll look over the docs regarding SmartStore.

 

In regard to minimum requirements, a lot of our administration servers (Deployment Server, some of the servers handling syslog data, DMZ heavy forwarders, Cluster Managers, etc.) have around 6-8 cores and roughly 6 CPUs.

The server pictured above is our Azure Cluster Manager. It manages a cluster, and it may also index data itself (I'm not sure), but it only has 8 CPUs and 4 cores. Should all servers at least meet the minimum requirements below, especially with our ingestion load? I would imagine so. (I've sketched a quick audit search after the list.) I can work with Support to answer any specific questions, as I already have an ODS case open to handle this Splunk version upgrade. The RHEL upgrade is being pushed because RHEL 7's support is expiring.

  • 12 physical CPU cores, or 24 vCPU at 2 GHz or greater speed per core.
  • 12 GB RAM.
  • A 1 Gb Ethernet NIC, optional second NIC for a management network.
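
To check this across the environment, I was planning to extend the earlier REST search into a quick pass/fail audit, something like the following (field names assumed, as before; 12 GB = 12288 MB):

    | rest /services/server/info splunk_server=*
    | eval meets_min=if(tonumber(numberOfVirtualCores)>=24 AND tonumber(physicalMemoryMB)>=12288, "yes", "no")
    | table splunk_server numberOfCores numberOfVirtualCores physicalMemoryMB meets_min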

 


PickleRick
SplunkTrust

For the "auxiliary" servers (although CM is very important for cluster operations) the sizing hugely depends on a scale. You can have a TB-sized environment which still serves only a few dozens of UFs from DS so you can make this DS really small (6CPU would suffice; I've seen such environments) but you could as well have several thousands of UFs pulling from DS. Anyway, with DS you can significantly lower the server's load by increasing the polling period at the cost of increased "latency" of changes to deployed apps.

The CM also grows with the size of your environment. A TB/day scale is still relatively moderate, so it shouldn't need 24 vCPUs for that.


isoutamo
SplunkTrust

Based on that information, your cluster sizing is not even near enough 😬 I'm quite sure that your environment needs something other than CPU, too.

In Azure there are some other limitations and recommendations that you must handle at your current volumes. Let's see if we can get some people who have worked more with bigger Azure Splunk installations.

Abass42
Communicator

I am working alongside the Unix team, as they have a better understanding of storage and resource requirements than I do. But this upgrade is long overdue.

Thanks for the assistance. I think I have enough for the requests I am filling out.
