Splunk Dev

What's the maxSize we can set for the event-processing queues?

ddrillic
Ultra Champion

On the indexers we have 64 GB of RAM.

We have the following configuration -

[queue=AEQ]
maxSize = 200MB

[queue=parsingQueue]
maxSize = 3600MB

[queue=indexQueue]
maxSize = 4000MB

[queue=typingQueue]
maxSize = 2100MB

[queue=aggQueue]
maxSize = 3500MB

So, altogether the processing queues can consume up to 13.4 GB, and currently we are at 100% for all of them. We wonder how high we can set them while still leaving enough RAM for the Splunk processes.

The servers are fully dedicated to Splunk...


davebo1896
Communicator

Make your parsing more efficient by explicitly setting the timestamp extraction and line merging in props.conf on the indexers.
example:
TIME_FORMAT = %d.%m.%Y %H:%M:%S.%3N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 23
LINE_BREAKER = ([\r\n]+)(?:\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}\.\d{3}\s*\w+)
SHOULD_LINEMERGE = false
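
For completeness, these settings live under a stanza in props.conf matching the sourcetype (or source/host) of the data; a sketch only, where my_sourcetype is a placeholder, not a real sourcetype name:

# props.conf on the indexers -- "my_sourcetype" is a placeholder
[my_sourcetype]
TIME_PREFIX = ^
TIME_FORMAT = %d.%m.%Y %H:%M:%S.%3N
MAX_TIMESTAMP_LOOKAHEAD = 23
LINE_BREAKER = ([\r\n]+)(?:\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}\.\d{3}\s*\w+)
SHOULD_LINEMERGE = false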


sjalexander
Path Finder

That you're at 100% for all queues suggests a fundamental problem with your architecture. It implies that you're not able to write to storage as fast as data is coming in.

You need either faster storage, or better distribution of your inputs, but not bigger queues. Queues are best used for dealing with unpredictable ingestion rates (they can handle volume spikes for you), but they cannot help you if your overall rate is overwhelming your throughput capacity.
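
For illustration, spreading the load across more indexers is mostly an outputs.conf change on the forwarders once the extra indexers exist; a minimal sketch, where the group name and hostnames are placeholders:

# outputs.conf on the forwarders -- group name and hostnames are placeholders
[tcpout]
defaultGroup = my_indexers

[tcpout:my_indexers]
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
autoLBFrequency = 30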


ddrillic
Ultra Champion

-- ... best used for dealing with unpredictable ingestion rates...

That's pretty much it, but it's not really unpredictable ingestion rates; the ingestion rates just vary greatly throughout the business day. Increasing the queues has been helping over the past year or so to handle the peak usage times.

So, the indexers have 64 GB of RAM and the queues, at the moment, add up to 13.4 GB. How high can they go?


sjalexander
Path Finder

I'm declining to answer your specific question because I think it's the wrong question to be asking in this case. You should really be looking at balancing that load across more indexers.


ddrillic
Ultra Champion

no worries ;-)


jtacy
Builder

Just curious: is this based on a lab or a production environment? Your queue size and fill ratio imply an indexing latency of several minutes, which I would already consider excessive. How much incoming data is each indexer handling, and what problem are you trying to solve with more queue?


ddrillic
Ultra Champion

-- indexing latency of several minutes

This is just fine. We can handle indexing latency of several minutes - nobody will get hurt ...

We "just" want to pass the peak usage time safely.


jtacy
Builder

I'm intrigued by your environment! Seems safe to say that you're getting your money's worth out of your servers 🙂

I would just point out that I've seen apps run indexers completely out of memory. I'm guessing you aren't using useACK at these volumes, so I'd be concerned about potential data loss. I was also going to comment that you're sacrificing your file system cache for queue, but with this much churn I wonder how long you could keep the cache around anyway.
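
For reference, useACK is enabled per output group in outputs.conf on the forwarders; a minimal sketch with placeholder names:

# outputs.conf on the forwarders -- "my_indexers" and the hosts are placeholders
[tcpout:my_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
useACK = true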

If you're prioritizing indexing over search performance (considering that the latter benefits from large vfs cache), why not go to a nice round number like 50% of system memory, or 32 GB? We run default queue sizes and the largest splunkd indexer process I see at the moment uses less than 2 GB of physical memory. Most of these indexers have only 32 GB of total memory and they're solid. If something's going to burn you I think it will be a runaway search and/or excessive search concurrency, not the indexing process itself.

Thanks for bringing up this thought-provoking question!!


ddrillic
Ultra Champion

Interesting, so you're saying that the 13.4 GB for the queues can grow all the way to 32 GB! Wow. I wonder if any of this is documented... meaning, the proper usage of memory on the indexers.

Right, we don't use useACK, as we didn't want to add to the load, and we're truly not that worried about data loss, at least for now.

We simply can't write to disk fast enough at peak usage time.

I'm thinking about doubling the size of the index queue to 8 GB. I'm not sure about the proportions across the queues, though...
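
For what it's worth, the doubling described above would be a server.conf change on each indexer, in the same style as the stanzas quoted at the top of the thread; a sketch of that one change only, not a recommendation:

# server.conf on each indexer -- doubling indexQueue from 4000MB, as described above
[queue=indexQueue]
maxSize = 8000MB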


jtacy
Builder

Oh yeah, on 64-bit I have no reason to believe that you're going to be arbitrarily limited on queue size. In my mind, there are three good reasons to keep large amounts of free memory:

  1. Headroom for memory-intensive searches.
  2. Headroom for high search concurrency.
  3. Potential for large vfs cache which should help the performance of repeated searches against the same data set.

32 GB is plenty of system memory to leave for Splunk itself; again, most of our indexers have that much total memory. Consider page 46 of the following presentation where memory utilization is measured during high-load indexing and search; Splunk indexing just doesn't seem to take a lot of memory:
https://conf.splunk.com/files/2016/slides/harnessing-performance-and-scalability-with-parallelizatio...

I probably sound like a broken record, but under "normal" conditions the best place for your system memory is the vfs cache. Consider this presentation starting at 29:30 for a discussion of how total system memory affects IOPS in a production environment:
http://conf.splunk.com/files/2016/recordings/it-seemed-like-a-good-idea-at-the-time-architectural-an...

If you're certain that you're I/O-bound, you have a non-zero search workload, and you give all your memory to queues, searches that would otherwise have been served from the cache will hit your storage even harder than they do already (requiring more queue and ultimately not solving anything). Are you certain that you're not CPU-bound and wouldn't benefit from additional indexing pipelines?
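
For reference, additional pipelines are a server.conf setting on the indexer; a minimal sketch, with the caveat that each pipeline set gets its own copy of the queues (so queue memory scales with the pipeline count) and it only helps if you actually have spare CPU:

# server.conf on the indexer -- only worthwhile with spare CPU cores
[general]
parallelIngestionPipelines = 2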

This entire conversation assumes that there's a technical reason that you can't just let your forwarders block. Is your indexing latency that much worse if you just leave the default queue size in place? Is there data on the forwarders that will be lost if you don't forward it fast enough? Still fascinated by your situation 🙂
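
And if you do let the forwarders block, how much data buffers on the forwarder side is governed by the forwarder's own output queue in outputs.conf (plus persistent queues, for the input types that support them); a minimal sketch with a placeholder group name:

# outputs.conf on a forwarder -- "my_indexers" is a placeholder group name
[tcpout:my_indexers]
maxQueueSize = 512MB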
