Getting Data In

Why are indexing queues maxing out after upgrading to Splunk 6.5.0?

johnpof
Path Finder

I have four indexers behind round-robin DNS, and all of them were working great. After upgrading my entire environment to 6.5.0, every node works fine except my newest indexer, whose queues fill up even at a 5 KB/s indexing rate.

I've tried everything, including upping the parallel ingestion pipelines to 2/4/8, but the indexer can't take any data without falling over immediately: it just stops indexing, which causes all sorts of problems.
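
For reference, the setting I've been changing is parallelIngestionPipelines in server.conf on the indexer (the values below are just the ones I tried, not a recommendation):

# server.conf on the indexer
[general]
# tried 2, 4, and 8 with no improvement
parallelIngestionPipelines = 2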

I see no errors, the disks are healthy, CPU and memory usage are very low, and the Splunk health checks don't show anything concerning.

This happened immediately after the upgrade. What could cause this?

Thanks.


davebo1896
Communicator

The performance degradation we experienced with 6.5.x (queues backing up) had to be addressed in two ways:

1) Better disk I/O - we are running on VMs with a SAN backend, so we dedicated hot/warm buckets to flash storage (see the indexes.conf sketch after the props.conf example below).
2) Datestamp parsing - every sourcetype needs explicit timestamp and line-breaking settings defined in props.conf on the indexers and heavy forwarders:

[my_sourcetype]
# example timestamp: 2017-06-05T21:10:49.519+0000
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z
TIME_PREFIX = ^
# break events only where a new line starts with an ISO 8601 timestamp
LINE_BREAKER = ([\r\n]+)(?:\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+\-]\d{4}\s)
SHOULD_LINEMERGE = false
# 28 characters covers the full timestamp including the timezone offset
MAX_TIMESTAMP_LOOKAHEAD = 28
ANNOTATE_PUNCT = false
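
For point 1, the flash change was just a matter of pointing homePath at a volume on the flash mount. A minimal sketch with hypothetical paths and index name (adjust for your environment):

# indexes.conf on the indexers - hot/warm on flash, cold on the SAN
[volume:flash]
path = /opt/splunk/flash

[volume:san]
path = /mnt/san/splunk

[my_index]
homePath = volume:flash/my_index/db
coldPath = volume:san/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb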

baseballnut8200
Explorer

We seem to be having a similar issue with 6.5.3.1. We were on 6.4.6 and everything was working fine. After we upgraded to 6.5.3.1, the indexing queue fills to 100%, causing all the other queues to fill up to the point where indexing stops completely. Restarting Splunk clears the issue, but only temporarily. We are not pulling in any more data than before, yet we have these problems now. We have a mix of heavy and universal forwarders sending to the indexers, and memory and CPU are fine. Any thoughts?


baseballnut8200
Explorer

Ptoro...

Thanks for the info. As soon as we turned off the LDAP connections on the indexers in our environment, all the queues calmed right down. So, this was certainly the fix for us. Really appreciate your info...


ptoro
Explorer

I suggest just turning off any external auth (LDAP) on the indexers. That was our main issue, and it was confirmed as a bug in 6.5.x (at least to me while working with support). In my case the indexers were constantly trying to authenticate, which slowed the indexing process and caused blocked queues.

Hopefully you are doing some sort of external auth on indexers and this "fixes" your issue.
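
For anyone who wants to try this, switching the indexers back to local authentication is a one-line change (a minimal sketch; back up your existing config first):

# authentication.conf on each indexer - disable LDAP, use local Splunk accounts
[authentication]
authType = Splunk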

johnpof
Path Finder

I'm on 6.5.2 now and it's fixed, but I'm not sure whether it was hardware or software at this point.

My cold storage was degraded, but everything gets ingested into my warm storage.


aaraneta_splunk
Splunk Employee

Hi @johnpof - Did one of the answers provide a working solution to your question? If yes and you would like to close out your post, don't forget to click "Accept". But if you'd like to keep it open for other possible answers, you don't have to take action on it yet. Thanks!


ptoro
Explorer

Hey John,

Were you able to get anywhere with support? I had the same issue with the 6.4 to 6.5 upgrade, where it would "fall over" like you said. I went back to 6.4 and of course everything was fine, and I have been going back and forth with support trying to figure out why the indexer congestion was occurring.

Don't get me wrong, sometimes the queues fill up in 6.4 too, but they immediately empty; that's not the case in 6.5, where they would stay full for minutes on end.

Support wants me to upgrade to 6.5 again to "see" if it happens after changing some settings in the current environment (on 6.4 we set parallelIngestionPipelines = 2). I'm not sure how many users have indexer clusters, but it's hard to believe that just you and I are having these issues in 6.5...


Lucas_K
Motivator

"falling over immediately"

What does that specifically mean?

Unsearchable?

Are you running clustering or standalone indexers?

Which queue is filling? parsing, aggregation, typing, indexing?
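
One way to see which queues are blocking (a standard search against the internal metrics log, nothing environment-specific):

index=_internal source=*metrics.log* group=queue blocked=true
| stats count by host, name
| sort - count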


johnpof
Path Finder

It falls over in the sense that every single queue (parsing/agg/typing/indexing) fills up completely, then it stops indexing data altogether and the data becomes unsearchable.

I'm running four standalone indexers, which are sent data via round-robin DNS.


esix_splunk
Splunk Employee

I'd open a support case and work with them to troubleshoot this. Perhaps you have encountered a bug...


johnpof
Path Finder

Yeah, that was my determination as well. I've opened one up, thanks.


bgadsk
New Member

Any resolution to the problem? I'm facing a similar issue.


ptoro
Explorer

GOOD LUCK

I'm on my second support ticket about this SAME issue. I'm given the same old story that 6.5 is so much more resource intensive...

I even went to the latest build on 6.4 (6.4.5) and my indexer cluster is FINE. I try to upgrade to 6.5 and it immediately falls over.

IF ANYONE has ANY ideas, I'm ALL EARS.

(Please excuse the CAPS, I'm just frustrated.)


jofe2
Engager

Do you still have this issue?

Is it the same with 6.5.2?


ptoro
Explorer

I'm going to try 6.5.2 soon. I've had a handful of indexqueue blocking events in the last 24 hours (on average) on my current install (6.4.5). MIND YOU that I have more than one pipeline in parallel (parallelIngestionPipelines > 1).

Support says that having ANY blocked queues is the reason 6.5.x is falling over. Let that sink in...

6.4.x might have a couple of blocked queues in a day, on individual indexers and individual pipelines, never at the same time across indexers and pipelines, and it doesn't fall over. 6.5.x apparently has so much more demand on I/O that it falls over... immediately... while the documentation doesn't mention any additional I/O increase when going to 6.5 and only talks about increased memory usage.

You would think that increasing parallelIngestionPipelines, which by definition increases I/O on the backend, would surface any I/O problems and make the queues block a whole lot more.

It feels more like a cop-out to just say it's hardware when it's fine in 6.4 and immediately falls over in 6.5. Right now I'm increasing the maxSize of the indexqueue just to show them a day's worth of no index blocks... It just seems weird, is all.
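
For reference, the change I'm testing is just a bigger index queue in server.conf on the indexers (the size shown is only what I picked for the test, not a recommendation):

# server.conf on the indexers - enlarge the index queue for testing
[queue=indexQueue]
maxSize = 500MB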

Who's to say it's NOT software?
