Getting Data In

Specific Indexer is overused!!

dongwonn
Explorer

Hi, I'm working on a Splunk team.

Environment:

3 SHs, 10 IDXs (1 of the 10 indexers is overused)

Replication factor 3

Search factor 3

 

Could it happen that searches continuously run only on a certain indexer? I've been constantly monitoring with top and ps -ef, and I'm seeing a lot of search processes on one particular indexer. Its CPU usage is roughly double that of the others... It's been going on for months. Can this be considered normal?
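In case it helps others reproduce the check: the same comparison can be done from Splunk itself instead of top, with something like this (a sketch; it assumes the default resource-usage introspection inputs are enabled, so the _introspection index is populated on the indexers):

index=_introspection sourcetype=splunk_resource_usage component=PerProcess data.process_type=search
| stats sum(data.pct_cpu) as total_search_cpu dc(data.pid) as search_processes by host
| sort - total_search_cpu

If one host's total_search_cpu is consistently about double the others', that matches what top shows.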


inventsekar
SplunkTrust

Hi @dongwonn 

Could you share a few more details, please?

1) On the Monitoring Console, do you see any errors or warnings?

2) On the indexer clustering page, do you see any bucket imbalance?

3) May we know how you determined that only 1 indexer out of 10 is overused?

4) Any recent changes to the indexer cluster, e.g. upgrades/migrations, any new apps deployed, etc.?
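For 2), bucket distribution can also be eyeballed from a search head with dbinspect (a sketch; dbinspect output can be large, so consider narrowing index=* down to your biggest indexes):

| dbinspect index=*
| stats count as bucket_count sum(sizeOnDiskMB) as total_size_mb by splunk_server

A healthy cluster should show roughly similar bucket_count and total_size_mb per splunk_server.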

dongwonn
Explorer

Hi @inventsekar 

Thank you for your answer!

1) I don't see any warnings in MC.

2) I see that only 1 indexer's bucket count is about 50,000, while the other 9 indexers' counts are about 140,000~150,000. Also, each bucket on that 1 indexer is about three times bigger than on the other indexers. So I checked the buckets in a terminal and found that the tsidx files are large.

3) Every indexer's conf is the same. This trouble has continued for a few months.

Is there anything else to check?


PickleRick
SplunkTrust

Wait a second.

9 out of 10 indexers have roughly the same number of buckets and 1 has just 1/3 of those?

And this one has significantly larger buckets?

That is strange.

With ingestion imbalance as the primary factor, you would expect one or a few indexers with a bigger bucket count, not a smaller one.

If you have larger buckets, I'd hazard a guess that:

1) You have primary buckets on that indexer (so you have some imbalance if this indexer receives all the primaries there)

2) The summaries are generated on that indexer (hence the increased size)

3) The summaries are not replicated between peers (if I remember correctly, replicating summaries must be explicitly enabled)

So your indexer is overused because it holds all the primaries, and all summary-generating searches hit just this indexer. And probably, due to the size of the index(es) or the volume(s), your buckets might get frozen earlier than on the other indexers.
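If you want to verify the primary distribution, the cluster manager's REST endpoint exposes per-peer counters (a sketch; cm01 is a placeholder for your cluster manager's configured server name, and the exact field names may differ slightly between versions):

| rest splunk_server=cm01 /services/cluster/master/peers
| table label primary_count bucket_count status

A heavily skewed primary_count on one peer would support this theory.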

dongwonn
Explorer

Hi @PickleRick, thanks for the answer!

I didn't know about the primary / non-primary searchable copy terms until you mentioned them.

In our operational environment, summaries are rarely used.

So I think we need to collect information about the primary copies and find the cause.

Thank you again!


PickleRick
SplunkTrust

You might not be explicitly using summaries, but it's quite probable that you're using datamodel acceleration. And that's nothing other than summaries built from datamodel contents for the given indexes.

You can read some basic info on summary replication here: https://conf.splunk.com/files/2016/slides/replication-of-summary-data-in-indexer-cluster.pdf
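For reference, summary replication is enabled cluster-wide on the cluster manager; the relevant setting is a single server.conf stanza there (a sketch - verify against the server.conf spec for your Splunk version before changing anything):

[clustering]
summary_replication = true

With it enabled, summaries are replicated along with their buckets, so summary-driven searches no longer have to hit only the peer that built them.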

bowesmana
SplunkTrust

Run this command to see if you have poor data ingestion balance across the indexers:

| tstats count where index=* by index splunk_server
| stats sum(count) as total dc(splunk_server) as dc_splunk_server by index

The dc_splunk_server field will show you how many indexers contain data for a particular index. If you sort by count, check whether the largest data counts are spread across all indexers.

You can also go a bit deeper and check the min/max/avg event count per indexer/index, to see whether the min or max fall outside 3 standard deviations from the average. This search also checks whether any index is missing from some indexers.

| tstats count where index=* by index splunk_server
| stats avg(count) as avg_count min(count) as min_count max(count) as max_count stdev(count) as stdev_count dc(splunk_server) as dc_splunk_server by index
| eventstats max(dc_splunk_server) as total_splunk_servers
| where dc_splunk_server < total_splunk_servers OR (min_count < (avg_count - 3*stdev_count)) OR (max_count > (avg_count + 3*stdev_count))

 

dongwonn
Explorer

Hi @bowesmana, thanks for the answer!

I checked the balance with the SPL you gave me. The balance doesn't look bad.

I confirmed that the major indexes show dc_splunk_server equal to the number of indexers.

 


isoutamo
SplunkTrust
Hi
Here is an excellent presentation about event distribution: "Best Practices for Data Collection" by Richard Morgan.
You can find it, at least, on the SlideShare service.
r. Ismo