Getting Data In

Specific Indexer is overused!!

dongwonn
Explorer

Hi, I'm working on a Splunk team.

Environment:

3 SHs, 10 IDXs (1 of the 10 indexers is overused)

Replication factor 3

Search factor 3

 

Could it happen that searches continuously run only on a certain indexer? I've been constantly monitoring with top and ps -ef, and I'm seeing a lot of search processes on one particular indexer. Its CPU usage is roughly double that of the others... It's been going on for months. Can this be considered normal?
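In case it helps others reproduce the check: the same comparison can be done from Splunk itself instead of top, with something like this (a sketch; it assumes the default resource-usage introspection inputs are enabled, so the _introspection index is populated on the indexers):

index=_introspection sourcetype=splunk_resource_usage component=PerProcess data.process_type=search
| stats sum(data.pct_cpu) as total_search_cpu dc(data.pid) as search_processes by host
| sort - total_search_cpu

If one host's total_search_cpu is consistently about double the others', that matches what top shows.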


inventsekar
SplunkTrust

Hi @dongwonn 

Could you share a few more details, please?

1) On the Monitoring Console, do you see any errors or warnings?

2) On the indexer clustering page, do you see any bucket imbalance?

3) May we know how you determined that only 1 indexer out of 10 is overused?

4) Any recent changes to the indexer cluster, e.g. upgrades/migrations, any new apps deployed, etc.?
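For 2), bucket distribution can also be eyeballed from a search head with dbinspect (a sketch; dbinspect output can be large, so consider narrowing index=* down to your biggest indexes):

| dbinspect index=*
| stats count as bucket_count sum(sizeOnDiskMB) as total_size_mb by splunk_server

A healthy cluster should show roughly similar bucket_count and total_size_mb per splunk_server.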

dongwonn
Explorer

Hi @inventsekar 

Thank you for your answer!

1) I don't see any warnings in MC.

2) I see that only 1 indexer's bucket count is about 50,000, while the other 9 indexers' counts are about 140,000~150,000. Also, each bucket on that 1 indexer is about three times bigger than on the other indexers. So I checked the buckets in a terminal and found that the tsidx files are large.

3) Every indexer's conf is the same. This trouble has continued for a few months.

Is there anything else to check?


PickleRick
SplunkTrust

Wait a second.

9 out of 10 indexers have roughly the same number of buckets and 1 has just 1/3 of those?

And this one has significantly larger buckets?

That is strange.

With ingestion imbalance as the primary factor, you would expect one or a few indexers with a bigger bucket count, not a smaller one.

If you have larger buckets, I'd hazard a guess that:

1) You have primary buckets on that indexer (so you have some imbalance if this indexer receives all the primaries there)

2) The summaries are generated on that indexer (hence the increased size)

3) The summaries are not replicated between peers (if I remember correctly, replicating summaries must be explicitly enabled)

So your indexer is overused because it holds all the primaries, and all summary-generating searches hit just this indexer. And probably, due to the size of the index(es) or the volume(s), your buckets might get frozen earlier than on the other indexers.
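If you want to verify the primary distribution, the cluster manager's REST endpoint exposes per-peer counters (a sketch; cm01 is a placeholder for your cluster manager's configured server name, and the exact field names may differ slightly between versions):

| rest splunk_server=cm01 /services/cluster/master/peers
| table label primary_count bucket_count status

A heavily skewed primary_count on one peer would support this theory.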

dongwonn
Explorer

Hi @PickleRick, thanks for the answer!

I didn't know about the primary / non-primary searchable copy terms until you mentioned them.

In our operational environment, summaries are rarely used.

So I think we need to collect information about the primary copies and find the cause.

Thank you again!


PickleRick
SplunkTrust

You might not be explicitly using summaries, but it's quite probable that you're using datamodel acceleration. And that's nothing other than summaries built from datamodel contents for the given indexes.

You can read some basic info on summary replication here: https://conf.splunk.com/files/2016/slides/replication-of-summary-data-in-indexer-cluster.pdf
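For reference, summary replication is enabled cluster-wide on the cluster manager; the relevant setting is a single server.conf stanza there (a sketch - verify against the server.conf spec for your Splunk version before changing anything):

[clustering]
summary_replication = true

With it enabled, summaries are replicated along with their buckets, so summary-driven searches no longer have to hit only the peer that built them.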

bowesmana
SplunkTrust

Run this command to see if you have poor data ingestion balance across the indexers:

| tstats count where index=* by index splunk_server
| stats sum(count) as total dc(splunk_server) as dc_splunk_server by index

The dc_splunk_server field will show you how many indexers contain data for a particular index. If you sort by count, check whether the largest data counts are spread across all indexers.

You can also go a bit deeper and check the min/max/avg event count per indexer/index, to see whether the min or max fall outside 3 standard deviations from the average. This search also checks whether any index is missing from some indexers.

| tstats count where index=* by index splunk_server
| stats avg(count) as avg_count min(count) as min_count max(count) as max_count stdev(count) as stdev_count dc(splunk_server) as dc_splunk_server by index
| eventstats max(dc_splunk_server) as total_splunk_servers
| where dc_splunk_server < total_splunk_servers OR (min_count < (avg_count - 3*stdev_count)) OR (max_count > (avg_count + 3*stdev_count))

 

dongwonn
Explorer

Hi @bowesmana, thanks for the answer!

I checked the balance with the SPL you gave me. The balance doesn't look bad.

I confirmed that the major indexes show dc_splunk_server equal to the number of indexers.

 


isoutamo
SplunkTrust
Hi
Here is an excellent presentation about event distribution: "Best Practices for Data Collection" by Richard Morgan.
You can find it, at least, on the SlideShare service.
r. Ismo