Deployment Architecture

In an indexer cluster is there a way to query one indexer for data that it indexed as well as data that has been replicated to it?

myandow
Path Finder

We have a distributed environment with multi-site indexer clustering and search head clustering. The replication factor is set up so that every indexer will have a copy of all data that has been indexed by the cluster. I'm trying to figure out if there is a way to craft a query from our search heads that will be directed to a specified indexer in the cluster and get it to return all data for a particular index.

If I specify splunk_server in the query, it seems to direct the request to the indexer specified, but it only returns events that were indexed by that particular indexer, not all events that have been indexed by the cluster.

Is there another way to go about this?

1 Solution

sowings
Splunk Employee
  1. Setting the replication factor so that every host has a copy is inherently fragile; the cluster will be unable to "heal" successfully if even one of the hosts is down (e.g. maintenance, patching).
  2. Clustered search only utilizes one searchable copy of the data at a time; attempting to use multiple searchable copies would result in duplicate events / results.
  3. A search factor (SF) > 1 is about "recovery from failure", not "search faster", per point 2 above.
  4. The cluster master maintains a list of "primary" buckets; these are the active, "in use" search copies.
  5. Search heads participating in clustered search only search "primary" buckets.
  6. The cluster master sees all. If you want to know where buckets are, use the /services/cluster/master/buckets REST endpoint, which is documented in the Splunk REST API Reference Manual.
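
To make point 6 a little more concrete, here is a minimal Python sketch of how the bucket listing could be summarized per peer. It assumes the endpoint is reached on the management port (8089 by default) with `output_mode=json`, and that each entry's content carries a `peers` map and a `primaries_by_site` map keyed as shown. Those field names, the host name, and the sample payload are assumptions for illustration only; check them against the actual response in your environment before relying on them.

```python
from urllib.parse import urlencode


def buckets_url(manager_host, port=8089):
    """Build the bucket-listing URL on the cluster master's management port."""
    # count=0 asks the REST API for all entries rather than the first page.
    query = urlencode({"output_mode": "json", "count": 0})
    return f"https://{manager_host}:{port}/services/cluster/master/buckets?{query}"


def summarize_by_peer(payload):
    """Count, per peer, how many bucket copies it holds and how many are primary."""
    summary = {}
    for entry in payload.get("entry", []):
        content = entry.get("content", {})
        # ASSUMPTION: primaries_by_site maps a site name to the peer holding the primary.
        primary_peers = set(content.get("primaries_by_site", {}).values())
        # ASSUMPTION: peers maps a peer identifier to details about that peer's copy.
        for peer in content.get("peers", {}):
            stats = summary.setdefault(peer, {"copies": 0, "primaries": 0})
            stats["copies"] += 1
            if peer in primary_peers:
                stats["primaries"] += 1
    return summary


# Hypothetical payload in the assumed shape; real responses carry many more fields.
SAMPLE = {
    "entry": [
        {"name": "main~1~A", "content": {"peers": {"peerA": {}, "peerB": {}},
                                         "primaries_by_site": {"site1": "peerA"}}},
        {"name": "main~2~B", "content": {"peers": {"peerA": {}, "peerB": {}},
                                         "primaries_by_site": {"site1": "peerB"}}},
        {"name": "main~3~A", "content": {"peers": {"peerA": {}, "peerB": {}},
                                         "primaries_by_site": {"site1": "peerA"}}},
    ]
}

if __name__ == "__main__":
    print(buckets_url("cm.example.com"))  # hypothetical host name
    print(summarize_by_peer(SAMPLE))
```

In this made-up sample both peers hold all three buckets, yet each is primary for only some of them, which is the gap between "has a copy" and "will answer a search" described in points 4 and 5.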

sloshburch
Splunk Employee

Comment from @d: In a cluster, indexers will only return data from buckets that are designated as primaries. What is primary on indexerA is not primary on indexerB, and what is being indexed by indexerA is likely to remain primary there until a failure occurs. The only apples-to-apples comparison is to index the same data on each indexer and test it with splunk_server. Anything else will be an approximation.
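
The behavior described in this comment can be illustrated with a toy model in Python. This is not Splunk code: the `BucketCopy` class, the peer names, and the event counts are all hypothetical, and per-site primaries in a multisite cluster are ignored for simplicity. It only shows why filtering on splunk_server returns a peer's primary copies rather than everything stored on that peer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketCopy:
    bucket_id: str
    peer: str       # indexer holding this copy
    primary: bool   # True if this copy is the one that answers searches
    events: int


def events_stored(copies, peer):
    """Events physically present on a peer, counting replicated copies too."""
    return sum(c.events for c in copies if c.peer == peer)


def events_returned(copies, splunk_server=None):
    """Events a search returns in this model: primary copies only, optionally one peer."""
    return sum(
        c.events
        for c in copies
        if c.primary and (splunk_server is None or c.peer == splunk_server)
    )


# Two buckets, each replicated to both peers; the originating peer holds the primary.
COPIES = [
    BucketCopy("b1", "indexerA", primary=True, events=100),
    BucketCopy("b1", "indexerB", primary=False, events=100),
    BucketCopy("b2", "indexerB", primary=True, events=50),
    BucketCopy("b2", "indexerA", primary=False, events=50),
]

if __name__ == "__main__":
    print(events_stored(COPIES, "indexerA"))
    print(events_returned(COPIES))
    print(events_returned(COPIES, splunk_server="indexerA"))
```

In the model, indexerA stores all 150 events but a search restricted to it returns only the 100 from its primary bucket, while an unrestricted search returns each event exactly once.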

Richfez
SplunkTrust

Could you tell us why you'd like to do this? It seems like it's fundamentally and precisely not what clustering is supposed to achieve, so it makes me wonder if perhaps a change in how it's set up may help your situation.

My only real thought is that you have a multi-site physical topology but haven't set up multi-site clustering with site-aware search heads.

myandow
Path Finder

The reason I am looking to query the indexers in this way is for troubleshooting and performance testing. We have some indexers set up on different hardware/storage and with different OS configurations, and we want to see if we can get an apples-to-apples comparison of search and indexing performance. If we can pull the same events from each indexer, then we can be more confident in the uniformity of the data set that we are querying.

sloshburch
Splunk Employee

Hmm. You might consider posting a different question: How to compare indexer performance when clustering is used?

The question here is very specific and there might be alternate approaches to solve the overall challenge.

That said, there might be approaches that can be used with the hidden bucket field. What is the search factor and replication factor?

myandow
Path Finder

I will follow up with a more open-ended question regarding performance comparisons. I am still interested in finding the answer to this particular question as well.

Replication factor: origin:3,total:6

Search factor: origin:2,total:4
