I am trying to verify that certain indexes are replicating across my index cluster. My Splunk installation is a distributed deployment with 8 peers composing the index cluster with their master node, 3 search-head cluster members with their deployer, 1 stand-alone s-h running Splunk App for Stream, and 4 forwarders with their deployer server. I have SOS deployed on the shc members and the indexers. I have Splunk App for Stream deployed with App for Stream running on the stand-alone s-h and TA_Stream running on the 4 forwarders with the setuid.sh executed to handle the permissions for the streamfwd binary (I run splunk under the splunk account). All of the stream configurations that I set using the App for Stream UI on the stand-alone search-head are pointing to the MAIN index. The [main] stanza on all my indexers is configured with repFactor = auto (I am going to provide the [main] stanza in a comment after this first posting).
After a moderate amount of troubleshooting on this implementation, which some of you have likely been following in earlier questions, I finally have a relatively stable platform. I think there are still some issues, though, and verifying index replication for certain indexes is one of the items I still don't fully understand or see working normally.
When I run SOS on my shc members and look at Indexing > Index Replication > Cluster Master View, the search returns replication factor = 3, search factor = 1, Cluster initialization state: Index replication not enabled, and Cluster indexing state: Index replication not enabled.
So how can I confirm that I am actually getting complete replication across my index cluster?
You should be using the CM to validate RF and SF. The peers themselves don't know, per se, about the rest of the cluster. They check with the CM and it manages replication and search tasking.
Use Fire Brigade to check actual bucket replication: https://splunkbase.splunk.com/app/1632/. That link is the app; there is also a TA component that is required on the indexers.
If you're building this out to scale in a production environment, you really should reach out to Splunk for Education or Professional Services. A lot of what you are trying to figure out is covered in official training and PS engagements.
Hi all !
You can check the number of events per indexer:
|tstats count where index=index_name by splunk_server _time span=1d |timechart sum(count) as count span=1d by splunk_server
Just specify the name of your index and change the span as necessary (day, month, ...).
Problem resolved - problem being why I didn't get replication for the main index. I had set up Splunk App for Stream to use the main index, but had the streams configured only for stats rather than enabled for events. Essentially, no data was populating into the main index. So unlike our beloved Lucas "Luke" Jackson in Cool Hand Luke for whom "sometimes nothing is a real cool hand", for Splunk index replication, nothing could be further from the truth - lol!
Once you have the components installed on both the indexer and the search heads, you need to wait. It takes time for it to populate.
The list of monitored indexes is built automatically by a saved search. Look in the README for more details on it.
Dashboards will populate based on index and indexer selection. Again, based on that saved search and the dbinspect command.
Have started to try using FB as you suggested. Installed it on a stand-alone sh (FB v 2.0.3 and TA v 2.0.1). Not clear from the documentation or from within the FB UI or from the sh where it is installed (probing about in different directories and hunting for it) where and how to configure monitored_indexes.csv. If you have some guidance on that that would be awesome.
/opt/splunk/etc/slave-apps/_cluster/local/indexes.conf [main]
/opt/splunk/etc/slave-apps/_cluster/local/indexes.conf maxDataSize = auto_high_volume
/opt/splunk/etc/slave-apps/_cluster/local/indexes.conf maxHotBuckets = 10
/opt/splunk/etc/slave-apps/_cluster/local/indexes.conf repFactor = auto
All the other attributes are using the /opt/splunk/etc/system/default/indexes.conf default values. I have also set repFactor = auto in the stanzas of _introspection, _audit, sos, _blocksignature, _internal, _thefishbucket, history, sos_summary_daily, splunklogger, and summary.
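For anyone wanting to see which file is supplying repFactor for each index, here is a minimal Python sketch that reads lines in the same "file path, then stanza or setting" layout as the listing above. The function name rep_factor_sources is made up for illustration, and the sketch assumes the conf file paths contain no spaces. It is not a Splunk tool, just a quick way to spot an index that picks up repFactor from more than one place.

```python
from collections import defaultdict


def rep_factor_sources(btool_lines):
    """Map each index stanza to the (file, value) pairs that set repFactor.

    Assumption: every line is '<conf file path> <[stanza] or key = value>',
    as in the listing above, and the path contains no spaces.
    """
    sources = defaultdict(list)
    stanza = None
    for line in btool_lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # blank or unexpected line, skip it
        path, rest = parts[0], parts[1].strip()
        if rest.startswith("[") and rest.endswith("]"):
            stanza = rest[1:-1]
        elif stanza is not None and "=" in rest:
            key, value = (s.strip() for s in rest.split("=", 1))
            if key == "repFactor":
                sources[stanza].append((path, value))
    return dict(sources)


# The four lines posted above, used as sample input.
sample = [
    "/opt/splunk/etc/slave-apps/_cluster/local/indexes.conf [main]",
    "/opt/splunk/etc/slave-apps/_cluster/local/indexes.conf maxDataSize = auto_high_volume",
    "/opt/splunk/etc/slave-apps/_cluster/local/indexes.conf maxHotBuckets = 10",
    "/opt/splunk/etc/slave-apps/_cluster/local/indexes.conf repFactor = auto",
]

for index_name, found in rep_factor_sources(sample).items():
    print(index_name, found)
```

If an index comes back with more than one (file, value) pair, that is worth a closer look, since it suggests two configuration layers disagree about repFactor.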
repFactor = auto
This configures the index to be replicated. A replication factor of 3 ensures that you have three replicated copies, while a search factor of 1 means only one of those copies is searchable. Typically this isn't useful.
Most commonly you will have SF of RF-1, so 3 : 2.
http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Thesearchfactor
I'd advise you install fire brigade.
Thanks. What is not adding up for me is that I have the replication_factor set to 5 and the search_factor set to 3. This is how it is set in the /opt/splunk/etc/master-apps/_cluster/local/server.conf in the [clustering] stanza on the cluster master as well as all of the 8 cluster peers. Also, if I execute show cluster-status --verbose on the cluster master, I get the following result:
_audit
Number of non-site aware buckets=0
Number of buckets=867
Size=23722392
Searchable YES
Replicated copies tracker
867/867 867/867 867/867 867/867 867/867
Searchable copies tracker
867/867 867/867 867/867
_internal
Number of non-site aware buckets=0
Number of buckets=1050
Size=1064524035
Searchable YES
Replicated copies tracker
1050/1050 1050/1050 1050/1050 1050/1050 1050/1050
Searchable copies tracker
1050/1050 1050/1050 1050/1050
sos
Number of non-site aware buckets=0
Number of buckets=753
Size=43275408
Searchable YES
Replicated copies tracker
753/753 753/753 753/753 753/753 753/753
Searchable copies tracker
753/753 753/753 753/753
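A listing like the one above can be checked mechanically. The following is a rough Python sketch, assuming the output has exactly the layout pasted here (index name on its own line, labelled lines, and each tracker's values on the line after its label). The names parse_cluster_status and fully_replicated are hypothetical helpers, not part of Splunk. An index that is missing from the listing, such as main in this case, simply reports as not replicated.

```python
import re

# One or more "have/total" pairs separated by whitespace.
TRACKER = re.compile(r"^\d+/\d+(\s+\d+/\d+)*$")


def parse_cluster_status(text):
    """Return {index: {"replicated": [(have, total), ...], "searchable": [...]}}.

    Assumption: the text follows the layout pasted above.
    """
    result = {}
    index = None
    pending = None  # which tracker the next values line belongs to
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Replicated copies tracker"):
            pending = "replicated"
        elif line.startswith("Searchable copies tracker"):
            pending = "searchable"
        elif pending and index and TRACKER.match(line):
            pairs = [tuple(int(n) for n in item.split("/")) for item in line.split()]
            result[index][pending] = pairs
            pending = None
        elif "=" in line or line.startswith("Searchable"):
            continue  # e.g. "Number of buckets=867" or "Searchable YES"
        else:
            index = line  # anything else is treated as an index name
            result[index] = {"replicated": [], "searchable": []}
            pending = None
    return result


def fully_replicated(status, index, copies):
    """True if the index lists at least `copies` trackers and all are complete."""
    entry = status.get(index)
    if entry is None:
        return False
    rep = entry["replicated"]
    return len(rep) >= copies and all(have == total for have, total in rep)


sample_output = """
_audit
Number of non-site aware buckets=0
Number of buckets=867
Size=23722392
Searchable YES
Replicated copies tracker
867/867 867/867 867/867 867/867 867/867
Searchable copies tracker
867/867 867/867 867/867
"""

status = parse_cluster_status(sample_output)
for name in ("_audit", "main"):
    print(name, fully_replicated(status, name, copies=5))
```

The copies argument is the replication factor you expect from the master's configuration (5 in this post), so the check fails both when a tracker is incomplete and when fewer trackers are listed than expected.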
Yet, as previously noted, from SOS>Indexing>Index Replication>Cluster Master View, this app search returns that rep factor = 3 and search factor = 1. Just don't get what SOS is looking at to get these values.
I am also confused over repFactor and how it controls replication. I have set repFactor = auto in ALL of the indexes I previously identified, yet only _audit, _internal, and sos are getting replicated.
Another aspect of replication that is confusing is what is stated in the documentation for the Admin Manual > indexes.conf spec about the repFactor attribute:
repFactor = <nonnegative integer>|auto
* Only relevant if this instance is a clustering slave (but see note about "auto" below).
* See server.conf spec for details on clustering configuration.
* Value of 0 turns off replication for this index.
* If set to "auto", slave will use whatever value the master is configured with
* Highest legal value is 4294967295
The setting of repFactor as declared here states that the value is picked up from the master. In my current situation, looking at the indexes on the master using btool shows that the repFactor = 0 for ALL indexes, including _audit, _internal, and sos. Yet as mentioned when running a show cluster-status, I see that there is replication for those 3 indexes. Like Mick Jagger's famous declaration about Satan, what's confusing me is just the nature of this game - oh yeah! - lol
I think there is a bug that causes the replication factor to be misstated unless you are looking at it from the cluster master. I have seen this before - when you go to another source, it always says 3 and 1 when it is clearly set on the cluster master to something else.
Actually, the cluster master is the only server that needs to know the RF/SF - so the other servers are just reporting bogus info. Really, the reporting of RF/SF should just be disabled on other servers (unless someone writes the code so that it calls back to the cluster master to get the right info).
Also, re: repfactor=auto, you said "I have also set repFactor = auto in the stanzas of _introspection, _audit, sos, _blocksignature, _internal, _thefishbucket, history, sos_summary_daily, splunklogger, and summary." Most of these are not true indexes - they are internal data structures for Splunk, and probably can't be replicated. Some of these indexes are also empty (unless you have added data to them), so there is nothing to replicate (the summary index is a good example of this).
You may see the sos_summary_daily start to replicate once/if SOS actually starts to summarize...
Thanks. I'm actually really just interested in making sure that main is replicating. Any suggestions on that?
Once the index is replicating, it should show up in the UI on the cluster master, under Settings -> Indexer Clustering.
That's what I would also expect, but main does not show up in the Indexes tab of Index Clustering when viewed on the cluster master.
As per the recommendation of esix_splunk, I have (or rather should say, am trying - lol) to set up Fire Brigade to look at the state of the indexes. I haven't gotten Fire Brigade completely (aka correctly) set up - yet - (I don't know where or how to configure the monitored_indexes.csv file that FB seems to require), but in looking at what is picked up in FB from REST, specifically looking at the Troubleshooting view>Index Configuration Summary, there are 10 indexes listed including main and sos. As previously mentioned, sos is correctly replicating but main is not. In this troubleshooting panel main is listed as having 2 repFactors: auto and 0. sos is indicated as having only auto.
In carefully reading the indexes.conf spec, it doesn't seem like there are all that many attributes to configure replication, so I'm really not sure how to get this resolved at this point. It looks to me like it's basically a "set repFactor = auto" in an index stanza and it's good to go. I have done this for main but I'm not yet "good to go", alas and alack.