How do Splunk indexers operate when placed in manual detention?

ashikuma
Explorer

We are migrating from RHEL 6 to RHEL 8. An in-place OS upgrade on the same machines is not possible, so we will get new RHEL 8 machines, install Splunk on them, and proceed from there.

Current setup: 5 indexers (RHEL 6) in an indexer cluster, with search factor (S.F) = 1 and replication factor (R.F) = 1.

Our plan is to add 5 new indexers (RHEL 8) to the indexer cluster and reroute all sources to send data to the newly added indexers. Once that is complete, we will make sure that no source is still sending anything to the old indexers.

We would then enable manual detention on the old indexers so that they don't replicate their data to the other (newly added) indexers. I was going through the manual detention documentation (linked below) and found that only four operations are affected when a peer is in manual detention. The peer:

  • stops replicating data from other peer nodes.
  • optionally stops accepting data from the ports that consume external data, causing the peer to no longer index most types of external data.
  • continues to index internal data and stream the data to target peer nodes.
  • continues to participate in searches.

https://docs.splunk.com/Documentation/Splunk/8.0.5/Indexer/Peerdetention

But I would like to know: if a peer is in manual detention, will it still roll its stored index data to frozen according to the retention policy? In other words, apart from the four points above, do all other operations continue as usual while the peer is in detention?
I am concerned because in our case the hot DB is on a local mount (700 GB of data on each peer) and the cold DB is on a NAS mount (2.3 TB per peer), and we are going to share the same NAS mount with the new peers by creating separate subfolders for them.

Second, how will S.F and R.F behave, given that both are 1? Most of the data will be on the old peers, and those will be in detention. Will S.F and R.F still be met, and is there any impact on searches?

The final step would be to run splunk offline --enforce-counts to decommission the old peers one by one, but that depends on the question above (will a peer in manual detention still roll its stored index data to frozen according to the retention policy?). If a peer in detention does not purge data according to the retention period, we will not have enough space on the NAS storage (cold DB) for the data rolling over from the hot DB on the new peers.
If the old peers in detention do continue to purge their data according to the retention period, space will be freed on one side while new data is added to the NAS (cold DB) on the other.
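
To make the capacity concern concrete, here is a rough sketch of how NAS usage might evolve day by day in the two cases. Only the 2.3 TB per peer figure comes from the post. The NAS capacity, the daily freeze rate and the daily hot-to-cold roll rate are made-up placeholders, and the function names are hypothetical.

```python
def project_nas_usage(used_gb, freeze_gb_per_day, new_cold_gb_per_day, days):
    """Return the projected NAS usage (GB) at the end of each day."""
    usage = []
    for _ in range(days):
        # Old peers free space by freezing; new peers add freshly rolled cold buckets.
        used_gb = max(0, used_gb - freeze_gb_per_day) + new_cold_gb_per_day
        usage.append(used_gb)
    return usage


def first_day_over_capacity(usage, capacity_gb):
    """Return the first day (1-based) on which usage exceeds capacity, or None."""
    for day, used in enumerate(usage, start=1):
        if used > capacity_gb:
            return day
    return None


# Assumed numbers: 5 old peers x 2300 GB cold data, a hypothetical 12000 GB NAS,
# 80 GB/day rolling to cold on the new peers, 100 GB/day frozen on the old peers.
start_gb = 5 * 2300
with_freezing = project_nas_usage(start_gb, 100, 80, 14)
without_freezing = project_nas_usage(start_gb, 0, 80, 14)

print(first_day_over_capacity(with_freezing, 12000))     # None
print(first_day_over_capacity(without_freezing, 12000))  # 7
```

With these placeholder rates the NAS never fills while the old peers keep freezing, but it overflows within a week if freezing stops, which is why the answer to the question above matters for the two-week plan.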

If all this works, I am still not sure what R.F = 1 implies here. As I understand it, there is only one copy of the data, and if the other peers don't have it, I think new replicated copies will be created when we decommission the old peers.

Please suggest 

0 Karma

soutamo
SplunkTrust

Hi

Your plan sounds OK. It is much like what I described in this other post:

https://community.splunk.com/t5/Deployment-Architecture/Swap-indexers-from-indexer-cluster-with-new-...

You must ensure that all Splunk versions are the same across the cluster. After you have migrated the data to the new peers and decommissioned the old ones, you can upgrade the cluster and the other nodes to more recent versions, as the guide says.

Now to your concerns.

I suppose that otherwise the cluster peers will work as before (e.g. freezing buckets etc.); the only real difference is that they don't accept any new data. They also take part in searches as before. Actually, when your SF and RF are 1, they don't replicate data in any situation (except when you take a peer offline with the splunk offline command, at least I think so, as normally this changes replicas to primaries). If a node crashes, its data is not available to any search until the node is up again! For that reason you should raise your SF and RF to get HA at the bucket level.
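
The point about a crashed node can be illustrated with a toy model. This is only a sketch, not how the cluster manager tracks buckets: the peer and bucket names are hypothetical, and every copy is treated as searchable (i.e. SF equals RF).

```python
def searchable_buckets(bucket_copies, peers_up):
    """Return the buckets that have at least one copy on a running peer."""
    return {bucket for bucket, holders in bucket_copies.items() if holders & peers_up}


# Hypothetical layout with RF=1: each bucket lives on exactly one peer.
rf1 = {"b1": {"old1"}, "b2": {"old2"}, "b3": {"new1"}}
# The same buckets with RF=2: each bucket has a second copy on another peer.
rf2 = {"b1": {"old1", "new1"}, "b2": {"old2", "new2"}, "b3": {"new1", "old1"}}

peers_up = {"old2", "new1", "new2"}  # old1 has crashed

print(sorted(searchable_buckets(rf1, peers_up)))  # ['b2', 'b3']
print(sorted(searchable_buckets(rf2, peers_up)))  # ['b1', 'b2', 'b3']
```

With RF=1 the bucket held only by the crashed peer drops out of search results until that peer returns; with a second copy it stays searchable.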

To be honest, I don't know what happens when RF and SF are 1, your node is in detention mode, and you take it offline with --enforce-counts, as you don't have any replicas of the buckets on that node. Normally there are also secondaries that become primaries in this phase, and new replicas are created. Maybe it's best to remove detention before you take those nodes offline? My suggestion is that you open a service request with Splunk support if you can't test this yourself!

One more thing. You said "we are going to share same mount for new peers as well with creating separate subfolders from NAS (for new peer)." I suppose that you have a separate cold2frozen script which moves those buckets to the NAS based on hostname?

r. Ismo

0 Karma

ashikuma
Explorer

@soutamo thanks for the suggestions. I was expecting the same: that rollover of stored data from cold to frozen should happen as usual for each index according to its retention period, and clean up storage automatically so that the other indexers can use it. We need to be sure of that, because the NAS-mounted cold DB file system has to be shared across all peers: /usr/splunkcold on each peer is mounted from the NAS, with a separate subfolder per peer on the NAS side. Has anyone tried this before and seen that rollover works as usual during detention mode?
Everyone else seems to add the new peers, rebalance, and remove the old peers, but for us the NAS is the challenge, as that much storage is not available.

We tried the same scenario in another environment where R.F = 1 and S.F = 1, and it worked there: once we ran the splunk offline --enforce-counts command, the peer started creating replicated copies of its data and transferring them to the working indexers in the cluster. But there was no NAS-mounted file system in that environment. After decommissioning the old peers we compared the overall event count with the count before decommissioning; it was almost the same, with a difference of only 1000k out of millions of events.
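
A before/after comparison like this one can be reduced to a small check. The sketch below uses invented event counts purely for illustration (they are not the figures from our environment), and count_drift is a hypothetical helper name.

```python
def count_drift(before, after):
    """Return the absolute and percentage difference between two event counts."""
    diff = abs(before - after)
    pct = 100.0 * diff / before if before else 0.0
    return diff, pct


# Invented example counts, e.g. taken from the same search over the same
# fixed time range before and after decommissioning.
diff, pct = count_drift(250_000_000, 249_999_000)
print(diff)           # 1000
print(f"{pct:.4f}%")  # 0.0004%
```

Using a fixed time range that ends before the decommissioning started keeps newly indexed events out of the comparison.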

The only reason to put all 5 old peers in detention is that they hold a lot of data (13 TB in total) and we don't want them to accept any more; new traffic should go to the new indexers. Once the old ones start rolling their data from cold to frozen, storage will start to be cleaned up, and at that point we can begin the decommissioning process. Say we wait 2 weeks: for those 2 weeks our 5 old peers (holding 13 TB of data) would be in manual detention and the 5 new peers would be the ones accepting data. During that time, will all data be searchable, and will the search factor and R.F be met? I know a peer in detention mode will not replicate anything until we start the decommissioning process, but I am wondering how the new peers will meet the search and replication factors.


One more thing. You said "we are going to share same mount for new peers as well with creating separate subfolders from NAS (for new peer)." I suppose that you have separate cold2frozen script which move those buckets to NAS based on hostname?

Ans: There are no scripts in place. By NAS mount I mean that it is a local file system path (/usr/splunkcold) that is mounted from the NAS rather than from local storage, and it is governed by the retention period in the same way as the hot DB (the frozenTimePeriodInSecs parameter).
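
For reference, the time-based retention rule can be sketched as follows. This assumes the documented behaviour that a bucket becomes eligible for freezing once its newest event is older than frozenTimePeriodInSecs; it ignores size-based freezing (maxTotalDataSizeMB) and does not by itself show whether detention changes anything. The 90-day retention value and the function name are hypothetical.

```python
DAY = 86400  # seconds in a day


def is_freeze_eligible(latest_event_epoch, now_epoch, frozen_time_period_in_secs):
    """True if the bucket's newest event is older than the retention period."""
    return now_epoch - latest_event_epoch > frozen_time_period_in_secs


now = 1_600_000_000       # arbitrary reference time (epoch seconds)
retention = 90 * DAY      # hypothetical frozenTimePeriodInSecs value

print(is_freeze_eligible(now - 91 * DAY, now, retention))  # True
print(is_freeze_eligible(now - 30 * DAY, now, retention))  # False
```

Because the check uses the newest event in a bucket, a bucket that spans a long time range stays on the NAS until its most recent event has aged out.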

Any thoughts or suggestions?

0 Karma