Is there a best practice for maintaining replication for a specific index between two independent Splunk installations?

otan1010
Explorer

Hi,

Is there a best-practice way to keep a set of indexes replicated between two independent Splunk installations?

What I'm looking for is basically selective multi-site replication, i.e. choosing an index that's replicated between the sites so that both have access to identical copies of it at all times.

Hope this makes sense.

Cheers,
Andreas

nnmiller
Contributor

Does this index data come from a unique set of hosts or inputs? If so, you could configure those hosts/inputs to round-robin to both indexers, then use a distributed search configuration on the search heads needing to search this index. However, these SHs would end up sending all searches to both of these indexers, so there's the ugly overhead. (I can hear the SH cluster/IDX cluster fans booing in the background already...peering has fallen into disfavor, even though it can be useful in tiered environments. :-))
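A rough sketch of the forwarder and search-head side of that setup, assuming universal forwarders, the default receiving port 9997 and the default management port 8089; the target-group name and hostnames are made up for illustration. Listing both indexers in one target group load-balances rather than clones: each event goes to only one of the two indexers, so neither site holds a full copy on its own, and the search heads have to search both.

```ini
# outputs.conf on the forwarders that feed the shared index (sketch).
# "shared_index_lb" and the idx-site*.example.com hostnames are hypothetical.
[tcpout]
defaultGroup = shared_index_lb

[tcpout:shared_index_lb]
# Two servers in one group: the forwarder rotates between them
# (automatic load balancing); it does not send every event to both.
server = idx-site1.example.com:9997, idx-site2.example.com:9997

# distsearch.conf on each search head that needs the shared index (sketch).
# Lists both indexers' management ports as search peers.
[distributedSearch]
servers = https://idx-site1.example.com:8089, https://idx-site2.example.com:8089
```

In practice search peers are usually added with `splunk add search-server` or through Splunk Web, which also sets up the authentication between search head and peer; editing distsearch.conf alone may not be enough.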

The only other way I can think of to manage such a duplication without taking a double license hit: configure the index on both IDXes, then do some sort of replication kludge outside of Splunk. You'd necessarily have to shut down Splunk on the receiving IDX during that replication. Maybe stand up a "special flower" IDX just for this index?

otan1010
Explorer

"Does this index data come from a unique set of hosts or inputs?"
The index in question is a summary index.

"You'd necessarily have to shut down Splunk on the receiving IDX during that replication"
I need continual replication, i.e. the data shouldn't differ by more than about a minute between the two.

"Maybe stand up a 'special flower' IDX just for this index?"
I don't know what you mean by this 😞

nnmiller
Contributor

"Special flower": a machine set up specially to do one thing, because otherwise you'd have to do a lot of crazy things on your normal install to accomplish the same thing. But since you need near-real-time availability, a special IDX wouldn't work anyway.

Given your other comments, I can't see any workaround short of standing up a cluster or eating the license consumption of sending the data to two places.

Richfez
SplunkTrust

otan1010's comments indicate that:

  • They need to have one relatively small index replicated between sites.
  • They need to not have any of the other indexes replicated between sites.

Given that use case and those requirements, I think indexer clustering is the solution (if there is one). I agree it could replicate everything if that were the goal, but it doesn't need to.

Richfez
SplunkTrust

nnmiller,

You know more about this than I do, but why wouldn't it work to stand up a small indexer on each side of the great divide, and have those configured with only the one index. Have those two indexers clustered and doing all the wonderful indexer clustering things to keep that one index replicated. Or was that what you were referring to when you said "standing up a cluster"?

Then add those indexers to the ones the SHs on each side search.

Added advantage: if you ever needed a second index done this way, just toss it onto the "special flower" indexer cluster then sit back and enjoy the results!
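A minimal sketch of what that small dedicated cluster might look like: two peer indexers (one per site) plus a cluster manager, with a replication factor of 2 so each peer ends up with a copy of every bucket of the one shared index. The hostnames, replication port 9887, secret placeholder and index name index1 are illustrative assumptions. The setting names follow the newer manager/peer terminology; older Splunk releases use `mode = master`, `mode = slave` and `master_uri` instead.

```ini
# server.conf on the cluster manager (sketch; hypothetical values).
[clustering]
mode = manager
# Two peers, two copies: every bucket is replicated to the other site.
replication_factor = 2
search_factor = 2
pass4SymmKey = <shared-secret>

# server.conf on each of the two peer indexers (one per site).
[replication_port://9887]

[clustering]
mode = peer
manager_uri = https://cm.example.com:8089
pass4SymmKey = <shared-secret>

# indexes.conf, distributed to the peers in the manager's configuration
# bundle, defining the single shared index.
[index1]
homePath   = $SPLUNK_DB/index1/db
coldPath   = $SPLUNK_DB/index1/colddb
thawedPath = $SPLUNK_DB/index1/thaweddb
# "auto" tells the cluster to replicate this index across the peers.
repFactor  = auto
```

With only two peers and a replication factor of 2, losing either site leaves the cluster unable to meet its replication factor until that peer returns, which is worth weighing against how small this setup is.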

nnmiller
Contributor

That would certainly work, @rich7177, but if you are going to that extent, why not just turn the whole arch into a cluster?

Seems like overkill for a single index unless you intend to re-architect the environment entirely and start with this index as the first in a migration.

otan1010
Explorer

I see, thank you.

No, me neither. It's not an incredible amount of data though, so that might be a viable solution anyway.

woodcock
Esteemed Legend

Have a search head that talks to both clusters of indexers and have it run a Summary Index query on the first cluster's data but write out the results of this to an index that only exists on the second cluster. This way, you do not take the license hit twice.
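One way to make the summary/collect output land only on the second cluster is to have that search head forward everything it generates instead of indexing it locally, which is the usual pattern for search heads. A sketch, assuming the search head can already search the first cluster and that the target index (called index1_site2 here, a hypothetical name) is defined on the second cluster's indexers; the hostnames are placeholders. A side effect is that the search head's own internal logs are sent to the second cluster as well.

```ini
# outputs.conf on the search head that runs the collect search (sketch).
# Don't index locally; send everything this search head produces,
# including the collect output, to the second cluster's indexers.
[indexAndForward]
index = false

[tcpout]
defaultGroup = site2_indexers
# Disable the default forwardedindex filters so all indexes are forwarded.
forwardedindex.filter.disable = true
indexAndForward = false

[tcpout:site2_indexers]
server = idx1-site2.example.com:9997, idx2-site2.example.com:9997
```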

otan1010
Explorer

That wouldn't leave the two sites with two identical sets of data, so sadly it's not a viable solution in this case.

Richfez
SplunkTrust

I think I like this answer better than building an entire cluster for one replicated index.

Example 1 in the documentation for the collect command seems to show something much like this, with no summarizing, just writing data.

woodcock
Esteemed Legend

Just because most people roll up the data and that is the most common use case, does not mean that you have to! You can just dump raw events with collect directly into a Summary Index and you will have 2 copies of identical raw data except that one might have to have a field called "raw" (or something) instead of "_raw".
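Combined with the requirement that the copies shouldn't drift apart by much more than a minute, a scheduled search on that search head could copy the previous complete minute of raw events once a minute. This is only a sketch: index1 and index1_site2 are hypothetical names. Because each run covers the minute that ended one minute earlier, the copy lags by roughly one to two minutes, and events that are indexed later than that window would be missed.

```ini
# savedsearches.conf on the search head (sketch; hypothetical names).
[copy_index1_raw_to_site2]
# No roll-up: write the matching events as-is into the other index.
# Running the search ad hoc with "collect ... testmode=true" first shows
# what the copied events (_raw, source, sourcetype) will look like
# without writing anything.
search = index=index1 | collect index=index1_site2
enableSched = 1
# Every minute, over the previous complete minute; consecutive runs
# cover back-to-back windows, so on-time events are copied once.
cron_schedule = * * * * *
dispatch.earliest_time = -2m@m
dispatch.latest_time = -1m@m
```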

otan1010
Explorer

I'll mark this as the answer as it's definitely a viable solution, even though I won't be using it myself due to other considerations.

I'll likely make a solution using the Splunk SDK.

Thanks for your time 🙂

For others looking at this: rich7177 links to a description of the collect command, which you can use to achieve what woodcock describes:

eventtypetag="download" | collect index=downloadcount

This will basically duplicate the events into the defined index, with some changes to source and sourcetype (and raw perhaps, as woodcock writes).

Richfez
SplunkTrust

I believe you are after indexer clustering. Please take a look at the link and see if that doesn't help you.

otan1010
Explorer

It's not exactly what I want (I think): the purpose of indexer clustering is, as far as I'm aware, to have two or more indexers with the exact same data. Multisite indexer clustering would be the same, but with multiple sites holding the exact same indexers/data.

I can't see any way to use this mechanism for anything but data redundancy/performance benefits, i.e. you always replicate the whole environment or nothing at all.

What I'm after is being able to choose an index (call it "index1") in one environment ("site1") and replicate that index between the two sites ("site1" and "site2").

So "site1" would have a totally different data set than "site2" EXCEPT for "index1", which they would share and both have identical copies of.

Richfez
SplunkTrust

I did notice the "independent" clause, but wasn't sure how independent.

Let me move this to a comment and see if that makes this question "unanswered" so it bumps back up to the top in red; maybe someone else will know then.

otan1010
Explorer

Yeah, thanks for taking the time to answer my question anyhow 🙂