Hi
I'm wondering if there is a way to control on which indexers an index resides
E.g. there are 5 Indexers (+all the other infrastructure like search head, cluster master etc)
3 of them should hold normal system logs
2 of them should hold confidential data with different protection needs
What options are there to ensure that just_a_normal_index resides only on the normal 3 indexers, while the super_secret_danger index resides only on the 2 other indexers, other than creating 2 separate clusters?
There is no way to keep the indexers in a cluster from all having the same configuration - and that includes indexes.conf. So the super_secret_danger index will exist on all of them. Before you go any further, you need to decide:
Do I need to have the secure data physically restricted to these two indexers? Is this a regulatory or policy requirement? If yes, then read the rest of this.
Do I only need to restrict access to this data to certain users? If yes, then you can let the data exist on all servers - but simply restrict the visibility of the index to the proper users. For users who do not have permissions on the index, it is as if the index does not exist. If you can, use this solution and skip the rest of this answer!
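As a rough sketch of the permissions approach, index visibility is controlled per role in authorize.conf on the search head. The role name below is hypothetical. Note that every other role must not list the index in srchIndexesAllowed, either by name or through a wildcard that matches it:

```ini
# authorize.conf on the search head (role name is an assumption for illustration)
[role_secret_analyst]
# indexes that members of this role are allowed to search
srchIndexesAllowed = super_secret_danger
# indexes searched when the user does not specify index=...
srchIndexesDefault = super_secret_danger
```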
The only way that I see to do what you asked: direct the data that belongs in the super_secret_danger index so that it is only forwarded to the 2 special indexers. This solution (below) has serious limitations, so you may well be better off using 2 separate indexer clusters to divide this data. But here is the possible solution:
You can do this by editing outputs.conf on the forwarders to define multiple groups:
[tcpout]
defaultGroup = basicgroup
[tcpout:basicgroup]
server=10.1.1.197:9997,10.1.2.197:9997,10.1.3.197:9997
[tcpout:secretgroup]
server=10.1.4.197:9997,10.1.5.197:9997
Then you can use the group names to route the input data appropriately, using inputs.conf:
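[monitor:///var/log/stuff]
# going to the default basicgroup, so no routing needed
[monitor:///var/log/secret.log]
# send this input only to the indexers in secretgroup
_TCP_ROUTING = secretgroup
# write the events to the restricted index instead of the default one
index = super_secret_danger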
Note that all the other inputs.conf settings would be the same as usual. The big pain is that you need a separate stanza for every input you want to route, because stanzas cannot overlap in inputs.conf.
So now super_secret_danger will exist on all the indexers, but it will only have data on the indexers that belong to the secretgroup that was defined on the forwarder.
IMPORTANT: You will also have to disable index replication for the super_secret_danger index; otherwise the cluster will replicate its buckets to other peers, including the 3 normal indexers. This may completely kill this idea for you.
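A minimal sketch of what that could look like in the indexes.conf that the cluster master distributes to the peers. The paths are assumptions based on the usual layout; repFactor = auto turns replication on for an index, and 0 leaves it off:

```ini
# indexes.conf in the configuration bundle pushed to all peers
[just_a_normal_index]
homePath   = $SPLUNK_DB/just_a_normal_index/db
coldPath   = $SPLUNK_DB/just_a_normal_index/colddb
thawedPath = $SPLUNK_DB/just_a_normal_index/thaweddb
# replicated across the cluster as usual
repFactor = auto

[super_secret_danger]
homePath   = $SPLUNK_DB/super_secret_danger/db
coldPath   = $SPLUNK_DB/super_secret_danger/colddb
thawedPath = $SPLUNK_DB/super_secret_danger/thaweddb
# not replicated: buckets stay on the peer that received the data
repFactor = 0
```

With this setting, each bucket of super_secret_danger exists only on the indexer that ingested it, so losing that indexer means losing access to that data.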
Indexes always reside on every indexer in an indexer cluster. What you really mean is controlling which indexers hold the data for an index. It would be better to rephrase the question, since as written it can mislead readers who don't yet understand the topic. And by the way, giving up replication in order to protect your data is not a good idea at all. The best option is to build 2 clusters to separate your data physically, or to use index permissions as lguinn2 already stated.
Thanks
So if I got that right, there are these options:
Option 1: Build 2 clusters, which comes at the cost of roughly 1 additional cluster master/search head
Option 2: Disable replication for the special index and route data manually; other indexes still get replicated to the secret indexers
Is there maybe technically an option to abuse multi-site clusters? But I can already see that the additional complexity might not be worth the trouble.
You got that right. But you don't need an extra search head - one search head can talk to multiple clusters, no problem.
But if you are using your cluster master AS the search head - then you need two. (I wasn't sure from your comment above.) You really should not combine the search head and the cluster master in a single Splunk instance. Especially if you are going to do a lot of searching.
And you are also right about multi-site clustering! A multi-site cluster still has only 1 master, and it usually has a search head for each site. You could set the multi-site replication factor so that the data is only replicated within the site of origin, I think. Then each bucket would replicate only within its site of origin (site1 or site2), not across both. You would still have to do the routing on the forwarders to make sure that the data originates in the proper site. But now you can turn on replication for the special index, if you set the factors something like this:
site_replication_factor = origin:2, total:2
site_search_factor = origin:2, total:2
I hadn't thought of this option. I like the multi-site option better than option 2. But it's still complicated...
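For context, here is a rough, untested sketch of where those two factors would sit in server.conf. The site assignment (3 normal indexers in site1, 2 secret indexers in site2) is an assumption for this example, the master host is a placeholder, and the shared cluster secret (pass4SymmKey) is omitted. Newer Splunk releases use manager/peer in place of the master/slave names:

```ini
# server.conf on the cluster master
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2
# keep both copies of every bucket inside the site where it was first indexed
site_replication_factor = origin:2,total:2
site_search_factor = origin:2,total:2

# server.conf on each of the 2 secret indexers (the normal 3 would use site1)
# [general]
# site = site2
#
# [clustering]
# mode = slave
# master_uri = https://<cluster-master-host>:8089
```

A bucket's origin site is the site of the peer that first indexes the data, which is why the forwarder routing with _TCP_ROUTING is still needed.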