I have read the documentation on routing and filtering events (https://docs.splunk.com/Documentation/Splunk/8.1.0/Forwarding/Routeandfilterdatad), but can you tell me whether I'm heading in the right direction? Here is my scenario:
I have one Heavy Forwarder (HF) that receives events from multiple Universal Forwarders (UFs); the HF then forwards the events to an Indexer (IDX): UFs -> HF -> IDX
There is a new data source that we want to collect from one of the Universal Forwarders, but we don't want this data to be sent to the Indexer yet. We would like to index it on the Heavy Forwarder first because we need to test out some data anonymization on the events before sending them to the Indexer. All other data collected by the Universal Forwarders should still be sent to the Indexer as usual.
Our HF is configured with a tcpout defaultGroup pointing to the Indexer.
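For reference, the HF output configuration described above might look roughly like this in outputs.conf. The group name `primary_indexer` and the host/port `idx.example.com:9997` are hypothetical placeholders, not taken from the original setup.

```ini
# outputs.conf on the Heavy Forwarder (hypothetical group name and host)
[tcpout]
# Events without an explicit _TCP_ROUTING are sent to this group
defaultGroup = primary_indexer

[tcpout:primary_indexer]
server = idx.example.com:9997
```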
My understanding is that I have to configure this on my Universal Forwarder for the input I want to index locally:
If I understand correctly, these events will be indexed locally on the HF, but they will also be forwarded to the Indexer because of the defaultGroup.
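The selective indexing approach from the linked documentation could be sketched as follows. This assumes the `[indexAndForward]` settings go in the HF's outputs.conf and that `_INDEX_AND_FORWARD_ROUTING` is added to the stanza of the input to index locally; the monitor path is hypothetical. Whether the HF honors the key when it is set on a UF input (rather than on an input defined on the HF itself) is an assumption worth verifying in a test.

```ini
# outputs.conf on the Heavy Forwarder
[indexAndForward]
index = true
# Index locally only the inputs that carry _INDEX_AND_FORWARD_ROUTING
selectiveIndexing = true

# inputs.conf stanza for the new data source (hypothetical path)
[monitor:///var/log/newsource]
_INDEX_AND_FORWARD_ROUTING = local
```

With only this in place, the tagged events are indexed locally and still sent to the defaultGroup, which is the forwarding behavior described above.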
Would it be a good solution to use props/transforms on the HF to change the _TCP_ROUTING of these events (for instance, to give them a non-existent output group such as "NoForward")? Or is there another, simpler solution?
Thank you!
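The props/transforms idea might look something like this on the HF. The sourcetype `new:source` is hypothetical, and `NoForward` is the example group name from the question; `REGEX = .` matches every event of that sourcetype. How Splunk behaves when `_TCP_ROUTING` names a group that has no matching `[tcpout:...]` stanza (warnings, dropped events, or blocked output) should not be assumed; test it before relying on it.

```ini
# props.conf on the Heavy Forwarder (hypothetical sourcetype)
[new:source]
TRANSFORMS-noforward = route_noforward

# transforms.conf on the Heavy Forwarder
[route_noforward]
# Match every event of this sourcetype
REGEX = .
# Overwrite the output group for the matched events
DEST_KEY = _TCP_ROUTING
FORMAT = NoForward
```

This works at index time on the HF because data arriving from a UF has not been parsed yet, so the HF's parsing pipeline applies the transform.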
First, the UFs -> HF -> IDX architecture is sub-optimal. The HF acts as a bottleneck and a single point of failure. In a multiple-indexer environment it can also cause uneven distribution of data across the indexers. It's better for the UFs to send directly to the indexers.
Second, if an HF indexes data, it is no longer a heavy forwarder.
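For comparison, a UF sending directly to the indexers would list them all in one output group so that the forwarder load-balances across them. The group name `indexers` and the hostnames are hypothetical.

```ini
# outputs.conf on each Universal Forwarder (hypothetical hosts)
[tcpout]
defaultGroup = indexers

[tcpout:indexers]
# The forwarder automatically load-balances across the listed servers
server = idx1.example.com:9997, idx2.example.com:9997
```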
Finally, yes, there is another, simpler solution. Send the new data to a dedicated test index (I call mine "test"). Once you have the anonymization worked out, you can change the UF to send to the production index and then delete the test index.
A better but less simple solution is for the UF to send the new data to a development Splunk instance (which can be a standalone instance). You can restrict access to this instance to prevent unmasked data from being seen by others. Once you have the anonymization worked out, you can change the UF to send to the production indexer and then delete the test data from the dev instance.
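A minimal sketch of the test-index approach, assuming a monitor input. The index name `test` comes from the answer; the monitor path and sourcetype are hypothetical. The index needs to exist on the indexer before the UF starts sending to it.

```ini
# indexes.conf on the Indexer
[test]
homePath   = $SPLUNK_DB/test/db
coldPath   = $SPLUNK_DB/test/colddb
thawedPath = $SPLUNK_DB/test/thaweddb

# inputs.conf on the Universal Forwarder (hypothetical path and sourcetype)
[monitor:///var/log/newsource]
index = test
sourcetype = new:source
```

Once the anonymization works, switching to production means changing `index = test` to the production index name on the UF.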
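Routing only the new input to a separate development instance could look like this on the UF. The group names `production` and `dev_splunk` and both hostnames are hypothetical. Setting `_TCP_ROUTING` in an input stanza overrides the defaultGroup for that input only, so all other inputs keep their usual destination.

```ini
# outputs.conf on the Universal Forwarder (hypothetical names and hosts)
[tcpout]
defaultGroup = production

[tcpout:production]
server = idx.example.com:9997

[tcpout:dev_splunk]
server = dev-splunk.example.com:9997

# inputs.conf on the Universal Forwarder (hypothetical path)
[monitor:///var/log/newsource]
# Send this input to the dev instance instead of the default group
_TCP_ROUTING = dev_splunk
```

Moving the data to production later means removing the `_TCP_ROUTING` line (or pointing it at `production`).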
Thank you for your answer! I agree that UFs -> IDX is usually the optimal solution; I should have mentioned that we have constraints that make the HF necessary. We don't want to manage N network flows between the forwarders and the Indexer, and the data must not reach the Indexer until it has been fully anonymized.
As for indexing on the Heavy Forwarder, this is not meant to be long term. We would just like to temporarily enable indexing for specific data sources to test things locally, then disable indexing and resume normal forwarding for those data sources. The goal is not to turn this HF into an Indexer.
It is true that our current setup isn't optimal at all, but we would not be able to get additional Splunk machines fast enough to meet the need. That's why I was wondering whether a "quick and dirty" solution like the one in my first message could work for now.