Hello Splunkers,
I'm currently working on a new use case and need some help.
I'm working on a HF that receives Microsoft Cloud logs (with https://docs.splunk.com/Documentation/AddOns/released/MSCloudServices), and I would like to forward those logs to two different TCP outputs (Splunk indexers): one with some fields anonymized, and the other without any index-time transformation.
Here is a diagram to help you understand my problem:
My thoughts:
I currently have an inputs.conf configured on my HF to receive the logs from MS Cloud (with sourcetype set to mscs:azure:eventhub; I think it's mandatory to keep this sourcetype).
Then I created props.conf and transforms.conf, but should I add two TRANSFORMS-<class> settings in order to have two different transforms depending on the destination?
My props.conf:
[mscs:azure:eventhub]
TRANSFORMS-anonymize = user-anonymizer
My transforms.conf:
[user-anonymizer]
REGEX = ^(.*?)"\[{\\"UserName\\":[^,]*(.*)
FORMAT = $1"###"$2
DEST_KEY = _raw
Thanks a lot,
Gaétan
You could use CLONE_SOURCETYPE to make a "copy" of your event. It would have to work something like this:
1. Your input provides Splunk with an event of some sourcetype, let's say microsoft:cloud.
2. You do a CLONE_SOURCETYPE to a temporary:sourcetype.
3a. The microsoft:cloud event goes through all the normal ingest steps and you route it to output1 (or simply don't touch anything if that's your default output).
3b. The temporary:sourcetype event gets reinserted into the queue, passes through all the appropriate transforms, and at the end is routed to output2, where you also rewrite the sourcetype field back to microsoft:cloud.
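The steps above could be sketched roughly as follows on the HF, using the sourcetype and regex from the question. This is only a sketch under assumptions, not a tested configuration: the clone sourcetype name (mscs:azure:eventhub:anon), the stanza names clone-for-anon, route-to-anon and restore-sourcetype, the output group names, and the indexer host names are all made up for illustration.

```ini
# props.conf (HF)
[mscs:azure:eventhub]
# Step 2: duplicate every event under a temporary sourcetype
TRANSFORMS-clone = clone-for-anon

[mscs:azure:eventhub:anon]
# Step 3b: anonymize the clone, route it, then restore the original sourcetype
TRANSFORMS-anonymize = user-anonymizer, route-to-anon, restore-sourcetype

# transforms.conf (HF)
[clone-for-anon]
REGEX = .
CLONE_SOURCETYPE = mscs:azure:eventhub:anon

[user-anonymizer]
REGEX = ^(.*?)"\[{\\"UserName\\":[^,]*(.*)
FORMAT = $1"###"$2
DEST_KEY = _raw

[route-to-anon]
REGEX = .
DEST_KEY = _TCP_ROUTING
# hypothetical output group, defined in outputs.conf below
FORMAT = anon_indexers

[restore-sourcetype]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::mscs:azure:eventhub

# outputs.conf (HF)
[tcpout]
# Step 3a: untouched events go to the default group
defaultGroup = clear_indexers

[tcpout:clear_indexers]
# hypothetical host
server = idx-clear.example.com:9997

[tcpout:anon_indexers]
# hypothetical host
server = idx-anon.example.com:9997
```

The original event is left alone and follows defaultGroup, while only the clone picks up the props stanza of the temporary sourcetype, so the anonymization never touches the copy sent to the first indexer pool.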
Hello PickleRick, thanks for the answer!
I followed your instructions and it does the job!
Thanks again
Why don't you just anonymize the data at index time using SEDCMD?
Anonymize data - Splunk Documentation
Create an anonymization app on the indexer where you want the data anonymized, and put the following in its props.conf:
[mscs:azure:eventhub]
SEDCMD-user_anon = s/^(.*?)"\[{\\"UserName\\":[^,]*(.*)/\1"###"\2/
Hello Azeemering, thanks for your answer!
The thing is, I need to be sure that the events leaving the HF are already anonymized, for compliance reasons. And I don't have access to indexer pool n°2.
As for SEDCMD versus a regex transform, the two are equivalent if I'm not mistaken.