Solved: Configure different index time transformations for...

GaetanVP · ‎06-27-2022

Hello Splunkers,

I'm currently working on a new use case and need some help.

I'm working on a HF that receives Microsoft Cloud logs (with https://docs.splunk.com/Documentation/AddOns/released/MSCloudServices), and I would like to forward those logs to two different TCP outputs (Splunk indexers): one with some fields anonymized, and the other without any index-time transformation.

Here is a diagram to help you understand my problem:

My thoughts:
I currently have an inputs.conf configured on my HF to receive the logs from MS Cloud (with the sourcetype set to mscs:azure:eventhub; I think it's compulsory to keep this sourcetype).
Then I created props.conf and transforms.conf, but should I add two TRANSFORMS-<class> settings in order to have two different transforms depending on the destination?

My props.conf:
[mscs:azure:eventhub]
TRANSFORMS-anonymize = user-anonymizer

My transforms.conf:
[user-anonymizer]
REGEX = ^(.*?)"\[{\\"UserName\\":[^,]*(.*)
FORMAT = $1"###"$2
DEST_KEY = _raw

Thanks a lot,
Gaétan

PickleRick · ‎06-27-2022

You could use CLONE_SOURCETYPE to make a "copy" of each event. It would have to work something like this:

1. Your input provides Splunk with an event of some sourcetype, let's say microsoft:cloud.

2. You use CLONE_SOURCETYPE to clone it to a temporary:sourcetype.

3a. The microsoft:cloud event goes through all the normal ingest steps and you route it to output1 (or simply don't touch anything if that's your default output).

3b. The temporary:sourcetype event gets reinserted into the queue and passes through all the transforms that apply to it; at the end you route it to output2 and rewrite its sourcetype field back to microsoft:cloud.
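
For reference, the steps above could be sketched on the heavy forwarder roughly as follows, using the sourcetype and the user-anonymizer transform from the question. This is a minimal sketch, not a tested configuration: the clone sourcetype name (mscs:azure:eventhub:anon_tmp), the output group names (clear_indexers, anonymized_indexers) and the server addresses are all made-up placeholders.

```
# props.conf (heavy forwarder)
[mscs:azure:eventhub]
# Step 2: clone every incoming event into a temporary sourcetype
TRANSFORMS-clone = clone-for-anon

# Hypothetical temporary sourcetype, only used inside the HF pipeline
[mscs:azure:eventhub:anon_tmp]
# Step 3b: anonymize, route to the second output, then restore the sourcetype.
# Transforms in one class run in the order listed.
TRANSFORMS-anonymize = user-anonymizer, route-anon, restore-sourcetype

# transforms.conf (heavy forwarder)
[clone-for-anon]
REGEX = .
CLONE_SOURCETYPE = mscs:azure:eventhub:anon_tmp

[user-anonymizer]
REGEX = ^(.*?)"\[{\\"UserName\\":[^,]*(.*)
FORMAT = $1"###"$2
DEST_KEY = _raw

[route-anon]
REGEX = .
DEST_KEY = _TCP_ROUTING
# Placeholder group name; must match a tcpout stanza in outputs.conf
FORMAT = anonymized_indexers

[restore-sourcetype]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::mscs:azure:eventhub

# outputs.conf (heavy forwarder)
[tcpout]
# Step 3a: the untouched original event follows the default route
defaultGroup = clear_indexers

[tcpout:clear_indexers]
# Placeholder address
server = idx-clear.example.com:9997

[tcpout:anonymized_indexers]
# Placeholder address
server = idx-anon.example.com:9997
```

One thing to keep in mind with this layout: user-anonymizer only rewrites events that match its REGEX, so a cloned event that does not match is still routed to the second output unchanged.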

GaetanVP · ‎06-28-2022

Hello PickleRick, thanks for the answer!

I followed your instructions and it does the job!
Thanks again

Azeemering · ‎06-27-2022

Why don't you just anonymize the data at index time using SEDCMD?

Anonymize data - Splunk Documentation

Create an anon app on the indexer where you want the data anonymized, and put this in its props.conf:

[mscs:azure:eventhub]
SEDCMD-user_anon = s/^(.*?)"\[{\\"UserName\\":[^,]*(.*)/\1"###"\2/

GaetanVP · ‎06-27-2022

Hello Azeemering, thanks for your answer!

The thing is, I need to be sure that the events leaving the HF are already anonymized, for compliance reasons. And I don't have access to indexer pool no. 2.

As for SEDCMD versus a regex transform, the two are equivalent if I'm not mistaken.
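
To illustrate that equivalence: since a heavy forwarder parses events itself, a SEDCMD in its props.conf is applied before the data is forwarded. A rough sketch, assuming the anonymized copy travels under its own sourcetype inside the HF (for example one created with CLONE_SOURCETYPE, as suggested elsewhere in this thread); the stanza name below is a made-up placeholder:

```
# props.conf (heavy forwarder)
# Hypothetical clone sourcetype; the original mscs:azure:eventhub stanza is left alone
[mscs:azure:eventhub:anon_tmp]
# sed-style form of the user-anonymizer transform: s/<regex>/<replacement>/
SEDCMD-user_anon = s/^(.*?)"\[{\\"UserName\\":[^,]*(.*)/\1"###"\2/
```

Putting the same SEDCMD under [mscs:azure:eventhub] instead would rewrite the events for both outputs, which is not what is wanted here.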

Configure different index time transformations for different outputs | Heavy Forwarder
