How to anonymize data using REGEX in transforms.conf for an undefined number of characters?

SirHill17
Communicator

Hi,

I would like to anonymize data (file system paths) using REGEX. I successfully managed to mask data such as IP addresses and credit card numbers, but I am not able to replicate the setup for a value with an undefined number of characters.

Could you please help review the configuration below?

props.conf:

[amit_anonymize_data]
TRANSFORMS-anonymize = filepath-anonymizer

transforms.conf

[filepath-anonymizer]
REGEX = (?m)^(.*)filePath=\S+(.*)$
FORMAT = $1filePath=XXXX$2
DEST_KEY = _raw

Below is an example of a log line that must be transformed:

2016-02-25 14:40 GMT+1 this is only an example filePath="/tmp/file.log" error script 1

The log is indexed without any modification.
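
As a sanity check outside Splunk, the REGEX and FORMAT pair above can be approximated with Python's `re` module. This is only a sketch: Python's engine is not the PCRE engine Splunk uses, and `re.sub` with `\g<1>`-style references merely stands in for `FORMAT` plus `DEST_KEY = _raw`. Under those assumptions, the pattern does match the sample line, which suggests the cause may lie outside the expression itself.

```python
import re

# Sample event copied from the post.
event = '2016-02-25 14:40 GMT+1 this is only an example filePath="/tmp/file.log" error script 1'

# REGEX from the [filepath-anonymizer] stanza.
pattern = re.compile(r'(?m)^(.*)filePath=\S+(.*)$')

# Python spelling of FORMAT = $1filePath=XXXX$2 (an approximation, not Splunk syntax).
replacement = r'\g<1>filePath=XXXX\g<2>'

# Rewrite the event once, the way a single transform would rewrite _raw.
original_regex_result = pattern.sub(replacement, event, count=1)
print(original_regex_result)
```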

Thanks for your help.

Cyril

1 Solution

jkat54
SplunkTrust

Hi, please try this regex with positive lookahead and positive lookbehind.

Props.conf

[amit_anonymize_data]
TRANSFORMS-anonymize = filepath-anonymizer

Transforms.conf

[filepath-anonymizer]
REGEX = '(.*)(?<=filePath=").*(?=")(.*)'
FORMAT = $1XXXX$2
DEST_KEY = _raw
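
To see what the lookbehind and lookahead capture, here is a minimal sketch that applies the same expression to the sample event with Python's `re` module. The surrounding single quotes are left out because they are not part of the expression in Python, and `\g<1>XXXX\g<2>` is assumed to play the role of `FORMAT = $1XXXX$2`. Splunk uses PCRE, so treat this as an approximation rather than a reproduction of index-time behavior.

```python
import re

# Sample event from the question.
event = '2016-02-25 14:40 GMT+1 this is only an example filePath="/tmp/file.log" error script 1'

# Group 1 ends right after filePath=" (lookbehind); group 2 starts at the closing quote (lookahead).
pattern = re.compile(r'(.*)(?<=filePath=").*(?=")(.*)')

# Python equivalent of FORMAT = $1XXXX$2.
masked = pattern.sub(r'\g<1>XXXX\g<2>', event, count=1)
print(masked)
```

Because the quotes sit inside the lookarounds, they land in groups 1 and 2 and survive in the output; only the text between them is replaced.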

jkat54
SplunkTrust

OK, so what is the architecture here? Are there forwarders, etc.? You say you can mask credit cards, but did you do that in development on a single Splunk instance, and are you now trying this other redaction in production, where the architecture is different?

SirHill17
Communicator

I am working in a DEV environment (the same one used for the credit card masking). The props.conf and transforms.conf files have been updated on the indexer server. Yes, data is coming from a forwarder.

jkat54
SplunkTrust

Also, what if you put single quotes around the regex?

SirHill17
Communicator

Great, it's working with the single quotes. Thanks!!!

jkat54
SplunkTrust

Awesome! I edited the answer to add the single quotes for folks looking in the future.

Thanks for the follow up and marking the answer!

SirHill17
Communicator

In case it could help:

I have customized the REGEX to take into account the case where the path contains a space character (which can happen but should not 🙂):

'^(.*)(?<=filePath=").*?(?=")(.*)$'
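
A small sketch of how the greedy and lazy variants differ, again using Python's `re` as a stand-in for Splunk's PCRE engine. The event below is hypothetical: it adds a space to the path and a second quoted field (`user="bob"`) that does not appear in the original logs. In this engine both variants cope with the space, since `.` matches it; the lazy `.*?` matters when another double quote follows later on the same line.

```python
import re

# Hypothetical event: path with a space, followed by a second quoted field.
event = 'example filePath="/tmp/my file.log" user="bob" error script 1'

greedy = re.compile(r'(.*)(?<=filePath=").*(?=")(.*)')     # accepted answer
lazy = re.compile(r'^(.*)(?<=filePath=").*?(?=")(.*)$')    # follow-up variant
replacement = r'\g<1>XXXX\g<2>'                            # stands in for FORMAT = $1XXXX$2

greedy_result = greedy.sub(replacement, event, count=1)
lazy_result = lazy.sub(replacement, event, count=1)

print(greedy_result)  # greedy .* runs to the last double quote on the line
print(lazy_result)    # lazy .*? stops at the first closing quote
```

With the greedy form, everything up to the last quote is masked, so the hypothetical `user` field disappears along with the path; the lazy form keeps it.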

jkat54
SplunkTrust

Very nice, great follow-up! I didn't even think about spaces in file paths...

jkat54
SplunkTrust

This makes me think your first regex might have worked with single quotes too. It's hard to tell which regex is less resource-intensive without testing, but I assume my regex requires more CPU effort because of the lookarounds.

SirHill17
Communicator

Still no success. Based on your input, I also tried

(?<=filePath=")\S+(?=")

but still no success.

Can anything else impact it?

jkat54
SplunkTrust

My apologies. I have corrected my answer.

SirHill17
Communicator

Unfortunately no change. I don't really know what's wrong...

Richfez
SplunkTrust

What happens when you do this? Anything, or is the _raw unchanged?

And have you tried without multiline mode (the (?m) at the front)? That may also be making it behave slightly differently.
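
For reference, a short sketch of what `(?m)` changes, using Python's `re` as an approximation of PCRE. On a single-line event like the sample, the flag makes no difference; it only matters when the event spans several lines, because `^` and `$` can then anchor at each line. The two-line event below is hypothetical.

```python
import re

with_flag = re.compile(r'(?m)^(.*)filePath=\S+(.*)$')
without_flag = re.compile(r'^(.*)filePath=\S+(.*)$')
replacement = r'\g<1>filePath=XXXX\g<2>'  # stands in for FORMAT = $1filePath=XXXX$2

# Single-line event (shortened from the sample): both patterns behave the same.
single = 'this is only an example filePath="/tmp/file.log" error script 1'
single_with = with_flag.sub(replacement, single, count=1)
single_without = without_flag.sub(replacement, single, count=1)

# Hypothetical two-line event: filePath appears on the second line only.
multi = 'header line\nexample filePath="/tmp/file.log" tail'
multi_with = with_flag.sub(replacement, multi, count=1)
multi_without = without_flag.sub(replacement, multi, count=1)

print(single_with == single_without)  # same result on one line
print(multi_with != multi_without)    # only the (?m) version rewrites line two
```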

SirHill17
Communicator

Yes, _raw is unchanged. I just tried without (?m), but no success.

Is the FORMAT shown correct? My concern is about the number of characters that XXXX replaces. If the filePath has 15 characters, will it still be replaced by XXXX (four X's)? Is that right?

Thanks.

richgalloway
SplunkTrust

The FORMAT string looks correct to me. Yes, the filepath will be replaced by 4 X's no matter how many characters are in the original path.
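
The fixed-length behavior can be illustrated with a small Python sketch. The `mask` helper and the two events are made up for illustration, and `re.sub` only approximates what `FORMAT` does in Splunk; the point is that the replacement text is a literal, so its length does not depend on what was matched.

```python
import re

# Regex from the accepted answer, without the single quotes.
PATTERN = re.compile(r'(.*)(?<=filePath=").*(?=")(.*)')


def mask(event: str) -> str:
    """Replace the quoted filePath value with a literal XXXX (hypothetical helper)."""
    return PATTERN.sub(r'\g<1>XXXX\g<2>', event, count=1)


short_result = mask('a filePath="/a" b')
long_result = mask('a filePath="/var/log/app/very/long/path.log" b')

print(short_result)
print(long_result)
```

Both calls produce the same masked line, whatever the length of the original path.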

richgalloway
SplunkTrust

Is the sourcetype on the input set correctly (amit_anonymize_data)?

SirHill17
Communicator

Yes, the sourcetype is correct.
