Hi,
I would like to anonymize data (file system paths) using a REGEX. I successfully managed to mask data such as IP addresses and credit card numbers, but I am not able to replicate the setup for a value with an undefined number of characters.
Could you please help review the configuration below:
props.conf:
[amit_anonymize_data]
TRANSFORMS-anonymize = filepath-anonymizer
transforms.conf
[filepath-anonymizer]
REGEX = (?m)^(.*)filePath=\S+(.*)$
FORMAT = $1filePath=XXXX$2
DEST_KEY = _raw
Below is an example of a log line that must be transformed:
2016-02-25 14:40 GMT+1 this is only an example filePath="/tmp/file.log" error script 1
The log is indexed without any modification.
Thanks for your help.
Cyril
Hi, please try this regex with positive lookahead and positive lookbehind.
Props.conf
[amit_anonymize_data]
TRANSFORMS-anonymize = filepath-anonymizer
Transforms.conf
[filepath-anonymizer]
REGEX = '(.*)(?<=filePath=").*(?=")(.*)'
FORMAT = $1XXXX$2
DEST_KEY = _raw
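To sanity-check the regex logic outside Splunk, here is a minimal Python sketch that applies the same pattern and FORMAT to the sample line from the question. It assumes Python's re module behaves like Splunk's regex engine for this particular pattern, and it says nothing about where index-time transforms actually run. The surrounding single quotes are left out because Python would match them literally, and the mask helper is a hypothetical name used only for illustration.

```python
import re

# Pattern from the answer, without the surrounding single quotes.
PATTERN = re.compile(r'(.*)(?<=filePath=").*(?=")(.*)')

# Python spelling of FORMAT = $1XXXX$2
REPLACEMENT = r'\g<1>XXXX\g<2>'


def mask(event: str) -> str:
    """Rewrite one event the way the transform is expected to rewrite _raw."""
    return PATTERN.sub(REPLACEMENT, event, count=1)


# Sample line taken from the question.
sample = ('2016-02-25 14:40 GMT+1 this is only an example '
          'filePath="/tmp/file.log" error script 1')
print(mask(sample))
```

If the printed line still shows the original path, the problem is in the regex; if it shows XXXX, the regex is fine and the configuration or deployment is the more likely culprit.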
Ok, so what is the architecture here? Are there forwarders, etc.? You say you can mask credit cards, but did you do that in development on a single Splunk instance, and are you now trying this other redaction in production, where the architecture is different?
I am working on a DEV environment (the same one as for the credit card masking). The props.conf and transforms.conf files have been updated on the indexer. Data is coming from a forwarder, yes.
Also, what if you put single quotes around the regex?
Great, it's working with the single quotes. Thanks!!!
Awesome! I edited the answer to add the single quotes for folks looking in the future.
Thanks for the follow up and marking the answer!
In case it could help:
I have customized the REGEX to take into account the case where the path contains a space character (which can happen but should not 🙂):
'^(.*)(?<=filePath=").*?(?=")(.*)$'
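As a rough illustration of why the space matters, the Python sketch below runs the original \S+ pattern from the question and the customized lazy pattern against a hypothetical event whose path contains a space. The event text is made up for this example, and Python's re module is assumed to behave like Splunk's engine for these patterns.

```python
import re

# Hypothetical event whose path contains a space.
event = ('2016-02-25 14:40 GMT+1 example '
         'filePath="/tmp/my file.log" error script 1')

# Original attempt: \S+ stops at the first whitespace character.
original = re.compile(r'(?m)^(.*)filePath=\S+(.*)$')
# Customized version: the lazy .*? stops at the first closing quote.
customized = re.compile(r'^(.*)(?<=filePath=").*?(?=")(.*)$')

leaky = original.sub(r'\g<1>filePath=XXXX\g<2>', event)
masked = customized.sub(r'\g<1>XXXX\g<2>', event)

print(leaky)   # the tail of the path after the space survives
print(masked)  # the whole quoted path is replaced
```

With \S+, everything after the space stays in the event, so part of the path leaks; the quote-delimited pattern replaces the full value.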
Very nice, great follow-up! I didn't even think about spaces in file paths...
This makes me think your first regex might have worked with single quotes too. It's hard to tell which regex is less resource intensive without testing, but I assume my regex requires more CPU effort because of the lookarounds.
Still no success. Based on your input I also tried
(?<=filePath=")\S+(?=")
but that did not work either.
Can anything else impact it?
My apologies. I have corrected my answer.
Unfortunately no change. I don't really know what's wrong...
What happens when you do this? Anything, or is the _raw unchanged?
And have you tried without multiline (the (?m) at the front)? That may also be making it behave slightly differently.
Yes, _raw is unchanged. I just tried without (?m), but no success.
Is the FORMAT correct? My concern is about the number of characters that XXXX replaces. If the filePath has 15 characters, will it be replaced by XXXX (4 X's)? Is that right?
Thanks.
The FORMAT string looks correct to me. Yes, the filepath will be replaced by 4 X's no matter how many characters are in the original path.
Is the sourcetype on the input set correctly (amit_anonymize_data)?
Yes the sourcetype is correct.
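On the FORMAT question above, the point that the mask is always four X's regardless of path length can be checked with a small Python sketch. The paths below are hypothetical, and the pattern is the customized one from this thread, run through Python's re module as a stand-in for Splunk's engine.

```python
import re

pattern = re.compile(r'^(.*)(?<=filePath=").*?(?=")(.*)$')

# Hypothetical paths of different lengths.
paths = ['/a', '/tmp/file.log', '/var/log/app/very/long/name.log']

# The replacement text is a literal, so its length never depends on the match.
masked = [pattern.sub(r'\g<1>XXXX\g<2>', f'filePath="{p}" error')
          for p in paths]

for line in masked:
    print(line)
```

Every line comes out identical, which also means the masked event does not reveal how long the original path was.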