Hi,
I need help writing a regex that anonymizes email addresses that don't belong to the company domain. I have already run some tests, but with no success. Below is the regex I tried:
^(.*)(?:(?<!\S)(\w[\w\-\.]+@domain.com))(.*)$
The aim of the regex is to anonymize all email addresses external to the company and keep the internal email addresses in clear text.
Can someone help with that?
Thanks!!!
This page walks you through everything except building the RegEx:
http://docs.splunk.com/Documentation/Splunk/5.0/Data/Anonymizedatausingconfigurationfiles
Try this for a RegEx:
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}(?<!@YourCompanyDomainHere\.com)(?:[^A-Za-z]|$)
This will match all email addresses EXCEPT those matching anyuser@YourCompanyDomainHere.com.
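To see how the negative look-behind behaves outside Splunk, here is a minimal Python sketch. It is only an illustration, not the Splunk configuration itself: the `anonymize` helper and the `xxxx@xxxx.xxx` placeholder are made up for the example, and the trailing group is swapped for a lookahead so the character after the address is not consumed by the replacement.

```python
import re

# Hypothetical internal domain, used only for illustration.
INTERNAL = "YourCompanyDomainHere.com"

# Same shape as the regex above. The negative look-behind rejects a match
# that ends in "@<internal domain>"; the lookahead (?![A-Za-z]) stops the
# engine from accepting a shortened match such as "...Here.co".
EXTERNAL_EMAIL = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}"
    r"(?<!@" + re.escape(INTERNAL) + r")"
    r"(?![A-Za-z])"
)

def anonymize(text: str) -> str:
    """Replace every external address with a fixed placeholder."""
    return EXTERNAL_EMAIL.sub("xxxx@xxxx.xxx", text)

print(anonymize("from=alice@YourCompanyDomainHere.com to=bob@example.org"))
```

The internal address should come through unchanged while the external one is replaced.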
Works fine!! Awesome!
Would you know how to use it for multiple domain names?
I tried with two SEDCMDs, but it doesn't work because the second one overwrites the first.
The answer I gave would work fine with multiple domains, but I like woodcock's regex, so try adding an OR ("|").
Thanks guys!
Like this:
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}(?<!@YourCompanyDomainHere\.com|@OtherCompany\.org|@AndOnAndOn\.etc)(?:[^A-Za-z]|$)
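If you want to try the multi-domain idea in Python first, note that Python's `re` module does not accept a look-behind whose alternatives have different lengths. A rough sketch that gets the same effect chains one fixed-width look-behind per domain. The domain list, the `anonymize` helper and the placeholder are all hypothetical, and the trailing group is again replaced by a lookahead.

```python
import re

# Hypothetical list of internal domains, for illustration only.
INTERNAL_DOMAINS = ["YourCompanyDomainHere.com", "OtherCompany.org"]

# One fixed-width negative look-behind per domain, placed back to back.
# The match is rejected if it ends in "@<domain>" for any listed domain.
LOOKBEHINDS = "".join(
    r"(?<!@" + re.escape(domain) + r")" for domain in INTERNAL_DOMAINS
)

EXTERNAL_EMAIL = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}"
    + LOOKBEHINDS
    + r"(?![A-Za-z])"
)

def anonymize(text: str) -> str:
    """Replace every address outside the listed domains with a placeholder."""
    return EXTERNAL_EMAIL.sub("xxxx@xxxx.xxx", text)

print(anonymize("a@YourCompanyDomainHere.com b@OtherCompany.org c@gmail.com"))
```

Adding another domain is then just one more entry in the list.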
Since you'll use props, I recommend this multi-step approach:
First, change the format of the company domain; second, redact every email address that matches the standard email format; third, change the company domain back to the correct format:
[sourcetype]
SEDCMD-aaa=s/@company\.org/ at company.org/g
SEDCMD-bbb=s/(\w+)@(\w+)/****@\2/g
SEDCMD-ccc=s/ at company\.org/@company.org/g
You'll note my regex in SEDCMD-bbb is not perfect for email matching, but puts you on the right track.
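The same three steps can be sketched in Python to check the logic before touching props.conf. This is only an approximation of what the SEDCMDs do: the `redact_external` function and its `internal` parameter are invented for the example, and the last step restores only the placeholder followed by the domain, so an ordinary " at " elsewhere in the event is left untouched.

```python
import re

def redact_external(text: str, internal: str = "company.org") -> str:
    """Mask internal addresses, redact the rest, then unmask."""
    placeholder = " at " + internal
    # Step 1: hide internal addresses so step 2 cannot see their "@".
    masked = re.sub("@" + re.escape(internal), lambda m: placeholder, text)
    # Step 2: redact the word characters in front of every remaining "@".
    # Like SEDCMD-bbb, this is not a complete email matcher.
    masked = re.sub(r"(\w+)@(\w+)", r"****@\2", masked)
    # Step 3: put the internal addresses back in their original form.
    return masked.replace(placeholder, "@" + internal)

print(redact_external("alice@company.org wrote to bob@gmail.com"))
```

With the sample line, the internal address should survive and the external one should come out as `****@gmail.com`.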
I also tested your solution, works fine as well. Thanks!
I gave regex hell and couldn't make it work for anything. Woodcock's regex works in my limited testing, but I was also able to find an "all inclusive" email-address-detecting regex that was over 4000 characters long. So I don't believe there is a perfect one-liner regex for this problem, and I would recommend the multi-step approach instead.
Yes, RegEx for email is complicated.
I'm still amazed with what you crafted there!
It is all in the negative look-behind. RegEx is powerful, for sure.
Will you do this with inline search or on ingestion?
I will set that up in transforms.conf and props.conf.
See if this works
[^@]+@(?!internaldomain)(?<e>.*)