Splunk Search

Anonymize only some Email Addresses

Communicator

Hi,

I need help writing a regex that anonymizes email addresses that don't belong to the company domain. I have already run some tests, but without success. Here is the regex I tried:

^(.*)(?:(?<!\S)(\w[\w\-\.]+@domain.com))(.*)$

So the aim of the regex is to anonymize all email addresses external to the company and keep internal email addresses in clear text.

Can someone help with that?

Thanks!!!

Esteemed Legend

This page walks you through everything except building the regex:

http://docs.splunk.com/Documentation/Splunk/5.0/Data/Anonymizedatausingconfigurationfiles

Try this for a RegEx:

[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}(?<!@YourCompanyDomainHere\.com)(?:[^A-Za-z]|$)

This will match all email addresses EXCEPT for those matching anyuser@YourCompanyDomainHere.com.
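
A minimal sketch for trying the negative look-behind outside Splunk, assuming Python's re module as a stand-in for Splunk's PCRE engine. The function name and the "****@****" mask are made up for illustration. The sketch writes [A-Za-z] rather than [A-z] (the range A-z also covers a few punctuation characters), escapes the dot in the domain, and turns the trailing group into a look-ahead so that a substitution does not swallow the character after the address.

```python
import re

# Placeholder internal domain taken from the post; replace with your own.
INTERNAL = "YourCompanyDomainHere.com"

# An email-shaped token whose tail is NOT "@<internal domain>".
EXTERNAL_EMAIL = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}"
    r"(?<!@" + re.escape(INTERNAL) + r")"   # negative look-behind on the full domain
    r"(?=[^A-Za-z]|$)"                      # address must end here (not consumed)
)

def anonymize_external(text, mask="****@****"):
    """Replace every external address in text with mask (hypothetical helper)."""
    return EXTERNAL_EMAIL.sub(mask, text)

print(anonymize_external("from bob@example.org to alice@YourCompanyDomainHere.com"))
```

With this input the external address is masked and the internal one is left in clear text.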

Communicator

Works fine!! Awesome!

Would you know how to use it for multiple domain names?

I tried with two SEDCMD settings, but that doesn't work because the second one overwrites the first.

SplunkTrust

The answer I gave would work fine with multiple domains, but I like woodcock's regex. Try adding an alternation ("|") to it.

Communicator

Thanks guys!

Esteemed Legend

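Like this, with each domain as its own top-level branch of the look-behind, which PCRE accepts even when the domains differ in length:

[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}(?<!@YourCompanyDomainHere\.com|@OtherCompany\.org|@AndOnAndOn\.etc)(?:[^A-Za-z]|$)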
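
For testing the multi-domain idea outside Splunk, here is a rough Python sketch. Python's re module (used here only as an illustration, not as Splunk's engine) insists on fixed-width look-behinds, so the sketch chains one look-behind per domain instead of using an alternation; the effect should be the same. The helper names and the mask are hypothetical, and the trailing group is written as a look-ahead so the character after the address is not replaced.

```python
import re

# Placeholder domains from the post; these are the ones to keep in clear text.
INTERNAL_DOMAINS = ["YourCompanyDomainHere.com", "OtherCompany.org", "AndOnAndOn.etc"]

def build_external_email_regex(domains):
    """Compile a regex matching addresses outside all of the given domains."""
    # One fixed-width negative look-behind per internal domain.
    lookbehinds = "".join(r"(?<!@" + re.escape(d) + r")" for d in domains)
    return re.compile(
        r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}"
        + lookbehinds
        + r"(?=[^A-Za-z]|$)"
    )

def anonymize_external(text, domains=INTERNAL_DOMAINS, mask="****@****"):
    """Mask every address that is not in one of the internal domains."""
    return build_external_email_regex(domains).sub(mask, text)

print(anonymize_external("a@OtherCompany.org b@gmail.com c@AndOnAndOn.etc"))
```

Only the gmail.com address is masked in this example; the other two belong to listed domains and stay as they are.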

SplunkTrust

Since you'll use props.conf, I recommend this multi-step approach:

First, change the format of the company domain; second, redact every email address that matches the standard email format; third, change the company domain back to its correct format:

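[sourcetype]
# 1. Hide the internal domain from the redaction step
SEDCMD-aaa=s/@company\.org/ at company.org/g
# 2. Redact the local part of every remaining address
SEDCMD-bbb=s/(\w+)@(\w+)/****@\2/g
# 3. Restore the internal domain only (a bare " at " would also rewrite ordinary text)
SEDCMD-ccc=s/ at company\.org/@company.org/g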

Note that my regex in SEDCMD-bbb is not perfect for email matching, but it puts you on the right track.
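
The three steps can be simulated in Python to check the result before touching props.conf. This is only a sketch: it assumes the three substitutions run in the order listed, uses company.org from the example as the internal domain, and the function name is invented. The third step here restores only " at company.org", so ordinary uses of the word "at" in an event are left alone.

```python
import re

def anonymize_multistep(event, internal_domain="company.org"):
    """Simulate the protect / redact / restore sequence on one event string."""
    # Step 1: protect internal addresses by rewriting "@domain" as " at domain".
    protected = re.sub("@" + re.escape(internal_domain),
                       " at " + internal_domain, event)
    # Step 2: redact the local part of every address that still contains "@".
    redacted = re.sub(r"(\w+)@(\w+)", r"****@\2", protected)
    # Step 3: restore the internal addresses.
    return re.sub(" at " + re.escape(internal_domain),
                  "@" + internal_domain, redacted)

print(anonymize_multistep("alice@company.org mailed bob@example.com"))
```

This prints "alice@company.org mailed ****@example.com". The limitation mentioned above shows up with dotted local parts: john.doe@example.com becomes john.****@example.com, because \w+ stops at the dot.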

Communicator

I also tested your solution, works fine as well. Thanks!

SplunkTrust

I fought with the regex at length and couldn't get it to work. Woodcock's regex works in my limited testing, but I also found an "all-inclusive" email-address-detecting regex that was over 4000 characters long. So I don't believe there is a perfect one-liner regex for this problem, and I would recommend the multi-step approach instead.

Esteemed Legend

Yes, RegEx for email is complicated.

SplunkTrust

I'm still amazed with what you crafted there!

Esteemed Legend

It is all in the negative look-behind. RegEx is powerful, for sure.

SplunkTrust

Will you do this inline at search time or at ingestion?

Communicator

I will set that up in transforms.conf and props.conf.

Legend

See if this works

[^@]+@(?!internaldomain)(?<e>.*)
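
A quick way to see what the negative look-ahead does, sketched in Python on the assumption that the regex is applied to a single address rather than a whole event. Python spells the named group (?P<e>...) where PCRE also accepts (?<e>...); "internaldomain" is the placeholder from the regex above and the function name is hypothetical.

```python
import re

# Capture the domain part in group "e" unless it starts with "internaldomain".
EXTERNAL = re.compile(r"[^@]+@(?!internaldomain)(?P<e>.*)")

def external_domain(address):
    """Return the domain of an external address, or None for internal ones."""
    m = EXTERNAL.fullmatch(address)
    return m.group("e") if m else None

print(external_domain("bob@example.org"))           # example.org
print(external_domain("alice@internaldomain.com"))  # None
```

Because [^@]+ and .* are both unbounded, this pattern seems better suited to a field that holds exactly one address than to free-form event text.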