Anonymize only some Email Addresses

SirHill17
Communicator

Hi,

I need help writing a regex that anonymizes email addresses that don't belong to the company domain. I have already done some tests, with no success. Below is the regex I tried:

^(.*)(?:(?<!\S)(\w[\w\-\.]+@domain.com))(.*)$

So the aim of the regex is to anonymize all email addresses external to the company while keeping internal email addresses in clear text.

Can someone help with that?

Thanks!!!

woodcock
Esteemed Legend

This page walks you through everything except building the RegEx:

http://docs.splunk.com/Documentation/Splunk/5.0/Data/Anonymizedatausingconfigurationfiles

Try this for a RegEx:

[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}(?<!@YourCompanyDomainHere\.com)(?:[^A-Za-z]|$)

This will match all email addresses EXCEPT for those matching anyuser@YourCompanyDomainHere.com.
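For testing the pattern outside Splunk, here is a minimal Python sketch. It assumes a hypothetical internal domain company.com, uses [A-Za-z] rather than [A-z] (which also matches the punctuation characters between Z and a), and swaps the trailing "non-letter or end of line" group for the look-ahead (?![A-Za-z]). Both stop a match from ending inside a word, but the look-ahead does not consume the character after the address, so a substitution keeps the delimiter. The mask string xxxxx is also an assumption.

```python
import re

# Hypothetical internal domain; substitute your own.
INTERNAL_DOMAIN = "company.com"

# Email-like token whose domain is NOT exactly INTERNAL_DOMAIN.
# The negative look-behind rejects a match ending in "@company.com";
# the look-ahead (?![A-Za-z]) stops the TLD from backtracking to a
# shorter prefix (e.g. "co") that would slip past the look-behind.
EXTERNAL_EMAIL = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}"
    + r"(?<!@" + re.escape(INTERNAL_DOMAIN) + r")"
    + r"(?![A-Za-z])"
)


def anonymize(text: str, mask: str = "xxxxx") -> str:
    """Replace every external email address in text with mask."""
    return EXTERNAL_EMAIL.sub(mask, text)


print(anonymize("from=alice@company.com to=bob@gmail.com cc=carol@partner.org"))
# prints: from=alice@company.com to=xxxxx cc=xxxxx
```

Note that a subdomain such as mail.company.com is treated as external, because the look-behind requires the @ immediately before company.com.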

SirHill17
Communicator

Works fine!! Awesome!

Do you know how to use it for multiple domain names?

I tried with two SEDCMDs, but it doesn't work because the second one overwrites the first one.

jkat54
SplunkTrust

The answer I gave would work fine with multiple domains, but I like woodcock's regex. Try adding an OR ("|") to it.

SirHill17
Communicator

Thanks guys!

woodcock
Esteemed Legend

Like this:

[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}(?<!@YourCompanyDomainHere\.com|@OtherCompany\.org|@AndOnAndOn\.etc)(?:[^A-Za-z]|$)
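A Python sketch of the multi-domain case, with hypothetical domain names. Python's re module is stricter than PCRE here: it rejects any look-behind whose alternatives differ in length, so instead of one look-behind joined with "|", the sketch chains one negative look-behind per domain. Chained look-behinds exclude exactly the same addresses, and PCRE accepts them as well.

```python
import re

# Hypothetical list of internal domains that must stay in clear text.
INTERNAL_DOMAINS = ["company.com", "othercompany.org", "subsidiary.net"]

# One fixed-width negative look-behind per domain, chained back to back.
LOOKBEHINDS = "".join(f"(?<!@{re.escape(d)})" for d in INTERNAL_DOMAINS)

EXTERNAL_EMAIL = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}"
    + LOOKBEHINDS
    + r"(?![A-Za-z])"  # keep the TLD from backtracking to a shorter prefix
)


def anonymize(text: str, mask: str = "xxxxx") -> str:
    """Mask every address whose domain is not in INTERNAL_DOMAINS."""
    return EXTERNAL_EMAIL.sub(mask, text)


print(anonymize("a@company.com b@othercompany.org c@gmail.com"))
# prints: a@company.com b@othercompany.org xxxxx
```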

jkat54
SplunkTrust

Since you'll use props.conf, I recommend this multi-step approach:

First, change the format of the company domain; second, redact every email address that matches the standard email format; finally, change the company domain back to its correct format:

[sourcetype]
SEDCMD-aaa=s/@company\.org/ at company.org/g
SEDCMD-bbb=s/(\w+)@(\w+)/****@\2/g
SEDCMD-ccc=s/ at company\.org/@company.org/g

Note that my regex in SEDCMD-bbb is not a perfect email matcher, but it puts you on the right track.
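The three SEDCMD steps can be simulated in Python to see their combined effect on a sample event. This is a sketch, assuming company.org as the company domain, with Python's re.sub standing in for Splunk's sed-style replacement (\2 is a back-reference in both). The restore step here only rewrites " at company.org" rather than every " at ", so ordinary text such as "at noon" is not turned into "@noon".

```python
import re

# Hypothetical company domain, as in the props.conf example.
COMPANY = "company.org"


def redact(event: str) -> str:
    # Step 1 (SEDCMD-aaa): hide the company "@" so step 2 cannot see it.
    event = re.sub(r"@" + re.escape(COMPANY), " at " + COMPANY, event)
    # Step 2 (SEDCMD-bbb): mask the local part of every remaining address.
    event = re.sub(r"(\w+)@(\w+)", r"****@\2", event)
    # Step 3 (SEDCMD-ccc): restore the company "@", and nothing else.
    event = re.sub(r" at " + re.escape(COMPANY), "@" + COMPANY, event)
    return event


print(redact("alice@company.org emailed bob@gmail.com at noon"))
# prints: alice@company.org emailed ****@gmail.com at noon
```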

SirHill17
Communicator

I also tested your solution, and it works fine as well. Thanks!

jkat54
SplunkTrust

I gave regex hell and couldn't make it work at all. Woodcock's regex works in my limited testing, but I was also able to find an "all-inclusive" email-address-detecting regex that was over 4000 characters long. So I don't believe there is a perfect one-liner regex for this problem, and I would recommend the multi-step approach instead.

woodcock
Esteemed Legend

Yes, RegEx for email is complicated.

jkat54
SplunkTrust

I'm still amazed with what you crafted there!

woodcock
Esteemed Legend

It is all in the negative look-behind. RegEx is powerful, for sure.
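The look-behind depends on the trailing group that requires a non-letter or end of line. A short Python sketch, assuming company.com as the internal domain, shows what happens without that guard: the engine backtracks the TLD from com to co, "@company.co" no longer matches the look-behind, and a truncated internal address gets matched (and would be anonymized).

```python
import re

# Hypothetical internal domain in the negative look-behind.
LOOKBEHIND = r"(?<!@company\.com)"
CORE = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}" + LOOKBEHIND

# No guard: backtracking shrinks the TLD to "co", escaping the look-behind.
unguarded = re.compile(CORE)
# Guard: the match must be followed by a non-letter or the end of the string.
guarded = re.compile(CORE + r"(?:[^A-Za-z]|$)")

sample = "alice@company.com"
print(unguarded.search(sample).group())  # alice@company.co
print(guarded.search(sample))            # None
```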

jkat54
SplunkTrust

Will you do this with inline search or on ingestion?

SirHill17
Communicator

I will set that up in transforms.conf and props.conf.

sundareshr
Legend

See if this works:

[^@]+@(?!internaldomain)(?<e>.*)
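A quick Python check of this pattern. Python spells a named group (?P<e>...) rather than (?<e>...), so that is the only change; internaldomain is the placeholder from the post, and the sample addresses are made up. The look-ahead rejects any address whose domain starts with internaldomain, and e captures everything after the @ up to the end of the line.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(r"[^@]+@(?!internaldomain)(?P<e>.*)")

for addr in ["bob@external.com", "alice@internaldomain.com"]:
    m = PATTERN.search(addr)
    # External addresses yield the captured domain; internal ones do not match.
    print(addr, "->", m.group("e") if m else "no match (internal)")
```

Because the look-ahead only tests the start of the domain, an address such as x@internaldomain.evil.com would be treated as internal too.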