Help to modify existing regex to mask sensitive PII?

Engager

Hi Splunkers,

I am looking for help modifying our current regex to meet updated project requirements.

Link: https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Data/Anonymizedata

Current log format:

Value1 | Value2 | Value3 | Value4 | Value5 | Value6 | Value7 | Value8 | Value9 | Value10 | Value11 | Value12 | ClientIP|
LogEvent="Response",MethodName="get.complete",ActionResult="Success",ApplicationNumber="1234567890",ApplicationLanguage="1",Section="SUMMARY",FirstName="jhon",LastName="doe",Gender="M",DateOfBirth="7/19/1993",SocialSecurityNumber="123456789",MaritalStatus="0",RaceInformation="Item8",CitizenshipCode="1",County="20",AddressLine1="221 Street",City="Washington",State="USA" 

I want to write a regular expression that masks the values of all key-value pairs (basically the PII data) that come after ,MethodName="get.complete", (i.e., ApplicationNumber, FirstName, DateOfBirth, SocialSecurityNumber, MaritalStatus, etc.).

The order of the fields up to MethodName is constant and never changes: every event has exactly the same fields in the same order up to and including "MethodName", and the additional PII elements are added after "MethodName".

Note: The position of the fields to be masked may change from time to time, but they will always be in key-value pair format (e.g., ApplicationNumber="1234567890",ApplicationLanguage="1",Section="SUMMARY",FirstName="Sherlock",LastName="Holmes",Gender="M",DateOfBirth="7/19/1976").

Following are the solutions I was planning to use to mask the data at index time.

props.conf example using a SEDCMD regex:

[sourcetype]
SEDCMD-mask = regex to skip the first three key-value pairs and mask the rest

OR

transforms.conf example using REGEX:

[ssn-anonymizer]
REGEX = regex to capture ssn
FORMAT = format to mask entire data
DEST_KEY = _raw
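As a rough illustration of what the planned ssn-anonymizer transform might look like, the sketch below replays a hypothetical REGEX/FORMAT pair with Python's `re` module, in the capture-before/capture-after style of the Anonymize data page linked above. The REGEX and FORMAT values are assumptions rather than a tested Splunk configuration, and Python's `\1`/`\2` stand in for Splunk's `$1`/`$2`.

```python
import re

# Shortened, single-line copy of the sample event (assumption: one event per line).
SAMPLE = ('LogEvent="Response",MethodName="get.complete",'
          'FirstName="jhon",SocialSecurityNumber="123456789",MaritalStatus="0"')

# Hypothetical transforms.conf values for [ssn-anonymizer]:
#   REGEX  = (?m)^(.*SocialSecurityNumber=")[^"]*(".*)$
#   FORMAT = $1#########$2
# Group 1 keeps everything up to the opening quote of the SSN value,
# group 2 keeps everything from the closing quote onward.
SSN_REGEX = r'(?m)^(.*SocialSecurityNumber=")[^"]*(".*)$'
SSN_FORMAT = r'\1#########\2'

masked = re.sub(SSN_REGEX, SSN_FORMAT, SAMPLE)
print(masked)
```

Because it targets a single key, a rule like this leaves FirstName and the other PII fields untouched, so on its own it cannot satisfy the "mask everything after MethodName" requirement.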

Our current approaches do not meet the requirement:
1. The expression below drops all values after MethodName instead of masking them.

SEDCMD-maskPHI = s/(MethodName=\"[^\"]+\",).*$/\1/g 

2. The regex below masks all key-value pairs after the last |, but we need to mask only the pairs that come after MethodName="get.complete".

SEDCMD-maskall = s/(\w+)="(?:(?:(?!\s*?\|).)*?)"(?!.*\|)/\1="########"/g 
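To see why both attempts fall short, here is a minimal Python sketch that replays them with `re.sub` on a shortened, single-line copy of the sample event. Python's `re` is not the engine Splunk uses, but it supports the constructs in these patterns (lazy quantifiers, lookaheads); the `\"` escapes are written as plain `"`, which matches the same character.

```python
import re

# Shortened, single-line copy of the sample event (assumption: the pipe-delimited
# header and the key-value pairs arrive on one line).
SAMPLE = ('Value1 | Value2 | ClientIP| LogEvent="Response",'
          'MethodName="get.complete",ActionResult="Success",'
          'FirstName="jhon",SocialSecurityNumber="123456789"')

# Attempt 1: s/(MethodName=\"[^\"]+\",).*$/\1/g
# Group 1 keeps MethodName="...", the greedy .*$ consumes the rest of the line,
# and the replacement writes back only group 1, so every later pair is dropped.
attempt1 = re.sub(r'(MethodName="[^"]+",).*$', r'\1', SAMPLE)

# Attempt 2: s/(\w+)="(?:(?:(?!\s*?\|).)*?)"(?!.*\|)/\1="########"/g
# (?!.*\|) accepts any key="value" pair with no '|' after it on the line,
# so LogEvent and MethodName are masked along with the PII.
attempt2 = re.sub(r'(\w+)="(?:(?:(?!\s*?\|).)*?)"(?!.*\|)', r'\1="########"', SAMPLE)

print(attempt1)
print(attempt2)
```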

Thank you for all of your help and advice.

[Edit: fixed formatting and used the code button so characters are no longer being eaten.]


Re: Help to modify existing regex to mask sensitive PII?

SplunkTrust

Hi @smakwana,

If you would like to use props.conf and transforms.conf, please put the configuration below on the indexer or heavy forwarder, whichever the data reaches first. You can test the regex below against your sample data here: https://regex101.com/r/F6zv8u/1

props.conf

[yoursourcetype]
TRANSFORMS-anonymize = PII-anonymizer

transforms.conf

[PII-anonymizer]
REGEX = (?m)^(.*MethodName=\"get\.complete\").*(.*)$
FORMAT = $1#######$2
DEST_KEY = _raw
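The sketch below replays this REGEX/FORMAT pair with Python's `re` on a shortened, single-line copy of the sample event (assumptions: one event per line, and Splunk's `$1`/`$2` written as Python's `\1`/`\2`). It shows that the greedy `.*` before `(.*)` leaves nothing for the second group, so `$2` is always empty and everything after `MethodName="get.complete"`, keys included, collapses into `#######`.

```python
import re

SAMPLE = ('Value1 | Value2 | ClientIP| LogEvent="Response",'
          'MethodName="get.complete",ActionResult="Success",'
          'FirstName="jhon",SocialSecurityNumber="123456789"')

# REGEX and FORMAT from [PII-anonymizer], translated to Python syntax.
REGEX = r'(?m)^(.*MethodName="get\.complete").*(.*)$'
FORMAT = r'\1#######\2'

match = re.search(REGEX, SAMPLE)
# The greedy .* already reaches the end of the line, so (.*) captures an empty string.
print(repr(match.group(2)))

# For a single-line event matched from ^ to $, this approximates the new _raw value.
anonymized = re.sub(REGEX, FORMAT, SAMPLE)
print(anonymized)
```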

EDIT1: Updated transforms.conf configuration.
EDIT2: If you want to use sed instead, you can use the regex below:

\b(?:(?!LogEvent|MethodName)(\w+))\b="(?:(?:.)*?)"

So your SEDCMD configuration (in props.conf, under your sourcetype stanza) will be:

SEDCMD-maskall = s/\b(?:(?!LogEvent|MethodName)(\w+))\b="(?:(?:.)*?)"/\1="########"/g
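Here is a minimal Python sketch that applies the same pattern and replacement with `re.sub` to a shortened, single-line copy of the sample event. Python's `re` is not Splunk's regex engine, but it supports the lookahead, `\b`, and lazy quantifier used here; the `LogEventTime` key in the second call is a made-up example.

```python
import re

SAMPLE = ('Value1 | Value2 | ClientIP| LogEvent="Response",'
          'MethodName="get.complete",ActionResult="Success",'
          'FirstName="jhon",SocialSecurityNumber="123456789"')

# Same regex as the SEDCMD: any key="value" pair whose key does not start with
# LogEvent or MethodName has its value replaced; the key (group 1) is kept.
PATTERN = r'\b(?:(?!LogEvent|MethodName)(\w+))\b="(?:(?:.)*?)"'
masked = re.sub(PATTERN, r'\1="########"', SAMPLE)
print(masked)

# The lookahead only checks how the key starts, so a hypothetical key such as
# LogEventTime is also left unmasked.
prefix_case = re.sub(PATTERN, r'\1="########"', 'LogEventTime="10:00",City="Washington"')
print(prefix_case)
```

If that prefix behavior matters, one option is `(?!(?:LogEvent|MethodName)\b)`, which excludes only those exact key names.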

For testing purposes, I wrote the query below based on your data:

| makeresults
| eval _raw="Current Log format: Value1 | Value2 | Value3 | Value4 | Value5 | Value6 | Value7 | Value8 | Value9 | Value10 | Value11 | Value12 | ClientIP| 
 LogEvent=\"Response\",MethodName=\"get.complete\",ActionResult=\"Success\",ApplicationNumber=\"1234567890\",ApplicationLanguage=\"1\",Section=\"SUMMARY\",FirstName=\"jhon\",LastName=\"doe\",Gender=\"M\",DateOfBirth=\"7/19/1993\",SocialSecurityNumber=\"123456789\",MaritalStatus=\"0\",RaceInformation=\"Item8\",CitizenshipCode=\"1\",County=\"20\",AddressLine1=\"221 Street\",City=\"Washington\",State=\"USA\""
 | rex mode=sed "s/\b(?:(?!LogEvent|MethodName)(\w+))\b=\"(?:(?:.)*?)\"/\1="########"/g"

This gives the following result:

Current Log format: Value1 | Value2 | Value3 | Value4 | Value5 | Value6 | Value7 | Value8 | Value9 | Value10 | Value11 | Value12 | ClientIP| 
 LogEvent="Response",MethodName="get.complete",ActionResult=########,ApplicationNumber=########,ApplicationLanguage=########,Section=########,FirstName=########,LastName=########,Gender=########,DateOfBirth=########,SocialSecurityNumber=########,MaritalStatus=########,RaceInformation=########,CitizenshipCode=########,County=########,AddressLine1=########,City=########,State=########


Re: Help to modify existing regex to mask sensitive PII?

SplunkTrust

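In the solution above, the transforms.conf example replaces everything after MethodName="get.complete" (keys and values alike) with a single mask, so please use the SED option instead: it masks each value individually and keeps the key names, regardless of where fields such as ApplicationNumber, FirstName, etc. appear in the event.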
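To illustrate the difference, the sketch below runs both patterns from the accepted answer through Python's `re` on a hypothetical event whose PII fields appear in a different order (the REORDERED string is made up for illustration, reusing the Sherlock Holmes values from the question; Splunk's `$1`/`$2` are written as `\1`/`\2`).

```python
import re

# Hypothetical single-line event with the PII fields in a different order.
REORDERED = ('LogEvent="Response",MethodName="get.complete",'
             'SocialSecurityNumber="123456789",FirstName="Sherlock",'
             'LastName="Holmes",ApplicationNumber="1234567890"')

SED_PATTERN = r'\b(?:(?!LogEvent|MethodName)(\w+))\b="(?:(?:.)*?)"'
TRANSFORMS_REGEX = r'(?m)^(.*MethodName="get\.complete").*(.*)$'

# SEDCMD approach: each value is masked, key names survive, order is irrelevant.
sed_result = re.sub(SED_PATTERN, r'\1="########"', REORDERED)
# transforms.conf approach: everything after get.complete" becomes one mask.
transforms_result = re.sub(TRANSFORMS_REGEX, r'\1#######\2', REORDERED)

print(sed_result)
print(transforms_result)
```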


Re: Help to modify existing regex to mask sensitive PII?

Explorer

@harsmarvania57 I had the same issue and this solved it. Thank you. 🙂


Re: Help to modify existing regex to mask sensitive PII?

Engager

@harsmarvania57, thank you so much. It resolved our issue.


Re: Help to modify existing regex to mask sensitive PII?

SplunkTrust

Feel free to upvote my answer if it really helps. 😛
