Splunk Search

Help to modify existing regex to mask sensitive PII?

smakwana
Engager

Hi Splunkers,

I am looking for help modifying our current regex to meet our updated project criteria.

Link: https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Data/Anonymizedata

Current Log format: Value1 | Value2 | Value3 | Value4 | Value5 | Value6 | Value7 | Value8 | Value9 | Value10 | Value11 | Value12 | ClientIP| 
LogEvent="Response",MethodName="get.complete",ActionResult="Success",ApplicationNumber="1234567890",ApplicationLanguage="1",Section="SUMMARY",FirstName="jhon",LastName="doe",Gender="M",DateOfBirth="7/19/1993",SocialSecurityNumber="123456789",MaritalStatus="0",RaceInformation="Item8",CitizenshipCode="1",County="20",AddressLine1="221 Street",City="Washington",State="USA" 

I want to write a regular expression that masks every key-value pair (essentially the PII data) that follows ,MethodName="get.complete", (e.g. ApplicationNumber, FirstName, DateOfBirth, SocialSecurityNumber, MaritalStatus, etc.).

The order of the fields up to MethodName is constant and never changes. Every event has exactly the same order up to "MethodName", and the additional PII elements are added after "MethodName".

Note: The position of the fields to be masked may change at times, but they will always be in key-value pair format (e.g. ,ApplicationNumber="1234567890",ApplicationLanguage="1",Section="SUMMARY",FirstName="Sherlock",LastName="Holmes",Gender="M",DateOfBirth="7/19/1976").

The following are the solutions I was planning to use to mask the data at index time.

PROPS Example Using SEDCMD Regex:

[sourcetype]
SEDCMD-mask = regex to skip first three key-value pairs and mask the rest

OR

Transforms Example Using regex:

[ssn-anonymizer]
REGEX = regex to capture ssn
FORMAT = format to mask entire data
DEST_KEY = _raw

The current approaches do not meet our requirement.
1. The expression below drops all values after MethodName instead of masking them.

SEDCMD-maskPHI = s/(MethodName=\"[^\"]+\",).*$/\1/g 

2. The regex below masks all key-value pairs after the last |, but we need to mask only what comes after MethodName="get.complete".

SEDCMD-maskall = s/(\w+)="(?:(?:(?!\s*?\|).)*?)"(?!.*\|)/\1="########"/g 

Thank you for all of your help and advice.

[Edit: fixed formatting and used the code button so characters no longer are being eaten.]

1 Solution

harsmarvania57
Ultra Champion

Hi @smakwana,

If you would like to use props.conf and transforms.conf, please use the configuration below on the indexer or heavy forwarder, whichever the data reaches first. You can test the regex below against your sample data here: https://regex101.com/r/F6zv8u/1

props.conf

[yoursourcetype]
TRANSFORMS-anonymize = PII-anonymizer

transforms.conf

[PII-anonymizer]
REGEX = (?m)^(.*MethodName=\"get\.complete\").*(.*)$
FORMAT = $1#######$2
DEST_KEY = _raw

EDIT1: Updated transforms.conf configuration.
EDIT2: If you want to use sed instead, you can use the regex below

\b(?:(?!LogEvent|MethodName)(\w+))\b="(?:(?:.)*?)"

So your SED configuration will be

SEDCMD-maskall = s/\b(?:(?!LogEvent|MethodName)(\w+))\b="(?:(?:.)*?)"/\1="########"/g

For testing purposes, I built the query below from your data

| makeresults
| eval _raw="Current Log format: Value1 | Value2 | Value3 | Value4 | Value5 | Value6 | Value7 | Value8 | Value9 | Value10 | Value11 | Value12 | ClientIP| 
 LogEvent=\"Response\",MethodName=\"get.complete\",ActionResult=\"Success\",ApplicationNumber=\"1234567890\",ApplicationLanguage=\"1\",Section=\"SUMMARY\",FirstName=\"jhon\",LastName=\"doe\",Gender=\"M\",DateOfBirth=\"7/19/1993\",SocialSecurityNumber=\"123456789\",MaritalStatus=\"0\",RaceInformation=\"Item8\",CitizenshipCode=\"1\",County=\"20\",AddressLine1=\"221 Street\",City=\"Washington\",State=\"USA\""
 | rex mode=sed "s/\b(?:(?!LogEvent|MethodName)(\w+))\b=\"(?:(?:.)*?)\"/\1=########/g"

which gives the result below

Current Log format: Value1 | Value2 | Value3 | Value4 | Value5 | Value6 | Value7 | Value8 | Value9 | Value10 | Value11 | Value12 | ClientIP| 
 LogEvent="Response",MethodName="get.complete",ActionResult=########,ApplicationNumber=########,ApplicationLanguage=########,Section=########,FirstName=########,LastName=########,Gender=########,DateOfBirth=########,SocialSecurityNumber=########,MaritalStatus=########,RaceInformation=########,CitizenshipCode=########,County=########,AddressLine1=########,City=########,State=########
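
The sed expression can also be sanity-checked outside Splunk. Below is a minimal Python sketch, assuming Python's re module handles this pattern the same way Splunk's regex engine does (it should for the lookahead, \b and \w constructs used here, but treat that as an assumption). The function name mask_event and the shortened sample event are made up for illustration, and the replacement keeps the quotes as in the SEDCMD-maskall line.

```python
import re

# Same pattern as the SEDCMD-maskall expression; Python's re stands in
# for Splunk's regex engine here (an assumption, not Splunk itself).
PATTERN = re.compile(r'\b(?:(?!LogEvent|MethodName)(\w+))\b="(?:(?:.)*?)"')

def mask_event(raw: str) -> str:
    """Replace every key="value" pair except LogEvent/MethodName with key="########"."""
    return PATTERN.sub(r'\1="########"', raw)

# Shortened, hypothetical sample in the same key="value" format as the thread.
sample = ('LogEvent="Response",MethodName="get.complete",'
          'ActionResult="Success",FirstName="jhon",'
          'SocialSecurityNumber="123456789"')

masked = mask_event(sample)
print(masked)
```

Note that the negative lookahead is a prefix check, so any key whose name merely begins with LogEvent or MethodName would also be left unmasked.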

harsmarvania57
Ultra Champion

The transforms.conf example in the given solution masks everything after MethodName="get.complete", so please use the SED option, which works regardless of where the fields (ApplicationNumber, FirstName, etc.) appear.
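
To see why the transforms.conf version behaves that way, here is a rough Python sketch of what the REGEX and FORMAT pair does to a single-line event. It assumes that with DEST_KEY = _raw the event is rewritten as the expanded FORMAT string, and it uses Python's re module in place of Splunk's engine; the function name transforms_style and the shortened sample event are hypothetical.

```python
import re

# REGEX from the [PII-anonymizer] stanza, without the backslash-escaped quotes.
REGEX = re.compile(r'(?m)^(.*MethodName="get\.complete").*(.*)$')

def transforms_style(raw: str) -> str:
    """Emulate FORMAT = $1#######$2 written back to _raw (assumed behaviour)."""
    m = REGEX.search(raw)
    if m is None:
        return raw  # no match: the event is left unchanged
    # The greedy .* before (.*) consumes the rest of the line, so $2 is empty.
    return m.group(1) + "#######" + m.group(2)

# Shortened, hypothetical sample in the same format as the thread's event.
event = ('LogEvent="Response",MethodName="get.complete",'
         'ActionResult="Success",FirstName="jhon",LastName="doe"')

result = transforms_style(event)
print(result)  # LogEvent="Response",MethodName="get.complete"#######
```

The key names after MethodName disappear along with their values, which is why the per-pair sed substitution is the better fit when the field names should stay visible.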

smakwana
Engager

@harsmarvania57, thank you so much. It resolved our issue.

harsmarvania57
Ultra Champion

Feel free to upvote my answer if it really helps. 😛

nishitdarade
Explorer

@harsmarvania57 I had the same issue and this solved it. Thank You. 🙂