Splunk Search

Help in creating regex for encryption of data?

Explorer

Hi Splunkers,

I am looking for help creating a regular expression to anonymize data with a transform in transforms.conf.

Link: https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Data/Anonymizedata

Current Log format: Timestamp | Category | Machine | ApplicationDomain | ProcessId | ProcessName | ThreadId | LogID | UserName | ActionName | Module | AuthorizationStatus | RequestedBy | RequestingURL | QueryString | HTTPVerb | ClientIP| LogEvent="Response",MethodName="get",ActionResult="Success",ApplicationNumber="1234567890",ApplicationLanguage="1",Section="SUMMARY",FirstName="Sherlock",LastName="Holmes",Gender="M",DateOfBirth="7/19/1976",SocialSecurityNumber="123456789",MaritalStatus="0",RaceInformation="Item8",CitizenshipCode="1",County="20",AddressLine1="221 Baker Street",City="Marylebone",State="London"

I want to write a regular expression that matches all the key-value pairs that come after "ClientIP|" (e.g., LogEvent, MethodName, ApplicationNumber, FirstName, DateOfBirth, SocialSecurityNumber, etc.).

Note: The position of the fields may change at times, but they will always be in key-value pair format (e.g., ,ApplicationNumber="1234567890",ApplicationLanguage="1",Section="SUMMARY",FirstName="Sherlock",LastName="Holmes",Gender="M",DateOfBirth="7/19/1976").

Transforms Example:
[ssn-anonymizer]
REGEX = regex to capture ssn
FORMAT = format to mask entire data
DEST_KEY = _raw

I would really appreciate all the help the community can give.

Thank You,
Nish.

0 Karma
1 Solution

Splunk Employee

Provided there are no pipe characters in the key-value pair data, you can do this with SEDCMD. The approach is to look for key-value pairs that have no pipes after them on the current line, then replace their values with a mask. It is a fairly heavy regex, though, so be aware of possible performance issues.

In your input's props.conf stanza you should put:

SEDCMD-maskall = s/(\w+)="(?:(?:(?!\s*?\|).)*?)"(?!.*\|)/\1="########"/g

This replaces each value with eight hashes, but only for key-value pairs after the last pipe character. In the example below, only the last three values (value4, value5 and value6) match, as they belong to the only key-value pairs after the last pipe:

BEFORE:
MyEvent | GET | key2="value2",key3="value3" | 1.2.3.4 | key4="value4",key5="value5" , key6="value6"

AFTER:
MyEvent | GET | key2="value2",key3="value3" | 1.2.3.4 | key4="########",key5="########" , key6="########"
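If you want to sanity-check the expression before deploying it, here is a minimal Python sketch that applies the same pattern and replacement to the BEFORE line. It assumes Python's `re` module handles these lookaheads the same way Splunk's regex engine does, which holds for the constructs used here as far as I know; the function name `mask_trailing_pairs` is made up for illustration.

```python
import re

# Pattern and replacement copied from the SEDCMD above
# (sed-style s/<pattern>/<replacement>/g, so every match is replaced).
MASK_PATTERN = re.compile(r'(\w+)="(?:(?:(?!\s*?\|).)*?)"(?!.*\|)')
MASK_REPLACEMENT = r'\1="########"'


def mask_trailing_pairs(event: str) -> str:
    """Mask every key="value" pair that has no pipe after it on the line."""
    return MASK_PATTERN.sub(MASK_REPLACEMENT, event)


before = ('MyEvent | GET | key2="value2",key3="value3" | 1.2.3.4 | '
          'key4="value4",key5="value5" , key6="value6"')

if __name__ == "__main__":
    print(mask_trailing_pairs(before))
```

Running it on the BEFORE line should print the AFTER line. Note that a value containing a pipe, such as `k="x|y"`, is left unmasked, which is why the "no pipes in the key-value data" condition matters.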

Explorer

Thank you for your answer. I will try this approach and let the group know how it goes. And yes, I want to mask all the key-value pairs after the pipe. Will this mask the data at index time or at the presentation layer? I assume I have to update props.conf in the TA I created to onboard the data.

0 Karma

Splunk Employee

This will mask it at index time, and yes, the local folder of your TA would be the right place to modify props.conf.

0 Karma

Revered Legend

Give this a try

props.conf (on the indexer or heavy forwarder, whichever comes first)

    [yourSourceTypeHere]
    ..other settings..
    SEDCMD-maskkvs = s/(\w+)=\"[^\"]+\"/\1/g
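To see what this substitution does before trying it, here is a small Python sketch of the same pattern and replacement. It assumes Python's `re` module behaves like Splunk's engine for this simple expression (the `\"` escapes are just literal quotes); `strip_values` is a hypothetical helper name.

```python
import re

# Same pattern as the SEDCMD above; \" in the original is a plain double quote.
STRIP_PATTERN = re.compile(r'(\w+)="[^"]+"')


def strip_values(event: str) -> str:
    """Replace each key="value" pair with the bare key, anywhere in the line."""
    return STRIP_PATTERN.sub(r'\1', event)


sample = ('MyEvent | GET | key2="value2",key3="value3" | 1.2.3.4 | '
          'key4="value4",key5="value5" , key6="value6"')

if __name__ == "__main__":
    print(strip_values(sample))
    # MyEvent | GET | key2,key3 | 1.2.3.4 | key4,key5 , key6
```

As written, the replacement keeps only the key name and drops the value, and it applies to every key-value pair in the event, not just those after the last pipe. Empty values (`key=""`) are not matched because `[^"]+` needs at least one character.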

Explorer

Thank you for your answer. I will get back to you once I have tried this approach.

0 Karma

Revered Legend

So do you want to mask all the key-value pairs that come after ClientIP, or do you want to retain them and mask everything else?

0 Karma

Explorer

I want to mask all the key-value pairs after ClientIP. I am sorry, I didn't understand the second part of your question (about retaining them and masking the rest).

0 Karma

SplunkTrust

If you want to mask the data in all the fields after ClientIP, why not just remove all that data completely from the end of the events? That will save processing and licensing costs. If they won't necessarily be after ClientIP, then that is a different problem, but if all that data is anonymized, there seems to be little reason to even include it in the data that you are indexing.
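For illustration only, here is a rough Python sketch of that idea: cut off everything after the last pipe. It assumes the key-value data itself never contains a pipe, and `drop_after_last_pipe` is a hypothetical name, not anything Splunk provides.

```python
import re

# Matches the last pipe on the line plus everything after it.
TAIL_PATTERN = re.compile(r'\|[^|]*$')


def drop_after_last_pipe(event: str) -> str:
    """Keep the pipe-delimited header fields and discard the trailing data."""
    return TAIL_PATTERN.sub('|', event)


sample = 'MyEvent | GET | 1.2.3.4 | key4="value4",key5="value5"'

if __name__ == "__main__":
    print(drop_after_last_pipe(sample))
    # MyEvent | GET | 1.2.3.4 |
```

An event without any pipe is returned unchanged.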

0 Karma

Explorer

Thank you for the reply, @cpetterborg, but the approach is dictated by our requirements, which call for masking the data after ClientIP.

0 Karma