Help in creating regex for encryption of data?

nishitdarade
Explorer

Hi Splunkers,

I am looking for help creating a regular expression to anonymize data in transforms.conf.

Link: https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Data/Anonymizedata

Current Log format: Timestamp | Category | Machine | ApplicationDomain | ProcessId | ProcessName | ThreadId | LogID | UserName | ActionName | Module | AuthorizationStatus | RequestedBy | RequestingURL | QueryString | HTTPVerb | ClientIP| LogEvent="Response",MethodName="get",ActionResult="Success",ApplicationNumber="1234567890",ApplicationLanguage="1",Section="SUMMARY",FirstName="Sherlock",LastName="Holmes",Gender="M",DateOfBirth="7/19/1976",SocialSecurityNumber="123456789",MaritalStatus="0",RaceInformation="Item8",CitizenshipCode="1",County="20",AddressLine1="221 Baker Street",City="Marylebone",State="London"

I want to write a regular expression that covers all key-value pairs that come after "ClientIP|" (i.e. LogEvent, MethodName, ApplicationNumber, FirstName, DateOfBirth, SocialSecurityNumber, etc.).

Note: The location of the fields may change at times, but they will always be in key-value pair format (i.e. ,ApplicationNumber="1234567890",ApplicationLanguage="1",Section="SUMMARY",FirstName="Sherlock",LastName="Holmes",Gender="M",DateOfBirth="7/19/1976").

Transforms Example:
[ssn-anonymizer]
REGEX = regex to capture ssn
FORMAT = format to mask entire data
DEST_KEY = _raw

I would really appreciate all the help the community can give.

Thank You,
Nish.

1 Solution

mtulett_splunk
Splunk Employee

Provided there are no pipe characters in the key-value pair data, there's a way to do this using SEDCMD. The approach is to look for key-value pairs that have no pipes after them on the current line, then replace those pairs with masked versions. It's a fairly heavy regex, though, so be aware of possible performance issues.

In your input's props.conf stanza you should put:

SEDCMD-maskall = s/(\w+)="(?:(?:(?!\s*?\|).)*?)"(?!.*\|)/\1="########"/g

This will replace the values with eight hashes, and only for the key-value pairs after the last pipe character. In the example below, only the last three values (value4, value5 and value6) would match, as they belong to the only key-value pairs after the last pipe:

BEFORE:
MyEvent | GET | key2="value2",key3="value3" | 1.2.3.4 | key4="value4",key5="value5" , key6="value6"

AFTER:
MyEvent | GET | key2="value2",key3="value3" | 1.2.3.4 | key4="########",key5="########" , key6="########"
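
To sanity-check the expression before deploying it, here is a minimal Python sketch that applies the same pattern and replacement to the BEFORE line above. It assumes Python's `re` engine handles this particular pattern the same way as the PCRE engine Splunk uses; the function name `mask_after_last_pipe` is made up for illustration.

```python
import re

# Pattern and replacement copied from the SEDCMD above.
# Assumption: Python's re module treats these lookaheads like Splunk's PCRE.
PATTERN = re.compile(r'(\w+)="(?:(?:(?!\s*?\|).)*?)"(?!.*\|)')
REPLACEMENT = r'\1="########"'


def mask_after_last_pipe(event: str) -> str:
    """Mask every key="value" pair that has no pipe after it on the line."""
    return PATTERN.sub(REPLACEMENT, event)


before = ('MyEvent | GET | key2="value2",key3="value3" | 1.2.3.4 | '
          'key4="value4",key5="value5" , key6="value6"')
after = mask_after_last_pipe(before)
print(after)
```

The printed line should be identical to the AFTER line above: key2 and key3 are left alone because a pipe still follows them, while key4, key5 and key6 are masked.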

nishitdarade
Explorer

Thank you for your answer. I will try to implement this approach and will let the group know how it goes. And yes, I want to mask all the key-value pairs after the pipe. Will this mask the data at index time, or at the presentation layer? I am assuming I have to update props.conf in the TA I created to onboard the data.

mtulett_splunk
Splunk Employee

This will mask it at index time, and yes, the local folder of your TA would be the right place to modify props.conf.

somesoni2
SplunkTrust

Give this a try

props.conf (on the indexer or heavy forwarder, whichever comes first)

    [yourSourceTypeHere]
    ..other settings..
    SEDCMD-maskkvs = s/(\w+)=\"[^\"]+\"/\1/g
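
For comparison, a small Python sketch of what this second expression does as written. The replacement is only `\1`, so each matched pair collapses to the bare key name instead of being masked, the match is not limited to the text after the last pipe, and `[^"]+` skips empty values. The sample event and the function name `strip_values` are invented for illustration, and Python's `re` is assumed to behave like Splunk's regex engine for this simple pattern.

```python
import re

# Expression copied from the SEDCMD above (sed's \" is just a literal quote).
STRIP_VALUES = re.compile(r'(\w+)="[^"]+"')


def strip_values(event: str) -> str:
    """Replace each key="value" pair with the bare key name."""
    return STRIP_VALUES.sub(r'\1', event)


# Hypothetical event: one pair before the last pipe, three after it.
sample = 'GET | a="1" | FirstName="Sherlock",Gender="M",Note=""'
print(strip_values(sample))
```

Note that the pair before the last pipe (`a="1"`) is rewritten as well, and the empty `Note=""` is left untouched.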

nishitdarade
Explorer

Thank you for your answer. I will get back to you when I try this approach.

somesoni2
SplunkTrust

So do you want to mask all the key-value pairs that come after ClientIP, or do you want to retain them and mask everything else?

nishitdarade
Explorer

I want to mask all the key-value pairs after ClientIP. I am sorry, I didn't understand the second part of your question ("OR want to retain them and mask all remaining?").

cpetterborg
SplunkTrust

If you want to mask the data in all the fields after ClientIP, why not just remove all that data completely from the end of the events? That will save processing and licensing costs. If they won't necessarily be after ClientIP, then that is a different problem, but if all that data is anonymized, there seems to be little reason to even include it in the data that you are indexing.
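
As a rough illustration of that alternative, the Python sketch below drops everything after the last pipe instead of masking it. It relies on the same assumption made earlier in the thread, that the key-value data itself contains no pipes. The helper name `drop_after_last_pipe` and the sample event are hypothetical, and the thread does not show how this would be written in props.conf.

```python
import re

# Match the run of non-pipe characters that follows the last pipe.
# The lookbehind means events without any pipe are left unchanged.
TRAILING_SEGMENT = re.compile(r'(?<=\|)[^|]*$')


def drop_after_last_pipe(event: str) -> str:
    """Remove the trailing key-value segment, keeping the final pipe."""
    return TRAILING_SEGMENT.sub('', event, count=1)


sample_event = 'GET | 1.2.3.4 | FirstName="Sherlock",Gender="M"'
print(drop_after_last_pipe(sample_event))
```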

nishitdarade
Explorer

Thank you for the reply, @cpetterborg, but the approach is driven by our requirements, which call for masking the data after ClientIP.
