Splunk Search

How do I hide sensitive data?

poojithavasanth
Explorer

Hello,

I have a log that looks like this:

Each field has its own field name, and here the event_name field is viewed patient data in registration(XXTEST, ORANGE CRUSH). (I want to use a capture group.)

0000|2019-01-07T14:20:12.000000Z|patientid|lastname, firstname|personlastname|M|middelname||PIEIGHT||MRN||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||


The text in parentheses is sensitive patient data and should be removed; for example, (XXTEST, ORANGE CRUSH) should be removed.

In transforms.conf I have:

[removedata]
REGEX = ^(?:[^\|\n]|){13}(?P<event_name>[^\|]+)([^)])

In props.conf I have:

REPORT-removedata= removedata

But it is still not working. Do I need to use the field name, or change my regex? Am I using the transform correctly?
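A note on the setup above: REPORT- in props.conf defines a search-time field extraction. It adds fields when you search but never rewrites the stored event, so it cannot hide anything; masking has to happen at index time (SEDCMD, or TRANSFORMS- with DEST_KEY = _raw). The regex itself can still be checked outside Splunk. The minimal Python sketch below uses a hypothetical variant of the pattern, not the poster's exact regex: it skips the first 13 pipe-terminated fields explicitly. EVENT_RE is an illustrative name, and Python's re is assumed to behave like Splunk's PCRE for a pattern this simple.

```python
import re

# Hypothetical variant of the extraction regex (not the poster's exact pattern):
# skip the first 13 pipe-terminated fields, then capture event_name and the
# text inside the parentheses that follow it.
EVENT_RE = re.compile(
    r"^(?:[^|\n]*\|){13}"          # fields 1-13, each ending in "|"
    r"(?P<event_name>[^|(]*?)\s*"   # event_name: shortest text before "("
    r"\((?P<sensitive>[^)]*)\)"     # parenthesized part that should be hidden
)

line = (
    "0000|2019-01-07T14:20:12.000000Z|patientid|lastname, firstname|"
    "personlastname|M|middelname||PIEIGHT||MRN||Viewed|"
    "viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||"
)

match = EVENT_RE.match(line)
if match:
    print(match.group("event_name"))  # viewed patient data in registration
    print(match.group("sensitive"))   # XXTEST, ORANGE CRUSH
```

The count of 13 comes from the sample line and is an assumption about the log format; if the number of leading fields changes, the pattern will not match.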

Thank you,


poojithavasanth
Explorer

Thanks for the reply @gcusello 

In this case, event_name can be any event.

For example, it could be "viewed patient data in registration", "view patient", "search analyser", etc.

event_name can be any string, and I want a regex that scrubs the data within the parentheses after event_name:

viewed patient data in registration(XXTEST, ORANGE CRUSH)

view patient(XXTEST)

search analyser (YYTEST, TEST)

Could you please help me with a regex for this?
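For bare event strings like the three above, one pattern that matches "(" followed by everything up to the next ")" is enough. A minimal Python sketch, where PARENS_RE is an illustrative name and Python's re is assumed to match Splunk's PCRE for this pattern:

```python
import re

# "(" then any run of characters other than ")" then ")".
PARENS_RE = re.compile(r"\([^)]*\)")

events = [
    "viewed patient data in registration(XXTEST, ORANGE CRUSH)",
    "view patient(XXTEST)",
    "search analyser (YYTEST, TEST)",
]

for event in events:
    # Replace each parenthesized group with a fixed mask.
    print(PARENS_RE.sub("(xxx)", event))
```

Applied to a whole event, though, this pattern would also mask any other parenthesized text in any field, so anchoring it to the event_name field is safer.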


Thank you,


gcusello
SplunkTrust

Hi @poojithavasanth,

if you look at my first answer, you'll find the SEDCMD setting with a regex that replaces the sensitive data with "XXX" or any other string you like.

Ciao.

Giuseppe


poojithavasanth
Explorer

The first answer replaces the data with (XXX) or any string I want, and that part is fine.

However, the text before the sensitive data varies from event to event; it is not a fixed string.

For example:

Example 1)

0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||

Example 2)

1111|2010-01-07T14:20:12.000000Z|xxx|xxx, lastname|yyy|M|xxx||PIEIGHT||xxx||error|view patient(XXTEST)|00000||

Example 3)

1234|1999-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||notviewed|search analyser (YYTEST, TEST)|00000||

a) Fields are separated using | (pipe)

b) The string before (XXX) is the event_name field.

c) Should we use a capture group to build the regex?
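Option c) can be sketched with two capture groups: one that keeps everything up to and including the "(" after event_name, and one for the closing ")"; whatever lies between them is replaced. A minimal Python sketch, assuming (from the three samples) that event_name is always the 14th pipe-delimited field; SCRUB_RE and scrub are hypothetical names, and Python's re stands in for Splunk's PCRE.

```python
import re

# Group 1: the first 13 pipe-terminated fields plus event_name up to "(".
# Group 2: the closing ")". The text between the two groups is masked.
SCRUB_RE = re.compile(r"^((?:[^|\n]*\|){13}[^|(]*\()[^)]*(\))")

def scrub(line: str, mask: str = "xxx") -> str:
    """Mask the parenthesized text in the 14th pipe-delimited field."""
    return SCRUB_RE.sub(lambda m: m.group(1) + mask + m.group(2), line)

lines = [
    "0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||"
    "Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||",
    "1111|2010-01-07T14:20:12.000000Z|xxx|xxx, lastname|yyy|M|xxx||PIEIGHT||xxx||"
    "error|view patient(XXTEST)|00000||",
    "1234|1999-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||"
    "notviewed|search analyser (YYTEST, TEST)|00000||",
]

for line in lines:
    print(scrub(line))
```

Because the pattern is anchored to the field position rather than to a fixed event name, it covers all three examples; parentheses in other fields are left alone.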

Let me know if you need any additional info.

Thank you,


gcusello
SplunkTrust

Hi @poojithavasanth,

following the instructions at https://docs.splunk.com/Documentation/Splunk/9.0.3/Data/Anonymizedata you can:

Substitute characters in events with a sed script in props.conf:

[<your_sourcetype>]
SEDCMD-sensitive_data = s/viewed patient data in registration\([^\)]+\)/viewed patient data in registration\(xxx\)/g

or 

in props.conf:

[<your_sourcetype>]
TRANSFORMS-anonymize = anonymizer

and in transforms.conf:

[anonymizer]
REGEX = (?m)^(.*viewed patient data in registration\()[^\)]+(\).*)$
FORMAT = $1xxx$2
DEST_KEY = _raw
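With DEST_KEY = _raw, an index-time transform rewrites the whole event: when REGEX matches, the new event text is the FORMAT string with $1, $2, ... filled in from the capture groups. That is why the pattern needs groups for the text before and after the sensitive part, as in the anonymization example in the linked docs. The Python sketch below is a simplified model of that behavior, not Splunk's implementation; apply_raw_transform is a hypothetical helper. It also shows that a FORMAT which re-inserts only $1 discards the rest of the event and, if $1 is the sensitive text, keeps exactly what should be hidden.

```python
import re

def apply_raw_transform(raw: str, regex: str, fmt: str) -> str:
    """Simplified model of a TRANSFORMS stanza with DEST_KEY = _raw:
    if REGEX matches, the event text becomes FORMAT with $n expanded."""
    match = re.search(regex, raw)
    if match is None:
        return raw  # no match: the event is left unchanged
    # Expand each $<digits> token to the text of that capture group.
    return re.sub(r"\$(\d+)", lambda tok: match.group(int(tok.group(1))) or "", fmt)

raw = ("0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||"
       "Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||")

# Groups around the sensitive part keep the rest of the event intact.
REGEX = r"(?m)^(.*viewed patient data in registration\()[^\)]+(\).*)$"
FORMAT = "$1xxx$2"
print(apply_raw_transform(raw, REGEX, FORMAT))

# Capturing only the sensitive part: the event shrinks to FORMAT and keeps it.
print(apply_raw_transform(raw, r"registration\(([^\)]+)\)", "registration($1)"))
```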

I usually use SEDCMD.
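To see how far the SEDCMD above reaches, its s/.../.../g expression can be modeled as a global regex substitution. In this rough Python sketch (Python's re standing in for Splunk's engine, an assumption), only the first sample line changes: the pattern hardcodes the text "viewed patient data in registration", so events with other event_name values, such as examples 2 and 3 above, keep their parenthesized data.

```python
import re

# The SEDCMD pattern from the answer, modeled with re.sub (replaces all matches).
PATTERN = r"viewed patient data in registration\([^\)]+\)"
REPLACEMENT = "viewed patient data in registration(xxx)"

lines = [
    "0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||"
    "Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||",
    "1111|2010-01-07T14:20:12.000000Z|xxx|xxx, lastname|yyy|M|xxx||PIEIGHT||xxx||"
    "error|view patient(XXTEST)|00000||",
    "1234|1999-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||"
    "notviewed|search analyser (YYTEST, TEST)|00000||",
]

results = [re.sub(PATTERN, REPLACEMENT, line) for line in lines]
for before, after in zip(lines, results):
    # Report whether the hardcoded pattern masked anything in this event.
    print("masked" if before != after else "unchanged", after)
```

Covering every event_name needs a pattern anchored on the field position instead of on one literal event name.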

Ciao.

Giuseppe
