Splunk Search

How to achieve hiding sensitive data?

poojithavasanth
Explorer

Hello,

I have a log that look like this:

Here each fields as its own field name, and viewed patient data in registration(XXTEST, ORANGE CRUSH) here is event_name. (Captured group to be used.)

0000|2019-01-07T14:20:12.000000Z|patientid|lastname, firstname|personlastname|M|middelname||PIEIGHT||MRN||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||


The one in red should be removed as it is sensitive patient data, for example (XXTEST, ORANGE CRUSH) should be removed.

 transforms.conf I have.

[removedata]
REGEX = ^(?:[^\|\n]|){13}(?P<event_name>[^\|]+)([^)])

On my props.conf I have

REPORT-removedata= removedata

But it is still not working: Do I need to use the field name, or change my regex? Am I applying the proper user of Transform?

Thank you,

Labels (2)
0 Karma

poojithavasanth
Explorer

Thanks for the reply @gcusello 

Here in this case, event_name would be any random events.

For example, it could be viewed patient data in registration, view patient, search analyser etc.

Here event_name should accept any string and I would want a regex to srub the data within () after event_name.

viewed patient data in registration(XXTEST, ORANGE CRUSH)

view patient(XXTEST)

search analyser (YYTEST, TEST)

Could you please help me on Regex for the same.

 

Thank you,

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @poojithavasanth,

if you see my first answer, you can find the SEDCMD command with the regex to replace the sensitive data with "XXX" or another string you like.

Ciao.

Giuseppe

0 Karma

poojithavasanth
Explorer

The first answer replaces (XXX) or any string I would want. I accept.

However, I would have random characters before the sensitive data and not a specified character.

For example:

Exmaple 1)

0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||

Example 2)

1111|2010-01-07T14:20:12.000000Z|xxx|xxx, lastname|yyy|M|xxx||PIEIGHT||xxx||error|view patient(XXTEST)|00000||

Example 3)

1234|1999-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||notviewed|search analyser (YYTEST, TEST)|00000||

a) Fields are separated using | (pipe)

b) The string before (XXX) would be named as event_name. 

c) Should we use captured group for building a regex?

Let me know if you would need any additional info.

Thank you,

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @poojithavasanth,

following the instructions at https://docs.splunk.com/Documentation/Splunk/9.0.3/Data/Anonymizedata you can:

Substitute characters in events with a sed script in props.conf:

[<your_sourcetype>]
SEDCMD-sensitive_data = s/viewed patient data in registration\([^\)]+\)/viewed patient data in registration\(xxx\)/g

or 

in props.conf 

[<your_sourcetype>]
TRANSFORMS-anonymize = anonymizer

and in transforms.conf

[anonymizer]
REGEX = viewed patient data in registration\(([^\)]+)\)
FORMAT = viewed patient data in registration\($1\)
DEST_KEY = _raw

I usually use SEDCMD.

Ciao.

Giuseppe

0 Karma
Get Updates on the Splunk Community!

Earn a $35 Gift Card for Answering our Splunk Admins & App Developer Survey

Survey for Splunk Admins and App Developers is open now! | Earn a $35 gift card!      Hello there,  Splunk ...

Continuing Innovation & New Integrations Unlock Full Stack Observability For Your ...

You’ve probably heard the latest about AppDynamics joining the Splunk Observability portfolio, deepening our ...

Monitoring Amazon Elastic Kubernetes Service (EKS)

As we’ve seen, integrating Kubernetes environments with Splunk Observability Cloud is a quick and easy way to ...