How to achieve hiding sensitive data?

poojithavasanth · ‎01-17-2023

Hello,

I have a log that looks like the line below.

Each field has its own field name; here, viewed patient data in registration(XXTEST, ORANGE CRUSH) is event_name. (A capture group is to be used.)

0000|2019-01-07T14:20:12.000000Z|patientid|lastname, firstname|personlastname|M|middelname||PIEIGHT||MRN||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||

The part in parentheses (highlighted in red in the original post) should be removed because it is sensitive patient data; for example, (XXTEST, ORANGE CRUSH) should be removed.

In my transforms.conf I have:

[removedata]
REGEX = ^(?:[^\|\n]|){13}(?P<event_name>[^\|]+)([^)])

In my props.conf I have:

REPORT-removedata= removedata

But it is still not working. Do I need to use the field name, or change my regex? Am I using the transform properly?

Thank you,

poojithavasanth · ‎01-17-2023

Thanks for the reply @gcusello

In this case, event_name could be any event.

For example, it could be viewed patient data in registration, view patient, search analyser, etc.

event_name should accept any string, and I want a regex that scrubs the data inside the () after event_name.

viewed patient data in registration(XXTEST, ORANGE CRUSH)

view patient(XXTEST)

search analyser (YYTEST, TEST)

Could you please help me with a regex for this?

Thank you,

gcusello · ‎01-17-2023

Hi @poojithavasanth,

If you look at my first answer, you will find the SEDCMD setting with a regex that replaces the sensitive data with "XXX" or any other string you like.

Ciao.

Giuseppe

poojithavasanth · ‎01-17-2023

The first answer replaces the data with (XXX) or any string I want; I accept that.

However, the text before the sensitive data varies; it is not one fixed string.

For example:

Example 1)

0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||

Example 2)

1111|2010-01-07T14:20:12.000000Z|xxx|xxx, lastname|yyy|M|xxx||PIEIGHT||xxx||error|view patient(XXTEST)|00000||

Example 3)

1234|1999-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||notviewed|search analyser (YYTEST, TEST)|00000||

a) Fields are separated using | (pipe)

b) The string before (XXX) is named event_name.

c) Should we use a capture group when building the regex?

Let me know if you need any additional info.
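Points a) to c) can be sketched with a capture group that keeps everything up to and including the event name and replaces only the parenthesised part. This is a minimal Python sketch for checking the pattern against the three examples, not Splunk configuration. The function name `scrub_event`, the `(xxx)` placeholder, and the assumption that event_name is always the 14th pipe-separated field are illustrative assumptions.

```python
import re

# Group 1: the first 13 pipe-terminated fields plus the event name, i.e.
# everything in the 14th field before the opening parenthesis (assumed layout).
# The parenthesised text that follows is matched but not captured.
SCRUB = re.compile(r"^((?:[^|\n]*\|){13}[^|(\n]*)\([^)|\n]*\)")


def scrub_event(line: str, placeholder: str = "(xxx)") -> str:
    """Replace the parenthesised data after event_name with a placeholder."""
    return SCRUB.sub(lambda m: m.group(1) + placeholder, line, count=1)


examples = [
    "0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||",
    "1111|2010-01-07T14:20:12.000000Z|xxx|xxx, lastname|yyy|M|xxx||PIEIGHT||xxx||error|view patient(XXTEST)|00000||",
    "1234|1999-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||notviewed|search analyser (YYTEST, TEST)|00000||",
]

for line in examples:
    print(scrub_event(line))
```

Because the pattern is anchored to the field position, parentheses in any other field are left alone, and a line whose 14th field has no parentheses passes through unchanged.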

Thank you,

gcusello · ‎01-17-2023

Hi @poojithavasanth,

Following the instructions at https://docs.splunk.com/Documentation/Splunk/9.0.3/Data/Anonymizedata you can:

Substitute characters in events with a sed script in props.conf:

[<your_sourcetype>]
SEDCMD-sensitive_data = s/viewed patient data in registration\([^\)]+\)/viewed patient data in registration\(xxx\)/g

or

in props.conf

[<your_sourcetype>]
TRANSFORMS-anonymize = anonymizer

and in transforms.conf

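[anonymizer]
# With DEST_KEY = _raw, FORMAT becomes the whole event, so capture the text
# before and after the sensitive data and leave the data itself uncaptured.
REGEX = (?m)^(.*viewed patient data in registration)\([^\)]+\)(.*)$
FORMAT = $1(xxx)$2
DEST_KEY = _raw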

I usually use SEDCMD.

Ciao.

Giuseppe
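The patterns in this answer match one fixed event name, while the follow-up examples in the thread use several. Below is a minimal sketch of a more general substitution, written in Python only to check the regex against fragments of those three events: it replaces any parenthesised text that sits directly before a pipe. The helper name `mask_parens` and the `(xxx)` placeholder are assumptions for illustration, and how the same pattern behaves inside SEDCMD has not been verified here.

```python
import re

# Parenthesised text that ends a pipe-delimited field: "(", then anything that
# is not a parenthesis, pipe, or newline, then ")" directly before a "|".
TRAILING_PARENS = re.compile(r"\([^()|\n]*\)(?=\|)")


def mask_parens(event: str, placeholder: str = "(xxx)") -> str:
    """Mask parenthesised data at the end of any field, whatever precedes it."""
    return TRAILING_PARENS.sub(lambda _m: placeholder, event)


for fragment in (
    "Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||",
    "error|view patient(XXTEST)|00000||",
    "notviewed|search analyser (YYTEST, TEST)|00000||",
):
    print(mask_parens(fragment))
```

The trade-off is that this version does not care which field it is in: any field ending in parentheses is masked, which may be broader than intended if other fields can legitimately end that way.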
