How can I hide sensitive data?

poojithavasanth
Explorer

Hello,

I have a log that looks like this:

Each field has its own field name. Here, "viewed patient data in registration(XXTEST, ORANGE CRUSH)" is the event_name value. (A capture group is to be used.)

0000|2019-01-07T14:20:12.000000Z|patientid|lastname, firstname|personlastname|M|middelname||PIEIGHT||MRN||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||


The text in parentheses is sensitive patient data and should be removed; in this example, that is (XXTEST, ORANGE CRUSH).

In transforms.conf I have:

[removedata]
REGEX = ^(?:[^\|\n]|){13}(?P<event_name>[^\|]+)([^)])

In my props.conf I have:

REPORT-removedata= removedata

But it is still not working. Do I need to use the field name, or change my regex? Am I using the transform properly?

Thank you,


poojithavasanth
Explorer

Thanks for the reply @gcusello 

In this case, event_name could be any event.

For example, it could be "viewed patient data in registration", "view patient", "search analyser", etc.

event_name should accept any string, and I would like a regex to scrub the data within () after event_name.

viewed patient data in registration(XXTEST, ORANGE CRUSH)

view patient(XXTEST)

search analyser (YYTEST, TEST)

Could you please help me with a regex for this?

Thank you,


gcusello
SplunkTrust

Hi @poojithavasanth,

if you look at my first answer, you will find the SEDCMD setting with the regex that replaces the sensitive data with "XXX" or another string you like.

Ciao.

Giuseppe


poojithavasanth
Explorer

The first answer replaces the sensitive data with (XXX) or any string I want. I accept that.

However, the text before the sensitive data varies; it is not a fixed string.

For example:

Example 1)

0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||

Example 2)

1111|2010-01-07T14:20:12.000000Z|xxx|xxx, lastname|yyy|M|xxx||PIEIGHT||xxx||error|view patient(XXTEST)|00000||

Example 3)

1234|1999-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||notviewed|search analyser (YYTEST, TEST)|00000||

a) Fields are separated using | (pipe)

b) The string before (XXX) is the event_name value.

c) Should we use a capture group when building the regex?

Let me know if you need any additional info.

Thank you,


gcusello
SplunkTrust

Hi @poojithavasanth,

following the instructions at https://docs.splunk.com/Documentation/Splunk/9.0.3/Data/Anonymizedata you can:

Substitute characters in events with a sed script in props.conf:

[<your_sourcetype>]
SEDCMD-sensitive_data = s/viewed patient data in registration\([^\)]+\)/viewed patient data in registration\(xxx\)/g

or 

in props.conf 

[<your_sourcetype>]
TRANSFORMS-anonymize = anonymizer

and in transforms.conf

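[anonymizer]
# With DEST_KEY = _raw, FORMAT rewrites the whole event, so capture the text
# before and after the sensitive part and leave the sensitive part out.
REGEX = ^(.*viewed patient data in registration)\([^\)]+\)(.*)$
FORMAT = $1(xxx)$2
DEST_KEY = _raw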

I usually use SEDCMD.
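
For the follow-up case where the text before the parentheses varies, one option is to match only the parenthesised group at the end of a pipe-delimited field instead of the literal event name. Below is a minimal Python sketch of that idea, checked only against the three sample events from this thread. The pattern and the `scrub` helper are assumptions for illustration, not something taken from the Splunk documentation.

```python
import re

# Assumed pattern: a parenthesised group that ends a pipe-delimited field,
# i.e. it is immediately followed by "|" or by the end of the event.
PAREN_AT_FIELD_END = re.compile(r"\([^()|]*\)(?=\||$)")


def scrub(event: str, mask: str = "(xxx)") -> str:
    """Replace the trailing parenthesised part of any field with a mask."""
    return PAREN_AT_FIELD_END.sub(mask, event)


# The three sample events posted earlier in the thread.
samples = [
    "0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||",
    "1111|2010-01-07T14:20:12.000000Z|xxx|xxx, lastname|yyy|M|xxx||PIEIGHT||xxx||error|view patient(XXTEST)|00000||",
    "1234|1999-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||notviewed|search analyser (YYTEST, TEST)|00000||",
]

for sample in samples:
    print(scrub(sample))
```

Because the pattern does not mention the event name, it applies to any event_name value. If it holds up on real data, the same expression could be tried as the match part of a SEDCMD, but that has not been verified against Splunk here.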

Ciao.

Giuseppe
