
How do I hide sensitive data?

poojithavasanth
Explorer

Hello,

I have a log that looks like this:

Each field has its own field name. Here, viewed patient data in registration(XXTEST, ORANGE CRUSH) is the event_name field (to be extracted with a capturing group).

0000|2019-01-07T14:20:12.000000Z|patientid|lastname, firstname|personlastname|M|middelname||PIEIGHT||MRN||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||


The part in parentheses is sensitive patient data and should be removed; in this example, (XXTEST, ORANGE CRUSH).

In transforms.conf I have:

[removedata]
REGEX = ^(?:[^\|\n]|){13}(?P<event_name>[^\|]+)([^)])

In props.conf I have:

REPORT-removedata= removedata

But it still doesn't work. Do I need to use the field name, or change my regex? Am I using transforms correctly?
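One way to see what the original pattern actually captures is to run it outside Splunk. The Python sketch assumes that Python's re module treats these simple patterns the same way Splunk's PCRE does. In the original pattern, each of the 13 repetitions of (?:[^\|\n]|) consumes at most one non-pipe character, so the match never leaves the first field; repeating (?:[^|]*\|) 13 times skips 13 whole pipe-delimited fields instead. Separately, REPORT- in props.conf defines a search-time field extraction and does not change the indexed _raw text, so even a correct regex there would not remove the data.

```python
import re

# Sample event from the question (patient fields are placeholders).
EVENT = ("0000|2019-01-07T14:20:12.000000Z|patientid|lastname, firstname|"
         "personlastname|M|middelname||PIEIGHT||MRN||Viewed|"
         "viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||")

# Original pattern: each of the 13 repetitions consumes at most ONE
# non-pipe character (or nothing), so it never gets past the first field.
ORIGINAL = re.compile(r"^(?:[^\|\n]|){13}(?P<event_name>[^\|]+)([^)])")

# Field-skipping pattern (a sketch): each repetition consumes a whole field
# plus its trailing pipe, so after 13 repetitions we are at field 14.
FIXED = re.compile(
    r"^(?:[^|]*\|){13}(?P<event_name>[^|(]+)\((?P<sensitive>[^)|]*)\)"
)

original_match = ORIGINAL.search(EVENT)
fixed_match = FIXED.search(EVENT)

print(repr(original_match.group("event_name")))  # only a fragment of "0000"
print(repr(fixed_match.group("event_name")))     # 'viewed patient data in registration'
print(repr(fixed_match.group("sensitive")))      # 'XXTEST, ORANGE CRUSH'
```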

Thank you,


poojithavasanth
Explorer

Thanks for the reply @gcusello 

In this case, event_name can be any event.

For example, it could be "viewed patient data in registration", "view patient", "search analyser", etc.

event_name can be any string, and I want a regex that scrubs the data inside the parentheses after event_name:

viewed patient data in registration(XXTEST, ORANGE CRUSH)

view patient(XXTEST)

search analyser (YYTEST, TEST)

Could you please help me with a regex for this?
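A minimal sketch of a pattern that does not depend on the event name: match an opening parenthesis, any run of characters that are neither ")" nor "|", and the closing parenthesis. It is shown in Python so it can be tested quickly; the same pattern could, as a hypothetical and untested setting, be written as a SEDCMD value s/\([^)|]*\)/(xxx)/g. This assumes no other field in the event contains parentheses; if one might, the pattern would have to be anchored to the event_name field.

```python
import re

# Any "( ... )" group whose content contains no ")" and no "|".
SENSITIVE = re.compile(r"\([^)|]*\)")

samples = [
    "viewed patient data in registration(XXTEST, ORANGE CRUSH)",
    "view patient(XXTEST)",
    "search analyser (YYTEST, TEST)",
]

# Replace every matching group with a fixed mask.
masked = [SENSITIVE.sub("(xxx)", s) for s in samples]
for line in masked:
    print(line)
```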

 

Thank you,


gcusello
SplunkTrust

Hi @poojithavasanth,

If you look at my first answer, you'll find the SEDCMD setting with a regex that replaces the sensitive data with "XXX" or any other string you like.

Ciao.

Giuseppe


poojithavasanth
Explorer

Your first answer replaces the data with (XXX) or any string I want, and I'm fine with that part.

However, the text before the sensitive data varies; it is not a fixed string.

For example:

Example 1)

0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||

Example 2)

1111|2010-01-07T14:20:12.000000Z|xxx|xxx, lastname|yyy|M|xxx||PIEIGHT||xxx||error|view patient(XXTEST)|00000||

Example 3)

1234|1999-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||notviewed|search analyser (YYTEST, TEST)|00000||

a) Fields are separated using | (pipe)

b) The string before (XXX) is the event_name.

c) Should we use a capturing group to build the regex?
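A capturing group does help here: capture everything up to the "(" of the 14th field (the 13 leading fields plus the event_name text) and write it back in front of the mask. The Python sketch applies this to the three example events, assuming Python's re matches these patterns the same way Splunk's PCRE does; mask_event is a hypothetical helper. A corresponding SEDCMD, hypothetical and not tested in Splunk, would be s/^((?:[^|]*\|){13}[^|(]*)\([^)|]*\)/\1(xxx)/, using \1 as the back-reference to the captured prefix.

```python
import re

# Group 1 keeps everything up to the "(" of field 14: the 13 leading
# fields with their pipes, plus the event_name text itself.
PATTERN = re.compile(r"^((?:[^|]*\|){13}[^|(]*)\([^)|]*\)")

def mask_event(raw: str, mask: str = "xxx") -> str:
    """Hypothetical helper: replace the parenthesized part of event_name."""
    return PATTERN.sub(lambda m: f"{m.group(1)}({mask})", raw, count=1)

events = [
    "0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||"
    "Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||",
    "1111|2010-01-07T14:20:12.000000Z|xxx|xxx, lastname|yyy|M|xxx||PIEIGHT||xxx||"
    "error|view patient(XXTEST)|00000||",
    "1234|1999-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||"
    "notviewed|search analyser (YYTEST, TEST)|00000||",
]

masked_events = [mask_event(e) for e in events]
for line in masked_events:
    print(line)
```

Because the pattern is anchored at ^ and skips exactly 13 fields, parentheses in any other field are left alone.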

Let me know if you need any additional info.

Thank you,


gcusello
SplunkTrust

Hi @poojithavasanth,

Following the instructions at https://docs.splunk.com/Documentation/Splunk/9.0.3/Data/Anonymizedata, you can:

Substitute characters in events with a sed script in props.conf:

[<your_sourcetype>]
SEDCMD-sensitive_data = s/viewed patient data in registration\([^\)]+\)/viewed patient data in registration\(xxx\)/g
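To check which of the sample event names this SEDCMD would touch, a rough Python emulation (assuming re.sub behaves like Splunk's s///g for this pattern) shows that it masks only events whose event_name is literally "viewed patient data in registration"; the "view patient" and "search analyser" events pass through unchanged, so the literal prefix would need to be replaced by a generic pattern to cover them.

```python
import re

# Python equivalent of the SEDCMD:
# s/viewed patient data in registration\([^\)]+\)/viewed patient data in registration\(xxx\)/g
SED_REGEX = re.compile(r"viewed patient data in registration\([^\)]+\)")
SED_REPLACEMENT = "viewed patient data in registration(xxx)"

event_names = [
    "viewed patient data in registration(XXTEST, ORANGE CRUSH)",
    "view patient(XXTEST)",
    "search analyser (YYTEST, TEST)",
]

results = [SED_REGEX.sub(SED_REPLACEMENT, name) for name in event_names]
for line in results:
    print(line)
```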

Or, in props.conf:

[<your_sourcetype>]
TRANSFORMS-anonymize = anonymizer

and in transforms.conf:

[anonymizer]
REGEX = ^(.*viewed patient data in registration\()[^\)]+(\).*)$
FORMAT = $1xxx$2
DEST_KEY = _raw
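With DEST_KEY = _raw, the expanded FORMAT string becomes the entire new _raw, so the regex has to capture the text before and after the sensitive part and FORMAT has to write those groups back around the mask, without writing back the sensitive group itself. The Python sketch emulates that behavior with a hypothetical helper, apply_transform, that expands only single-digit $n references; it is an approximation for testing, not Splunk's implementation.

```python
import re

def apply_transform(raw: str, regex: str, fmt: str) -> str:
    """Hypothetical emulation of an index-time transform with DEST_KEY = _raw:
    if REGEX matches, _raw becomes FORMAT with $1..$9 expanded;
    otherwise _raw is returned unchanged."""
    m = re.search(regex, raw)
    if m is None:
        return raw
    return re.sub(r"\$(\d)", lambda g: m.group(int(g.group(1))) or "", fmt)

RAW = ("0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||"
       "Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||")

# Capture before/after the sensitive text and keep only those groups.
masked = apply_transform(
    RAW,
    r"^(.*viewed patient data in registration\()[^\)]+(\).*)$",
    "$1xxx$2",
)

# For comparison: writing back only the captured sensitive group drops the
# rest of the event AND keeps the patient data.
replaced_whole = apply_transform(
    RAW,
    r"viewed patient data in registration\(([^\)]+)\)",
    "viewed patient data in registration($1)",
)

print(masked)
print(replaced_whole)
```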

I usually use SEDCMD.

Ciao.

Giuseppe
