Hello,
I have a log that looks like the line below.
Each field has its own field name; here event_name is "viewed patient data in registration(XXTEST, ORANGE CRUSH)". (A capture group is to be used.)
0000|2019-01-07T14:20:12.000000Z|patientid|lastname, firstname|personlastname|M|middelname||PIEIGHT||MRN||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||
The part in parentheses (shown in red) is sensitive patient data and should be removed; for example, (XXTEST, ORANGE CRUSH) should be removed.
In transforms.conf I have:
[removedata]
REGEX = ^(?:[^\|\n]|){13}(?P<event_name>[^\|]+)([^)])
In my props.conf I have:
REPORT-removedata= removedata
But it is still not working. Do I need to use the field name, or change my regex? Am I using the transform properly?
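One way to narrow this down is to run the regex against the sample event outside Splunk. The sketch below uses Python's `re` module as a stand-in for Splunk's regex engine (an assumption; the two are close for this pattern but not identical). The group `(?:[^\|\n]|)` consumes at most one character per repetition, so `{13}` skips up to 13 characters rather than 13 pipe-delimited fields. The `field_skipping` pattern is a hypothetical variant shown only for comparison. Separately, a `REPORT-` setting defines a search-time field extraction, so on its own it would not change the stored event.

```python
import re

# Sample event copied from the post above.
event = (
    "0000|2019-01-07T14:20:12.000000Z|patientid|lastname, firstname|"
    "personlastname|M|middelname||PIEIGHT||MRN||Viewed|"
    "viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||"
)

# Regex exactly as written in transforms.conf.
original = re.compile(r"^(?:[^\|\n]|){13}(?P<event_name>[^\|]+)([^)])")

# Hypothetical variant: each repetition consumes one whole field plus its pipe,
# so event_name starts at field 14 and stops before "(" or "|".
field_skipping = re.compile(r"^(?:[^|\n]*\|){13}(?P<event_name>[^|(]+)")

m_original = original.search(event)
m_variant = field_skipping.search(event)

# Print what each pattern captured as event_name (None if it did not match).
print("original:", m_original.group("event_name") if m_original else None)
print("variant: ", m_variant.group("event_name") if m_variant else None)
```

With the original pattern, whatever is captured can only come from the first field, because the repeated group cannot move past the first pipe.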
Thank you,
Thanks for the reply @gcusello
In this case, event_name could be any event.
For example, it could be "viewed patient data in registration", "view patient", "search analyser", etc.
event_name should accept any string, and I want a regex that scrubs the data inside the () after event_name:
viewed patient data in registration(XXTEST, ORANGE CRUSH)
view patient(XXTEST)
search analyser (YYTEST, TEST)
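A minimal sketch of that requirement, written in Python only so it can be run and checked outside Splunk: the event name may be any text, and only a trailing parenthesised part is replaced. The helper name `scrub_event_name` and the `(xxx)` mask are illustrative assumptions, not Splunk settings.

```python
import re

# Matches "(" ... ")" at the very end of the event name.
PAREN_TAIL = re.compile(r"\([^)]*\)$")

def scrub_event_name(event_name: str, mask: str = "(xxx)") -> str:
    """Replace a trailing parenthesised part with a fixed mask."""
    return PAREN_TAIL.sub(mask, event_name)

samples = [
    "viewed patient data in registration(XXTEST, ORANGE CRUSH)",
    "view patient(XXTEST)",
    "search analyser (YYTEST, TEST)",
]
for sample in samples:
    print(scrub_event_name(sample))
```

An event name without parentheses is returned unchanged, since the pattern simply does not match.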
Could you please help me with a regex for this?
Thank you,
Hi @poojithavasanth,
if you look at my first answer, you will find the SEDCMD with a regex that replaces the sensitive data with "XXX" or any other string you like.
Ciao.
Giuseppe
The first answer replaces the data with (XXX) or any string I want, and I accept it.
However, the text before the sensitive data is arbitrary, not a fixed string.
For example:
Example 1)
0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||Viewed|viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||
Example 2)
1111|2010-01-07T14:20:12.000000Z|xxx|xxx, lastname|yyy|M|xxx||PIEIGHT||xxx||error|view patient(XXTEST)|00000||
Example 3)
1234|1999-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||notviewed|search analyser (YYTEST, TEST)|00000||
a) Fields are separated by | (pipe).
b) The string before (XXX) is the event_name.
c) Should we use a capture group to build the regex?
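On point (c), one way to picture the capture-group approach is sketched below in Python (not Splunk configuration). Group 1 keeps the first 13 pipe-delimited fields plus the event name, and only the parenthesised part that follows is replaced. The pattern, the `(xxx)` mask and the name `scrub_line` are assumptions for illustration; a sed-style expression of the same idea would still need testing against real data.

```python
import re

# Group 1: 13 complete fields (each ending in "|") followed by the event name,
# i.e. everything in field 14 up to the opening parenthesis.
SENSITIVE = re.compile(r"^((?:[^|\n]*\|){13}[^|(\n]*)\([^)|\n]*\)")

def scrub_line(line: str) -> str:
    """Mask the parenthesised part of field 14, keeping the rest of the event."""
    return SENSITIVE.sub(r"\1(xxx)", line, count=1)

# The three example events from this post.
examples = [
    "0000|2019-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||Viewed|"
    "viewed patient data in registration(XXTEST, ORANGE CRUSH)|00000||",
    "1111|2010-01-07T14:20:12.000000Z|xxx|xxx, lastname|yyy|M|xxx||PIEIGHT||xxx||error|"
    "view patient(XXTEST)|00000||",
    "1234|1999-01-07T14:20:12.000000Z|xxx|xxx, xxxx|xxx|M|xxx||PIEIGHT||xxx||notviewed|"
    "search analyser (YYTEST, TEST)|00000||",
]
for line in examples:
    print(scrub_line(line))
```

Anchoring on the field position rather than on a fixed event name is what lets the same pattern cover all three examples.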
Let me know if you need any additional info.
Thank you,
Hi @poojithavasanth,
following the instructions at https://docs.splunk.com/Documentation/Splunk/9.0.3/Data/Anonymizedata you can:
Substitute characters in events with a sed script in props.conf:
[<your_sourcetype>]
SEDCMD-sensitive_data = s/viewed patient data in registration\([^\)]+\)/viewed patient data in registration\(xxx\)/g
or
in props.conf
[<your_sourcetype>]
TRANSFORMS-anonymize = anonymizer
and in transforms.conf
[anonymizer]
REGEX = (?m)^(.*viewed patient data in registration)\([^\)]+\)(.*)$
FORMAT = $1(xxx)$2
DEST_KEY = _raw
I usually use SEDCMD.
Ciao.
Giuseppe