Our code leaked SSNs into our logs and they went into Splunk, so I'm trying to mask them. I tried it two ways (BTW, the regex works when I use it with | regex _raw=):
etc/system/local/props.conf:
[source::/var/www/app/shared/log/production.log]
SEDCMD-ssn = s/(social_security_number..:..)\d{9}/\1[FILTERED]/g
etc/system/local/props.conf:
[source::/var/www/app/shared/log/production.log]
TRANSFORMS-ssn = ssn_mask
and etc/system/local/transforms.conf:
[ssn_mask]
DEST_KEY = _raw
REGEX = (social_security_number..:..)\d{9}
FORMAT = $1[FILTERED]
Neither works. What am I missing? This is on 6.5.0.
The code that generates the logs has been corrected to filter the SSNs, so the goal is to mask the logs that have already been indexed in Splunk.
Then no, you cannot safely hide the SSNs from the events at search time, because they are in the raw data.
The solution is to create a search that finds all the events containing an SSN, and use the "| delete" command to mark them as deleted in the buckets. (This may be trickier on an indexer cluster.)
http://docs.splunk.com/Documentation/Splunk/latest/Indexer/RemovedatafromSplunk#Delete_events_from_s...
This is the relevant part of the JSON blob:
{"params":{"{\"applicants\":{\"primary\":{\"social_security_number\":\"SSNNUMBER\"}}}":"[FILTERED]"}}
SSNNUMBER is a 9-digit number.
Based on the sample data, your SEDCMD setting needs a small adjustment. Here is the modified version; give it a try:
SEDCMD-ssn = s/(social_security_number..:..)\d{9}(\\")/\1xxxxxxxxx\2/g
Please note that data can't be modified once it has been indexed. This mask will only take effect on new events.
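If you want to sanity-check the substitution outside Splunk before deploying it, here is a minimal Python sketch. It applies the same pattern as the SEDCMD above to the sample event from this thread. The value 123456789 is a made-up placeholder for SSNNUMBER, and mask_ssn is just an illustrative helper name. Python's re module is only an approximation of the regex engine Splunk uses, so treat this as a quick check of the pattern, not proof of how Splunk will behave.

```python
import re

# Sample event from this thread; 123456789 is a made-up stand-in for SSNNUMBER.
event = r'{"params":{"{\"applicants\":{\"primary\":{\"social_security_number\":\"123456789\"}}}":"[FILTERED]"}}'

# Same pattern as the SEDCMD: group 1 is the key followed by \":\" ,
# then nine digits, then group 2 is the closing \" (backslash + quote).
pattern = re.compile(r'(social_security_number..:..)\d{9}(\\")')

def mask_ssn(raw: str) -> str:
    """Replace the nine digits with x's, keeping the text around them."""
    return pattern.sub(r'\1xxxxxxxxx\2', raw)

masked = mask_ssn(event)
print(masked)
```

If the printed event still shows the digits, the pattern does not match your real events and the SEDCMD will not match them either.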
You can find some information here: https://answers.splunk.com/answers/22835/how-can-we-anonymize-user-date-at-search-time.html
Thanks, everyone.
Data that has already been ingested can't be modified. Your masking configuration will only work on new events as they come in. I believe your only option is to delete those events so that they are no longer searchable. If you still want other fields/data from those events, you can mask the data at search time (inline in the search) and use summary indexing to save those records into a different index before deleting them.
Hi ronerf,
Your configurations look good. Can you provide sample event(s) so we can see why these configurations don't work? Also, please remember that these configurations should be applied to both the source and the destination of the data, which means that in a typical deployment the configs should be present on universal forwarders, heavy forwarders (if you're using them), and indexers.
Yes, please provide a few sample events with the SSN sanitized, including the text around it.
Also, the transforms happen at index time, so they have to be set up on the first server that parses the events.