How to mask SSNs in our logs going into Splunk?

ronerf
Explorer

Our code leaked SSNs into our logs, and they went into Splunk, so I'm trying to mask them. I tried it two ways (BTW, the regex works when I use it with | regex _raw=):

  1. In etc/system/local/props.conf:

[source::/var/www/app/shared/log/production.log]
SEDCMD-ssn = s/(social_security_number..:..)\d{9}/\1[FILTERED]/g

  2. In etc/system/local/props.conf:

[source::/var/www/app/shared/log/production.log]
TRANSFORMS-ssn = ssn_mask

and etc/system/local/transforms.conf:

[ssn_mask]
DEST_KEY = _raw
REGEX = (social_security_number..:..)\d{9}
FORMAT = $1[FILTERED]

Neither works. What am I missing? This is on 6.5.0.

0 Karma
1 Solution

yannK
Splunk Employee

"The code that generates the logs has been corrected to filter the SSNs, so the goal is to mask the logs that have already been indexed in Splunk."

Then no, you cannot safely hide the SSNs from the events at search time, because they are in the raw data.

The solution is to create a search that finds all the events containing SSNs, and use the "| delete" command to mark them as deleted in the buckets (this may be trickier on an indexer cluster).
http://docs.splunk.com/Documentation/Splunk/latest/Indexer/RemovedatafromSplunk#Delete_events_from_s...

ronerf
Explorer

This is the relevant part of the JSON blob:

{"params":{"{\"applicants\":{\"primary\":{\"social_security_number\":\"SSNNUMBER\"}}}":"[FILTERED]"}}

SSNNUMBER is a 9-digit number.

0 Karma

sudosplunk
Motivator

Based on the sample data, your SEDCMD setting should be adjusted slightly. Here is the modified version; give it a try:
SEDCMD-ssn = s/(social_security_number..:..)\d{9}(\\")/\1xxxxxxxxx\2/g

Please note that data can't be modified once it has been indexed. This mask will apply only to new events.

You can find some information here: https://answers.splunk.com/answers/22835/how-can-we-anonymize-user-date-at-search-time.html
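
To check whether the regex itself is the problem, here is a minimal sketch that runs both patterns from this thread against the sample event. It rests on several assumptions: Python's re module stands in for the PCRE engine Splunk uses for SEDCMD, the value 123456789 is a made-up placeholder for SSNNUMBER, and the names RAW_EVENT and mask are purely illustrative.

```python
import re

# Sample event from the thread; 123456789 is a made-up stand-in for SSNNUMBER.
RAW_EVENT = r'{"params":{"{\"applicants\":{\"primary\":{\"social_security_number\":\"123456789\"}}}":"[FILTERED]"}}'

# Patterns copied from the two SEDCMD settings posted in the thread.
ORIGINAL = r'(social_security_number..:..)\d{9}'
MODIFIED = r'(social_security_number..:..)\d{9}(\\")'


def mask(pattern: str, replacement: str, event: str) -> str:
    """Emulate a sed-style s/pattern/replacement/g substitution."""
    return re.sub(pattern, replacement, event)


masked_original = mask(ORIGINAL, r'\1[FILTERED]', RAW_EVENT)
masked_modified = mask(MODIFIED, r'\1xxxxxxxxx\2', RAW_EVENT)

print(masked_original)
print(masked_modified)
```

In this sketch both patterns replace the 9-digit value, which suggests the regex is probably not what is failing. That would be consistent with the other replies here: where the configuration is deployed, and the fact that already-indexed events are not rewritten, are the more likely causes.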

0 Karma

ronerf
Explorer

Thanks, everyone.

0 Karma

ronerf
Explorer

The code that generates the logs has been corrected to filter the SSNs, so the goal is to mask the logs that have already been indexed in Splunk.

0 Karma

somesoni2
Revered Legend

Data that has already been ingested can't be modified. Your masking configuration will only apply to new events as they arrive. I believe your only option is to delete those events so that they are no longer searchable. If you still want other fields or data from those events, you can mask the data at search time (inline in the search) and use summary indexing to save those records into a different index before deleting them.

0 Karma

sudosplunk
Motivator

Hi ronerf,

Your configurations look good. Can you provide sample event(s) so we can see why these configurations don't work? Also, please remember that these configurations should be applied to both the source and the destination of the data, which means that in a typical deployment the configs should be present on universal forwarders, heavy forwarders (if you're using them), and indexers.

0 Karma

yannK
Splunk Employee

Yes, please provide a few samples of sanitized SSNs and the surrounding event.

Also, the transforms happen at index time, so they have to be set up on the first server that parses the events:

  • for regular logs, this means the indexers, or the first heavy forwarder in the chain (if any)
  • for structured logs (INDEXED_EXTRACTIONS=csv or json, ...), this means the first forwarder that collected the logs (this may be the universal forwarder)
0 Karma