How to mask SSNs in our logs going into Splunk?

ronerf
Explorer

Our code leaked SSNs into our logs and they went into Splunk, so I'm trying to mask them. I tried it two ways (BTW, the regex works when I use it with | regex _raw=):

  1. In etc/system/local/props.conf:

[source::/var/www/app/shared/log/production.log]
SEDCMD-ssn = s/(social_security_number..:..)\d{9}/\1[FILTERED]/g

  2. In etc/system/local/props.conf:

[source::/var/www/app/shared/log/production.log]
TRANSFORMS-ssn = ssn_mask

and etc/system/local/transforms.conf:

[ssn_mask]
DEST_KEY = _raw
REGEX = (social_security_number..:..)\d{9}
FORMAT = $1[FILTERED]

Neither works. What am I missing? This is on 6.5.0.

1 Solution

yannK
Splunk Employee

You wrote: "The code that generates the logs has been corrected to filter the SSNs, so the goal is to mask the logs that have already been indexed in splunk."

In that case, no: you cannot safely hide the SSNs in those events at search time, because they are still in the raw data.

The solution is to create a search that finds all the events containing an SSN and pipe it to the "| delete" command, which marks them as deleted in the buckets. (This may be trickier on an indexer cluster.)
http://docs.splunk.com/Documentation/Splunk/latest/Indexer/RemovedatafromSplunk#Delete_events_from_s...

ronerf
Explorer

This is the relevant part of the JSON blob:

{"params":{"{\"applicants\":{\"primary\":{\"social_security_number\":\"SSNNUMBER\"}}}":"[FILTERED]"}}

SSNNUMBER is a 9-digit number.
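For anyone who wants to sanity-check the pattern outside Splunk, here is a minimal Python sketch that applies the same expression as the SEDCMD in the question to the sample event above. The value 123456789 is a made-up placeholder, and Python's re module is only a stand-in for Splunk's regex engine, so treat it as a rough check rather than proof of what the indexer does.

```python
import re

# Sample event from this thread; 123456789 is a made-up placeholder for SSNNUMBER.
event = r'{"params":{"{\"applicants\":{\"primary\":{\"social_security_number\":\"123456789\"}}}":"[FILTERED]"}}'

# Same expression as the SEDCMD in the question. The two dots on each side
# of the colon consume the backslash and the quote of each escaped quote.
pattern = re.compile(r'(social_security_number..:..)\d{9}')

# Keep the captured prefix and replace the nine digits.
masked = pattern.sub(r'\1[FILTERED]', event)
print(masked)
```

On this sample the pattern does match, which is consistent with it working under | regex _raw=.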

sudosplunk
Motivator

Based on sample data, your SEDCMD setting should be adjusted a little. Below is the modified version. Give it a try...
SEDCMD-ssn = s/(social_security_number..:..)\d{9}(\\")/\1xxxxxxxxx\2/g

Please note that data can't be modified once it has been indexed. This mask only takes effect on new events.
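To see one visible effect of the extra trailing group, here is a small Python sketch comparing the expression from the question with the modified one above. The two test strings are made-up fragments in the same escaped-JSON shape as the sample event, and Python's re module is assumed to behave like Splunk's sed-style replacement for this simple case.

```python
import re

# Expression from the question, and the modified one with a trailing \" group.
original = re.compile(r'(social_security_number..:..)\d{9}')
anchored = re.compile(r'(social_security_number..:..)\d{9}(\\")')

# Made-up fragments: one 9-digit value and one 10-digit value.
nine = r'{\"social_security_number\":\"123456789\"}'
ten = r'{\"social_security_number\":\"1234567890\"}'

# The anchored version only fires when exactly nine digits sit before
# the closing escaped quote.
anchored_nine = anchored.sub(r'\1xxxxxxxxx\2', nine)
anchored_ten = anchored.sub(r'\1xxxxxxxxx\2', ten)

# The unanchored version masks the first nine digits of any longer run.
original_ten = original.sub(r'\1[FILTERED]', ten)

print(anchored_nine)
print(anchored_ten)
print(original_ten)
```

In this sketch the anchored pattern leaves the 10-digit value alone, while the unanchored one masks its first nine digits and leaves the last digit behind.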

You can find some information here: https://answers.splunk.com/answers/22835/how-can-we-anonymize-user-date-at-search-time.html

ronerf
Explorer

Thanks, everyone.

ronerf
Explorer

The code that generates the logs has been corrected to filter the SSNs, so the goal is to mask the logs that have already been indexed in splunk.

somesoni2
SplunkTrust

Data that has already been ingested can't be modified. Your masking configuration will only apply to new events as they come in. I believe your only option is to delete those events so that they are no longer searchable. If you still want other fields/data from those events, you can mask the data at search time (inline in the search) and use summary indexing to save those records into a different index before deleting them.

sudosplunk
Motivator

Hi ronerf,

Your configurations look good. Can you provide sample event(s) so we can see why they don't work? Also, please remember that these configurations should be applied on both the source and the destination of the data, which means that in a typical deployment the configs should be present on universal forwarders, heavy forwarders (if you're using them), and indexers.

yannK
Splunk Employee

Yes, please provide a few samples of the event with the SSN sanitized, including the text around it.

Also, the transforms happen at index time, so they have to be set up on the first server that parses the events:

  • for regular logs, this means the indexers, or the first heavy forwarder in the chain (if any)
  • for structured logs (INDEXED_EXTRACTIONS=csv or json, ...), this means the first forwarder that collects the logs (this may be the universal forwarder)