Getting Data In

How can we anonymize user data at search time?

herterich
Explorer

I want to anonymize user data (for example, email addresses) at search time and have tried a couple of approaches.
First I tried the rex command
rex mode=sed "s/(\w+.?\w+?)@mydomain\.net/xxxxx@mydomain.net/g"
which works but does not modify the raw event at search time. As a result, a user who selects "show source" can see the original mail address again. A defined field will also show the original mail address.

The other problem is that all the reports are boring, because all our internal mail addresses will be replaced with xxxxx.

I'm looking for some way to replace the username part of the mail address with a hash code of the username, but could not find anything like this. I also saw a solution on Splunkbase that does DES or 3DES encryption of a specific field (http://splunk-base.splunk.com/apps/22393/encrypt-and-decrypt-data-within-events), but this will not work in my environment because all events come in from forwarders or by syslog, and I'm not allowed to install such functions on the forwarders because of performance issues.

In version 4.2 I found a new command, mappy, which allows running short Python scripts, but it looks like it does not support all Python modules and options. I tried to use mappy with the Python function re.sub but could not find any working "one line" command that replaces the string extracted by the rex with its hash code.
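For illustration, the substitution described above can be sketched in standalone Python. This is not a mappy one-liner and has not been tried inside Splunk. The function name `pseudonymize_emails`, the salt value, and the truncation to 12 hex characters are all assumptions made for the example. Only `mydomain.net` comes from the rex above.

```python
import hashlib
import re

# Hypothetical pattern: the user part of addresses in the internal domain.
EMAIL_RE = re.compile(r"([\w.+-]+)@(mydomain\.net)")


def pseudonymize_emails(text, salt="change-me"):
    """Replace the user part of each internal address with a salted hash."""

    def _replace(match):
        user = match.group(1).lower()
        digest = hashlib.sha256((salt + user).encode("utf-8")).hexdigest()
        # Truncated to 12 hex characters for readability (an arbitrary choice).
        return digest[:12] + "@" + match.group(2)

    return EMAIL_RE.sub(_replace, text)


if __name__ == "__main__":
    line = "from=alice.smith@mydomain.net to=bob@mydomain.net"
    print(pseudonymize_emails(line))
```

Because the same username always maps to the same token, reports would still distinguish individual senders instead of collapsing them all into xxxxx.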

Has anyone found a way to anonymize user data in Splunk with hash codes or something like this?

jrodman
Splunk Employee

Splunk does not have a feature to modify the value of _raw (the text of the event) at search time in a way that prevents users from ever getting access to the original value. As a convenience, you could create a calculated field named _raw that replaces the original _raw, but that won't stop users from getting the original value of _raw, for example by overriding your rule.

If you need to anonymize data that users will see, you need to put an anonymized version of the data in an index they have access to, and not give them access to an index containing the non-anonymized data. Avoiding putting it in ANY index would do the job, as would putting it in an index that their role is not permitted access to.

In order to anonymize data at input time, you can use a traditional regex transform or a SEDCMD. If you want to create an anonymized version of the data at a later point, you can try to get summary indexing to do this for you -- producing modified data to go into an alternate index -- but it's somewhat fragile and I don't recommend it.

edaus
Engager

+1 @jrodman for suggesting the trick of replacing _raw. It does not mask the data at the source, but it does help us mask the data in the raw events shown in search results. For example, to mask the value of the field "value", you can mask it both in the raw events and in the extracted/selected fields, which are the most visible views.

....
| eval _raw=replace(_raw,"(value=)[0-9]*","\1xxxx")
| eval value=replace(value,".*","xxxx")
...

dwaddle
SplunkTrust

Please understand and be aware: any technique you use to "mask" or "anonymize" data at search time is flawed and easily defeated. As long as users are able to run an ad-hoc search, they will be able to find a way around your attempts to anonymize data at search time.

When @jrodman said "you can't anonymize at search time", he meant that there's no way to make a search-time anonymizer that is robust enough to prevent people going around it.

Making a claim that you have "protected data" in this way is perhaps duplicitous. I would not recommend it.

jamesy281
Path Finder

Hi Jrodman,

Sorry, I didn't mean to add it as an answer. I thought that if I could resolve my issue, it might help the original poster anonymise before indexing, so he wouldn't need to worry about doing it at search time.
I have already created a separate post but haven't had any responses, which is why I posted here.
https://answers.splunk.com/answers/171932/how-to-anonymize-data-at-search-time.html

Sorry for posting this as an answer again but I don't see how to add as a comment.

jrodman
Splunk Employee

Well I misunderstood, because you linked to information about how to anonymize data at input/index time. If you want to anonymize data at search time you already have the answer that you basically can't.

jamesy281
Path Finder

Oops sorry just spotted this comment option.

jamesy281
Path Finder

Jrodman,

Great response. I am new to Splunk, so I'm not sure how to go about creating a new index for the purposes of anonymizing. At the moment I am using the props.conf method, with a source type and a sed command to replace the data, but I can't seem to get it to work. I followed the KB article below and have done a rolling restart on all the indexers, but the data is still not masked.

http://docs.splunk.com/Documentation/Splunk/6.1.4/Data/Anonymizedatausingconfigurationfiles

jrodman
Splunk Employee

This wasn't an answer to the question, so I moved to a comment.
However, it's really an independent question. I tried reopening it as such, but it didn't work.

I suggest you do the following:

  1. Open a new question
  2. In the question, be clear about the goal you are trying to achieve -- anonymize data prior to indexing it sounds like?
  3. Show an example of what your data looks like, not necessarily the log text, but something like it with the important values replaced with replacement text
  4. Show the configuration that you are using to handle it in full