Anonymize data based on scripting

ruisantos
Path Finder

Is there a way to anonymize data based on a script or function? I want to anonymize data but still have a hash that I can use to run valid reports on it.

To expand on what I would like to have:

Currently Splunk allows me to anonymize data like this: e.g. replace 123456789 with XXXXXX789.

What I would like is something more like: e.g. replace 123456789 with the result of the function md5(123456789) = jf430fj490fj4

This would guarantee anonymity and uniqueness for reporting.
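As a minimal sketch of the requested behaviour (plain Python `hashlib`, not a Splunk feature; the sample values are hypothetical), hashing each value deterministically means identical inputs always map to identical tokens, so counts and distinct counts remain valid. Note that a real MD5 digest is 32 hexadecimal characters; `jf430fj490fj4` above is only illustrative.

```python
import hashlib
from collections import Counter


def md5_token(value: str) -> str:
    """Replace a sensitive value with its hex MD5 digest (deterministic)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical sample: account numbers seen in three events.
events = ["123456789", "987654789", "123456789"]

tokens = [md5_token(v) for v in events]
counts = Counter(tokens)

# The same input always yields the same token, so per-value reporting still works.
for token, n in counts.items():
    print(token, n)
```

Two distinct tokens are reported, with the repeated account counted twice.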


Kate_Lawrence-G
Contributor

Hmm... I don't think that is something you can do natively in Splunk. The anonymize-data feature is limited to replacement/character substitution through either SED or REGEX.
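To see why SED-style masking doesn't serve the reporting use case, the substitution can be mimicked with Python's `re.sub`. This is a sketch only: the pattern below is a hypothetical rule that assumes nine-digit numbers and keeps the last three digits, written in the same `s/regex/replacement/g` spirit as a sed expression.

```python
import re

# Python equivalent of a sed-style rule such as
#   s/\d{6}(\d{3})/XXXXXX\1/g
# (hypothetical pattern: assumes nine-digit numbers, keeps the last three digits).
MASK_PATTERN = re.compile(r"\d{6}(\d{3})")


def mask(text: str) -> str:
    """Replace the first six digits of each nine-digit run with X characters."""
    return MASK_PATTERN.sub(r"XXXXXX\1", text)


print(mask("acct=123456789"))  # acct=XXXXXX789
print(mask("acct=987654789"))  # acct=XXXXXX789 (collides with the line above)
```

Two different accounts end up as the same masked string, which is exactly the loss of uniqueness the question is trying to avoid.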

The closest third-party app I can find is:
http://splunk-base.splunk.com/apps/22403/adds-support-for-anonymizing-log-files-at-index-time
but I think it probably just does character substitution based on fields commonly found in data.

It sounds like what you actually want is to transform the data with a hash (or some kind of seed) so that each value becomes a unique token.

I think the best bet for this would be a custom Python command that accepts the raw data, applies a specific function, and then outputs a new field based on logic external to Splunk.

Here is the link to the Splunk doc on this:

http://docs.splunk.com/Documentation/Splunk/4.3/SearchReference/WriteaPythonsearchcommand#Examples
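The core logic of such a command might look like the sketch below. It deliberately avoids any Splunk-specific API: `hash_field` and the `_md5` suffix are hypothetical names, and the assumption is that the command receives search results as one dict per event and returns the modified records. Wiring it into Splunk would follow the custom search command documentation linked above.

```python
import hashlib


def hash_field(records, field, out_field=None):
    """Yield each record with an MD5 token of `field` added as a new field.

    Hypothetical core of a custom search command: each record is a dict of
    field name -> value; records without `field` pass through unchanged.
    """
    out_field = out_field or field + "_md5"
    for record in records:
        value = record.get(field)
        if value is not None:
            record[out_field] = hashlib.md5(str(value).encode("utf-8")).hexdigest()
        yield record


# Hypothetical search results.
results = [{"acct": "123456789", "amount": "10"}, {"amount": "5"}]
for r in hash_field(results, "acct"):
    print(r)
```

Because a search-time command only changes the results it returns, the original value is still present in the indexed event.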

ruisantos
Path Finder

Thanks, that is what I guessed. I've opened an enhancement request for this.


Ayn
Legend

I disagree that it would "guarantee" anonymity. Uniqueness, perhaps (as long as you don't run into a hash collision), but anonymity? It's just a matter of finding the string that produces the given MD5 sum. The masking approach Splunk takes by default, on the other hand, alters the string in a way that guarantees the original data cannot be recreated.
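A short sketch illustrates the point: when the input space is small and known (here, fixed-length numeric IDs), an unsalted MD5 can be reversed simply by hashing every candidate. The five-digit ID is a hypothetical value chosen to keep the demo fast; a nine-digit ID means only 10**9 candidates, which is a small search space for a fast hash like MD5.

```python
import hashlib
from typing import Optional


def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def recover(digest: str, digits: int) -> Optional[str]:
    """Try every zero-padded number with `digits` digits; return the one whose MD5 matches."""
    for n in range(10 ** digits):
        candidate = str(n).zfill(digits)
        if md5_hex(candidate) == digest:
            return candidate
    return None


# Hypothetical 5-digit ID; the "anonymized" token is just its MD5.
secret = "04217"
token = md5_hex(secret)
print(recover(token, 5))  # 04217
```

The hash keeps values distinct for reporting, but it does not hide them from anyone who can enumerate the possible inputs.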


ruisantos
Path Finder

I saw that document, but it only performs a general replacement of characters.

e.g. replace 123456789 with XXXXXX789.

What I would like is something more like:

e.g. replace 123456789 with md5(123456789) = jf430fj490fj4

This would guarantee anonymity and uniqueness for reporting.
