
Anonymize data based on scripting

ruisantos
Path Finder

Is there a way to anonymize data based on a script/function? I want to anonymize data but would like to have a hash that I can use to run valid reports on it.

To expand on what I would like to have:

Currently Splunk allows me to anonymize data like this: e.g. replace 123456789 with XXXXXX789.

What I would like is something more like: e.g. replace 123456789 with the result of the function md5(123456789)=jf430fj490fj4

This would guarantee anonymity and uniqueness for reporting.
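A minimal Python sketch of the idea, using the standard library's hashlib (the function name pseudonymize is hypothetical, not a Splunk feature). Equal inputs always produce the same digest, so counts and groupings on the token match those on the original values:

```python
import hashlib


def pseudonymize(value: str) -> str:
    """Replace a value with its MD5 hex digest, as proposed in the question."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


events = ["123456789", "987654321", "123456789"]
tokens = [pseudonymize(v) for v in events]

# Equal inputs give equal tokens, distinct inputs give distinct tokens,
# so a distinct count over the tokens matches one over the raw values.
assert tokens[0] == tokens[2]
assert tokens[0] != tokens[1]
print(len(set(events)), len(set(tokens)))  # 2 2
```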


Kate_Lawrence-G
Contributor

Hmm...I don't think that is something you can do natively in Splunk. The anonymize data function is limited to replacement/character substitution through either SED or REGEX.
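For comparison, the kind of character substitution described here can be sketched in Python with re.sub. The pattern mirrors an assumed sed-style expression s/\d{6}(\d{3})/XXXXXX\1/g (an illustration, not taken from the thread). It also shows why masking limits reporting: different values can collapse to the same masked string.

```python
import re

# Keep the last three digits of a nine-digit number and mask the first six,
# like the sed-style substitution s/\d{6}(\d{3})/XXXXXX\1/g.
MASK_PATTERN = re.compile(r"\b\d{6}(\d{3})\b")


def mask(text: str) -> str:
    return MASK_PATTERN.sub(r"XXXXXX\1", text)


print(mask("account=123456789"))  # account=XXXXXX789
print(mask("account=555555789"))  # account=XXXXXX789 (same result, different account)
```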

The closest 3rd party app I see uploaded is:
http://splunk-base.splunk.com/apps/22403/adds-support-for-anonymizing-log-files-at-index-time, but I think it probably just does character substitutions based on common fields found in the data.

It sounds like you actually want to replace the data with a hash, or a hash combined with some kind of seed, so that each value becomes a unique token.

I think the best bet for this would be a custom Python search command that accepts the raw data, applies a specific function, and outputs a new field based on logic external to Splunk.

Here is the link to the Splunk doc on this:

http://docs.splunk.com/Documentation/Splunk/4.3/SearchReference/WriteaPythonsearchcommand#Examples
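A sketch of the core logic such a command might run, assuming results arrive as a list of dicts mapping field names to values. It uses a keyed hash (HMAC-SHA256) as the "seed" mentioned above, so someone without the key cannot simply recompute digests for guessed values. The function hash_field, the key, and the field names are hypothetical; the Splunk-specific plumbing for reading and writing results is covered in the linked doc and is not shown here.

```python
import hashlib
import hmac

# Hypothetical secret key (the "seed"); keep it out of anything report users can read.
SECRET_KEY = b"replace-with-a-secret-key"


def hash_field(records, field, out_field="hashed", key=SECRET_KEY, drop_original=True):
    """Yield records with `out_field` set to HMAC-SHA256(key, value of `field`).

    Records that lack `field` pass through unchanged. When drop_original is
    True, the original field is removed from the output record.
    """
    for record in records:
        if field in record:
            record = dict(record)  # copy so the caller's dict is not mutated
            raw = record.pop(field) if drop_original else record[field]
            digest = hmac.new(key, str(raw).encode("utf-8"), hashlib.sha256)
            record[out_field] = digest.hexdigest()
        yield record


results = [{"account": "123456789"}, {"account": "987654321"}, {"host": "web01"}]
for r in hash_field(results, "account"):
    print(r)
```

Because a search command runs at search time, it only changes the search results; the original value is still stored in the index.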

ruisantos
Path Finder

Thanks, that is what I guessed. I've opened an enhancement request for this.


Ayn
Legend

I disagree that it would "guarantee" anonymity. Uniqueness, perhaps (as long as you don't run into a hash collision), but anonymity? It's just a matter of finding the string that produces the given MD5 sum. The masking approach taken by default in Splunk, on the other hand, discards the masked characters, which guarantees that the original data cannot be recreated from the result.
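To illustrate the point, a small Python sketch that recovers a value from its unsalted MD5 digest by trying every candidate. A 4-digit space is used so it runs instantly; a 9-digit ID like the one in the question is only 10^9 candidates, which is within reach of ordinary hardware. The helper names md5_hex and recover are hypothetical.

```python
import hashlib
from typing import Optional


def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def recover(digest: str, digits: int) -> Optional[str]:
    """Try every zero-padded number with `digits` digits until one matches."""
    for n in range(10 ** digits):
        candidate = str(n).zfill(digits)
        if md5_hex(candidate) == digest:
            return candidate
    return None


leaked = md5_hex("4821")  # pretend only the hash was stored
print(recover(leaked, 4))  # 4821
```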


ruisantos
Path Finder

I saw that document, but it only performs a general replacement of characters.

e.g. replace 123456789 with XXXXXX789.

What I would like is something more like:

e.g. replace 123456789 with md5(123456789)=jf430fj490fj4

This would guarantee anonymity and uniqueness for reporting.
