Reporting

Anonymizing data based on scripting

ruisantos
Path Finder

Is there a way to anonymize data based on a script/function? I want to anonymize data but would like to have a hash that I can use to run valid reports on it.

To expand on what I would like to have:

Currently Splunk allows me to anonymize data like this: e.g. replace 123456789 with XXXXXX789.

What I would like is something more like this: e.g. replace 123456789 with the result of a function, md5(123456789) = jf430fj490fj4.

This would guarantee anonymity and uniqueness for reporting.


Kate_Lawrence-G
Contributor

Hmm... I don't think that is something you can do natively in Splunk. The anonymize data function is limited to replacement/character substitution through either SED or REGEX.

The closest 3rd party app I see uploaded is:
http://splunk-base.splunk.com/apps/22403/adds-support-for-anonymizing-log-files-at-index-time , but I think that it probably just does character substitutions based on common fields found in the data.

It sounds like you actually want to replace the data with a hash, possibly with some kind of seed, so that each value stays unique.

I think the best bet for this would be a custom Python command that accepts the raw data, applies a specific function to it, and then spits out a new field based on logic external to Splunk.

Here is the link to the Splunk doc on this:

http://docs.splunk.com/Documentation/Splunk/4.3/SearchReference/WriteaPythonsearchcommand#Examples
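
To make the idea concrete, here is a rough sketch of the hashing logic such a command could apply. It is plain Python only; wiring it into a Splunk custom search command is not shown. The names `ID_PATTERN`, `hash_identifier` and `anonymize_event` are made up for illustration, and the 9-digit pattern is an assumption taken from the 123456789 example in the question.

```python
import hashlib
import re

# Assumption: the sensitive values are standalone runs of exactly 9 digits,
# as in the 123456789 example from the question.
ID_PATTERN = re.compile(r"\b\d{9}\b")


def hash_identifier(value: str) -> str:
    """Return the MD5 hex digest of one identifier (plain MD5, as asked for in the question)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def anonymize_event(raw: str) -> str:
    """Replace every matching identifier in a raw event with its hash."""
    return ID_PATTERN.sub(lambda m: hash_identifier(m.group(0)), raw)


if __name__ == "__main__":
    # Hypothetical sample events, not real log data.
    events = [
        "user=123456789 action=login",
        "user=123456789 action=logout",
        "user=987654321 action=login",
    ]
    for event in events:
        print(anonymize_event(event))
```

The same input always maps to the same digest, so counts and groupings by the hashed value still line up in reports.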

ruisantos
Path Finder

Thanks, that is what I guessed. I've opened an enhancement request for this.


Ayn
Legend

I disagree that it would "guarantee" anonymity. Uniqueness, perhaps (as long as you don't manage to create a hash collision), but anonymity? It's just a matter of finding the correct string that produces the given MD5 sum, and for short numeric values like the one in the example, simply trying every candidate is feasible. The masking approach taken by default in Splunk, on the other hand, alters the string in a way that guarantees that the original data cannot be recreated.
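
To illustrate the point, here is a small sketch of how an unkeyed MD5 of a short numeric value can be reversed by exhaustive search. The function names are made up for the example, and it uses 6-digit values so it runs quickly; a 9-digit space is larger but still small enough to enumerate. The keyed variant (HMAC-SHA256 with a secret) is not something discussed in this thread. It is shown only as one common way to make this guessing impractical for anyone who does not hold the key.

```python
import hashlib
import hmac
from typing import Optional


def md5_hex(value: str) -> str:
    """Unkeyed MD5 hex digest of a string."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def recover(target_hex: str, digits: int) -> Optional[str]:
    """Try every zero-padded number with `digits` digits until one hashes to the target."""
    for n in range(10 ** digits):
        candidate = str(n).zfill(digits)
        if md5_hex(candidate) == target_hex:
            return candidate
    return None


def keyed_hash(value: str, key: bytes) -> str:
    """HMAC-SHA256 of the value; without the key, candidates cannot be checked."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    leaked = md5_hex("004217")        # what would end up in the index
    print(recover(leaked, digits=6))  # the search finds the original value

    # Hypothetical key; it would have to be kept out of the indexed data.
    print(keyed_hash("004217", b"example-secret-key"))
```

The keyed hash is still deterministic, so it can be used for grouping in reports, but its safety then depends entirely on keeping the key secret.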


ruisantos
Path Finder

I saw that document, but it performs a general replacement of characters,

e.g. replace 123456789 with XXXXXX789.

What I would like is something more like this:

e.g. replace 123456789 with md5(123456789) = jf430fj490fj4

This would guarantee anonymity and uniqueness for reporting.
