Issues with Splunk "scrub" command not anonymizing data correctly


A customer was using the Splunk "scrub" command to anonymize sensitive data (e.g. user names) at search time. While this mostly worked well, they found that some names were only partially anonymized or not anonymized at all. They wrote a search to highlight these (index=test | table _time user | eval _user=user | scrub user | eval orig_user=_user | stats values(user) as users count by orig_user)
As we can see, Splunk does a good job of anonymizing the user names, except for "Sarah Hardy", which is only partially anonymized, and "Mike Smith", which is not anonymized at all.
Why is this happening?


This is actually a documentation issue: scrub is in fact working as intended (but not as documented). I will try to explain 🙂
The "scrub" and "splunk anonymize" (used to anonymize diags) commands share a common library.
The scrub documentation states:
public-terms
Syntax: public-terms=
Description: Specify a filename that includes the public terms to be anonymized.
private-terms
Syntax: private-terms=
Description: Specify a filename that includes the private terms to be anonymized.
name-terms
Syntax: name-terms=
Description: Specify a filename that includes names to be anonymized.
dictionary
Syntax: dictionary=
Description: Specify a filename that includes a dictionary of terms to be anonymized.
timeconfig
Syntax: timeconfig=
Description: Specify a filename that includes time configurations to be anonymized.
namespace
Syntax: namespace=
Description: Specify an application that contains the alternative files to use for anonymizing, instead of using the built-in anonymizing files.
The anonymize command help states:
public-terms: file containing a list of locally-used words to NOT anonymize (default= $SPLUNK_HOME/etc/anonymizer/public-terms.txt)
private-terms: file containing a list of words to anonymize (default= $SPLUNK_HOME/etc/anonymizer/private-terms.txt)
name-terms: file containing a list of common English personal names that Splunk uses to anonymize names with (default= $SPLUNK_HOME/etc/anonymizer/names.txt)
dictionary: file containing a global list of commonly-used words to NOT anonymize - unless they are in the private-terms file (default= $SPLUNK_HOME/etc/anonymizer/dictionary.txt)
timestamp-config: file that determines how timestamps are parsed (default= $SPLUNK_HOME/etc/anonymizer/anonymizer-time.ini)
Note that dictionary and public-terms in the anonymize documentation are documented as having the OPPOSITE effect to those in the scrub documentation. The anonymize documentation describes the correct behavior, i.e. dictionary.txt and public-terms.txt contain lists of words NOT to anonymize unless they are also in private-terms.txt.
Surnames like Smith and Hardy are included in dictionary.txt because they are also ordinary English words: "smith" is a noun and a verb, and "hardy" is an adjective.
"Mike Smith" fails on two counts, as both "smith" and "mike" are included in dictionary.txt, so neither word is anonymized. "Sarah Hardy" is only partially anonymized because "hardy" is in dictionary.txt while "sarah" is not. Adding "Mike Smith" to private-terms.txt resolves the issue.

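To make the rule above concrete, here is a minimal Python sketch of the decision logic as I understand it from the anonymize help text. It is not Splunk's actual implementation: the function name `scrub_sketch`, the toy word lists, the masking with `x` characters, and the splitting of a multi-word private term into individual words are all assumptions made for illustration.

```python
import re


def scrub_sketch(text, dictionary, public_terms, private_terms):
    """Mask each word unless it is on a keep-list and not a private term.

    Hypothetical illustration only; Splunk's real replacement scheme differs.
    """
    # dictionary and public-terms both act as "do NOT anonymize" lists.
    keep = {w.lower() for w in dictionary} | {w.lower() for w in public_terms}
    # Assumption: a multi-word entry such as "Mike Smith" is split into words.
    private = {w.lower() for entry in private_terms for w in entry.split()}

    def mask(match):
        word = match.group(0)
        key = word.lower()
        if key in keep and key not in private:
            return word  # word is left in clear text
        return "x" * len(word)  # placeholder masking, same length as the word

    return re.sub(r"[A-Za-z]+", mask, text)


# Toy subset of dictionary.txt, using the words named in this thread.
toy_dictionary = {"smith", "mike", "hardy"}

print(scrub_sketch("Mike Smith", toy_dictionary, set(), set()))
print(scrub_sketch("Sarah Hardy", toy_dictionary, set(), set()))
print(scrub_sketch("Mike Smith", toy_dictionary, set(), {"Mike Smith"}))
```

With these toy lists, the first call leaves "Mike Smith" untouched, the second masks only "Sarah", and the third masks both words once the name is listed as a private term, which mirrors the behavior seen in the original search results.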