
Issues with Splunk "scrub" command not anonymizing data correctly

dshakespeare_sp
Splunk Employee

A customer was using the Splunk "scrub" command to anonymize sensitive data (e.g. user names) at search time. While this mostly worked well, they found that some names were only partially anonymized or not anonymized at all. They wrote a search to highlight these:

index=test | table _time user | eval _user=user | scrub user | eval orig_user=_user | stats values(user) as users count by orig_user


The results show that Splunk does a good job of anonymizing the user names, except for "Sarah Hardy", which is only partially anonymized, and "Mike Smith", which is not anonymized at all.

Why is this happening?

dshakespeare_sp
Splunk Employee

This is actually a documentation issue: scrub is working as intended (but not as documented). I will try to explain 🙂

The "scrub" and "splunk anonymize" (used to anonymize diags) commands share a common library.

The scrub documentation states:

public-terms
Syntax: public-terms=<filename>
Description: Specify a filename that includes the public terms to be anonymized.

private-terms
Syntax: private-terms=<filename>
Description: Specify a filename that includes the private terms to be anonymized.

name-terms
Syntax: name-terms=<filename>
Description: Specify a filename that includes names to be anonymized.

dictionary
Syntax: dictionary=<filename>
Description: Specify a filename that includes a dictionary of terms to be anonymized.

timeconfig
Syntax: timeconfig=<filename>
Description: Specify a filename that includes time configurations to be anonymized.

namespace
Syntax: namespace=<string>
Description: Specify an application that contains the alternative files to use for anonymizing, instead of using the built-in anonymizing files.

The anonymize command documentation states:

public-terms file containing a list of locally-used words to NOT anonymize
(default= $SPLUNK_HOME/etc/anonymizer/public-terms.txt)

private-terms file containing a list of words to anonymize
(default= $SPLUNK_HOME/etc/anonymizer/private-terms.txt)

name-terms file containing a list of common English personal names that Splunk uses to anonymize names with
(default= $SPLUNK_HOME/etc/anonymizer/names.txt)

dictionary file containing a global list of commonly-used words to NOT anonymize - unless they are in the private-terms file
(default= $SPLUNK_HOME/etc/anonymizer/dictionary.txt)

timestamp-config file that determines how timestamps are parsed
(default= $SPLUNK_HOME/etc/anonymizer/anonymizer-time.ini)

Note that dictionary and public-terms in the anonymize documentation are documented as having the OPPOSITE effect to those in the scrub documentation. The anonymize documentation describes the correct behaviour, i.e. dictionary.txt and public-terms.txt contain lists of words NOT to anonymize, unless those words are also in private-terms.txt.
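
To make that precedence concrete, here is a minimal Python sketch of the rule as described above. It is not Splunk's implementation: the function names, the tiny sample word lists, and the "x" masking are all assumptions made for illustration (the real command substitutes replacement text rather than masking).

```python
def should_anonymize(token, dictionary, public_terms, private_terms):
    """Apply the precedence described above to a single word."""
    word = token.lower()
    if word in private_terms:
        # private-terms always wins, even over dictionary/public-terms
        return True
    if word in dictionary or word in public_terms:
        # listed as a common or locally-used word: left in clear text
        return False
    return True


def scrub_value(value, dictionary, public_terms, private_terms):
    """Mask each word that should be anonymized (placeholder masking only)."""
    out = []
    for token in value.split():
        if should_anonymize(token, dictionary, public_terms, private_terms):
            out.append("x" * len(token))
        else:
            out.append(token)
    return " ".join(out)


# Hypothetical sample lists; the real files hold far more entries.
dictionary = {"smith", "mike", "hardy"}
public_terms = set()

print(scrub_value("Sarah Hardy", dictionary, public_terms, set()))
# xxxxx Hardy
print(scrub_value("Mike Smith", dictionary, public_terms, set()))
# Mike Smith
print(scrub_value("Mike Smith", dictionary, public_terms, {"mike", "smith"}))
# xxxx xxxxx
```

The sketch treats each word of a value separately, which is also an assumption, but it reproduces the pattern seen in the customer's results: a value is only fully anonymized when none of its words is protected by dictionary.txt or public-terms.txt.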

Surnames like Smith and Hardy are included in dictionary.txt because "smith" is a noun and a verb and "hardy" is an adjective.
"Mike Smith" fails on two counts, as both "smith" and "mike" are included in dictionary.txt. Adding "Mike Smith" to private-terms.txt resolves the issue.

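If you want to find out in advance which of your own user names are likely to be affected, a small script along these lines may help. This is only a sketch: the helper names are made up, the sample names and word list are hypothetical, and it assumes the term files hold one term per line.

```python
from pathlib import Path


def load_terms(path):
    """Read a term file into a set of lower-cased terms.

    Assumes one term per line; adjust if your file is laid out differently.
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def words_left_in_clear(names, dictionary, private_terms=frozenset()):
    """Map each name to the words that a dictionary entry would protect."""
    report = {}
    for name in names:
        hits = [
            word for word in name.split()
            if word.lower() in dictionary and word.lower() not in private_terms
        ]
        if hits:
            report[name] = hits
    return report


# Hypothetical sample data standing in for dictionary.txt and real user names.
sample_dictionary = {"smith", "mike", "hardy"}
names = ["Sarah Hardy", "Mike Smith", "Priya Natarajan"]

print(words_left_in_clear(names, sample_dictionary))
# {'Sarah Hardy': ['Hardy'], 'Mike Smith': ['Mike', 'Smith']}
```

On a real installation you would pass the result of load_terms() on $SPLUNK_HOME/etc/anonymizer/dictionary.txt instead of the sample set, and any words reported are candidates for private-terms.txt.
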