<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Issues with Splunk "scrub" command not anonymizing data correctly in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Issues-with-Splunk-quot-scrub-quot-command-not-anomymizing-data/m-p/363445#M66221</link>
    <pubDate>Mon, 26 Jun 2017 09:43:40 GMT</pubDate>
    <dc:creator>dshakespeare_sp</dc:creator>
    <dc:date>2017-06-26T09:43:40Z</dc:date>
    <item>
      <title>Issues with Splunk "scrub" command not anonymizing data correctly</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Issues-with-Splunk-quot-scrub-quot-command-not-anomymizing-data/m-p/363444#M66220</link>
      <description>&lt;P&gt;A customer was using the Splunk "scrub" command to anonymize sensitive data (e.g. user names) at search time. While this mostly worked well, they found that some names were only partially anonymized or not anonymized at all. They wrote a search to highlight these: index=test | table _time user | eval _user=user | scrub user | eval orig_user=_user | stats values(user) as users count by orig_user&lt;/P&gt;

&lt;P&gt;&lt;IMG src="https://community.splunk.com/storage/temp/206708-screen-shot-2017-06-26-at-102339.jpg" alt="alt text" /&gt;&lt;/P&gt;

&lt;P&gt;As we can see, Splunk does a good job of anonymizing the user names, except for "Sarah Hardy", which is only partially anonymized, and "Mike Smith", which is not anonymized at all.&lt;/P&gt;

&lt;P&gt;Why is this happening?&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 14:35:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Issues-with-Splunk-quot-scrub-quot-command-not-anomymizing-data/m-p/363444#M66220</guid>
      <dc:creator>dshakespeare_sp</dc:creator>
      <dc:date>2020-09-29T14:35:03Z</dc:date>
    </item>
    <item>
      <title>Re: Issues with Splunk "scrub" command not anonymizing data correctly</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Issues-with-Splunk-quot-scrub-quot-command-not-anomymizing-data/m-p/363445#M66221</link>
      <description>&lt;P&gt;This is a documentation issue: scrub is in fact working as intended (but not as documented). I will try to explain &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; &lt;/P&gt;

&lt;P&gt;The "scrub" and "splunk anonymize" (used to anonymize diags) commands share a common library. &lt;/P&gt;

&lt;H1&gt;The scrub documentation states: &lt;/H1&gt;

&lt;P&gt;public-terms &lt;BR /&gt;
Syntax: public-terms= &lt;BR /&gt;
Description: Specify a filename that includes the public terms to be anonymized. &lt;/P&gt;

&lt;P&gt;private-terms &lt;BR /&gt;
Syntax: private-terms= &lt;BR /&gt;
Description: Specify a filename that includes the private terms to be anonymized. &lt;/P&gt;

&lt;P&gt;name-terms &lt;BR /&gt;
Syntax: name-terms= &lt;BR /&gt;
Description: Specify a filename that includes names to be anonymized. &lt;/P&gt;

&lt;P&gt;dictionary &lt;BR /&gt;
Syntax: dictionary= &lt;BR /&gt;
Description: Specify a filename that includes a dictionary of terms to be anonymized. &lt;/P&gt;

&lt;P&gt;timeconfig &lt;BR /&gt;
Syntax: timeconfig= &lt;BR /&gt;
Description: Specify a filename that includes time configurations to be anonymized. &lt;/P&gt;

&lt;P&gt;namespace &lt;BR /&gt;
Syntax: namespace= &lt;BR /&gt;
Description: Specify an application that contains the alternative files to use for anonymizing, instead of using the built-in anonymizing files. &lt;/P&gt;

&lt;H1&gt;The anonymize command states: &lt;/H1&gt;

&lt;P&gt;public-terms file containing a list of locally-used words to NOT anonymize &lt;BR /&gt;
(default= $SPLUNK_HOME/etc/anonymizer/public-terms.txt) &lt;/P&gt;

&lt;P&gt;private-terms file containing a list of words to anonymize &lt;BR /&gt;
(default= $SPLUNK_HOME/etc/anonymizer/private-terms.txt) &lt;/P&gt;

&lt;P&gt;name-terms file containing a list of common English personal &lt;BR /&gt;
names that Splunk uses to anonymize names with &lt;BR /&gt;
(default= $SPLUNK_HOME/etc/anonymizer/names.txt) &lt;/P&gt;

&lt;P&gt;dictionary file containing a global list of commonly-used &lt;BR /&gt;
words to NOT anonymize - unless they are in the &lt;BR /&gt;
private-terms file &lt;BR /&gt;
(default= $SPLUNK_HOME/etc/anonymizer/dictionary.txt) &lt;/P&gt;

&lt;P&gt;timestamp-config file that determines how timestamps are parsed &lt;BR /&gt;
(default= $SPLUNK_HOME/etc/anonymizer/ &lt;BR /&gt;
anonymizer-time.ini) &lt;/P&gt;

&lt;P&gt;Note that dictionary and public-terms are documented in the anonymize documentation as having the OPPOSITE effect to the one described in the scrub documentation. The anonymize documentation is the correct one, i.e. dictionary.txt and public-terms.txt contain lists of words NOT to anonymize unless they are also in private-terms.txt. &lt;/P&gt;
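
&lt;P&gt;To make that precedence concrete, here is a rough Python sketch. It is only an illustration of the rule as described in this thread, not the code of the shared library: the function names, the fixed XXXX mask, the per-word matching and the sample word lists are all assumptions made for the example. &lt;/P&gt;

&lt;PRE&gt;
```python
# Hypothetical model of the precedence described in this thread. It is NOT
# Splunk's actual implementation, and the word lists are made-up samples.

def should_anonymize(term, private_terms, public_terms, dictionary):
    """Return True if a single word would be anonymized under the rule."""
    word = term.lower()
    if word in private_terms:
        # private-terms always wins, even over dictionary and public-terms
        return True
    # dictionary and public-terms list words that are left untouched
    return word not in dictionary and word not in public_terms


def scrub_name(name, private_terms, public_terms, dictionary, mask="XXXX"):
    """Mask every whitespace-separated word that should be anonymized.

    A fixed mask is a simplification chosen for this sketch only.
    """
    return " ".join(
        mask if should_anonymize(t, private_terms, public_terms, dictionary) else t
        for t in name.split()
    )


# Assumed sample contents, using the names from the question
dictionary = {"smith", "mike", "hardy"}
public_terms = set()

print(scrub_name("Sarah Hardy", set(), public_terms, dictionary))
print(scrub_name("Mike Smith", set(), public_terms, dictionary))
print(scrub_name("Mike Smith", {"mike", "smith"}, public_terms, dictionary))
```
&lt;/PRE&gt;

&lt;P&gt;With an empty private-terms set the sketch prints "XXXX Hardy" and "Mike Smith"; once "mike" and "smith" are treated as private terms it prints "XXXX XXXX". &lt;/P&gt;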

&lt;P&gt;Surnames like Smith and Hardy are included in dictionary.txt because "smith" is a noun and a verb and "hardy" is an adjective. This is why "Sarah Hardy" is only partially anonymized. &lt;BR /&gt;
"Mike Smith" fails on two counts, as both "smith" and "mike" are included in dictionary.txt. Adding "Mike Smith" to private-terms.txt resolves the issue. &lt;/P&gt;</description>
      <pubDate>Mon, 26 Jun 2017 09:43:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Issues-with-Splunk-quot-scrub-quot-command-not-anomymizing-data/m-p/363445#M66221</guid>
      <dc:creator>dshakespeare_sp</dc:creator>
      <dc:date>2017-06-26T09:43:40Z</dc:date>
    </item>
  </channel>
</rss>

