<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Hashing instead of masking at index time in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130025#M26745</link>
    <description>&lt;P&gt;Obvious question is obvious: You did set the &lt;CODE&gt;invalid_cause&lt;/CODE&gt;, right?&lt;/P&gt;</description>
    <pubDate>Thu, 03 Jul 2014 21:19:28 GMT</pubDate>
    <dc:creator>martin_mueller</dc:creator>
    <dc:date>2014-07-03T21:19:28Z</dc:date>
    <item>
      <title>Hashing instead of masking at index time</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130022#M26742</link>
      <description>&lt;P&gt;During the Splunk parsing phase, is there any way to hash portions of the event?  I know it's possible to discard or mask (trim) portions of the event using SEDCMD or a transform, but I don't see any options for hashing.&lt;/P&gt;

&lt;P&gt;I'm looking for a pure Splunk solution that doesn't require scripted (or modular) inputs.  Calling out from Splunk would be acceptable, but I'm unaware of any custom "hooks" in the parsing phase (for performance and stability reasons, I assume).&lt;/P&gt;

&lt;P&gt;I'm pretty sure I know the answer to this, but figured I'd ask before sending in a feature request.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;For a bit of background:&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;Personally identifiable information (SSNs, credit card numbers, passwords, ...) ends up in log files in clear text.  The core issue is often a software development one, but the Splunk admins often have no way to control this.  The safest option is to remove the sensitive info, but then you lose visibility.  Sometimes keeping a few characters will provide enough detail to compare different events without giving away the entire secret, but of course the risks are: (1) some of the information is still available in clear text, potentially revealing too much, and (2) since the full value isn't known, it's not possible to accurately compare values.  Hashing the values instead (with a function like MD5 or SHA) would (1) fully protect the original value from being discovered, and (2) still allow for accurate grouping and/or transaction operations on the sensitive field.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Jul 2014 15:01:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130022#M26742</guid>
      <dc:creator>Lowell</dc:creator>
      <dc:date>2014-07-03T15:01:48Z</dc:date>
    </item>
    <item>
      <title>Re: Hashing instead of masking at index time</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130023#M26743</link>
      <description>&lt;P&gt;I've had this requirement before: &lt;A href="http://answers.splunk.com/answers/88926/modify-_raw-collect-into-second-index-how-to-best-retain-host-source-sourcetype"&gt;http://answers.splunk.com/answers/88926/modify-_raw-collect-into-second-index-how-to-best-retain-host-source-sourcetype&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Basically no pretty Splunk-only solution.&lt;/P&gt;

&lt;P&gt;I didn't toy around with props.conf's &lt;CODE&gt;unarchive_cmd&lt;/CODE&gt; to see if you could hook a custom script into the indexing process using that... If I had to guess I'd say that might break incremental indexing because that's not available for .gz files either, but it might be worth a shot.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Jul 2014 20:38:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130023#M26743</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2014-07-03T20:38:18Z</dc:date>
    </item>
    <item>
      <title>Re: Hashing instead of masking at index time</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130024#M26744</link>
      <description>&lt;P&gt;Wow, that's an interesting approach.  Sounds like something I would have dreamt up ;-).&lt;/P&gt;

&lt;P&gt;So I actually thought about the &lt;CODE&gt;unarchive_cmd&lt;/CODE&gt; option after posting the question and have played around a bit, but so far with no success.  After I cranked up the DEBUG logs I'm finally seeing &lt;CODE&gt;DEBUG ArchiveContext - /tmp/blah-debug.test.me is NOT an archive file.&lt;/CODE&gt;  I get the same error even if the file IS in gzip format, so I'm puzzled.&lt;/P&gt;

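&lt;P&gt;For reference, the kind of stdin-to-stdout filter that &lt;CODE&gt;unarchive_cmd&lt;/CODE&gt; would have to call can be sketched in Python roughly like this.  It is only a sketch under assumptions: the SSN-style regex, the salt value, the &lt;CODE&gt;sha256:&lt;/CODE&gt; prefix and the function names are all made up for illustration, and whether Splunk will actually invoke a script like this is exactly what I haven't gotten working yet.  The salt is there because a bare hash of a short value such as an SSN can be brute-forced.&lt;/P&gt;

```python
import hashlib
import re

# Assumption: the sensitive values look like US SSNs (123-45-6789).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Assumption: a site-specific secret; replace with your own value.
SALT = b"change-me"


def hash_match(match):
    """Replace one matched value with a truncated, salted SHA-256 digest."""
    digest = hashlib.sha256(SALT + match.group(0).encode("utf-8")).hexdigest()
    return "sha256:" + digest[:16]


def hash_sensitive(line):
    """Return the line with every SSN-like token replaced by its hash."""
    return SSN_PATTERN.sub(hash_match, line)


def filter_stream(src, dst):
    """Copy events from src to dst, hashing sensitive tokens on the way."""
    for raw_line in src:
        dst.write(hash_sensitive(raw_line))
```

&lt;P&gt;Run as a script, it would call &lt;CODE&gt;filter_stream(sys.stdin, sys.stdout)&lt;/CODE&gt;.  The same input value always produces the same token, so grouping and transaction searches on the hashed field still line up.&lt;/P&gt;
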
&lt;P&gt;Agreed that the incremental indexing thing could be a problem, but I may be able to work around that for the use case in front of me.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Jul 2014 20:51:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130024#M26744</guid>
      <dc:creator>Lowell</dc:creator>
      <dc:date>2014-07-03T20:51:02Z</dc:date>
    </item>
    <item>
      <title>Re: Hashing instead of masking at index time</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130025#M26745</link>
      <description>&lt;P&gt;Obvious question is obvious: You did set the &lt;CODE&gt;invalid_cause&lt;/CODE&gt;, right?&lt;/P&gt;</description>
      <pubDate>Thu, 03 Jul 2014 21:19:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130025#M26745</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2014-07-03T21:19:28Z</dc:date>
    </item>
    <item>
      <title>Re: Hashing instead of masking at index time</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130026#M26746</link>
      <description>&lt;P&gt;I think &lt;CODE&gt;invalid_cause&lt;/CODE&gt; is set right.  I even tried just reproducing the default &lt;CODE&gt;process-gzip&lt;/CODE&gt; stuff and can't make that work.  Posted as a separate question here:&lt;BR /&gt;
&lt;A href="http://answers.splunk.com/answers/143771/whats-the-trick-to-get-unarchive_cmd-to-work-for-a-custom-archive-format" target="_blank"&gt;http://answers.splunk.com/answers/143771/whats-the-trick-to-get-unarchive_cmd-to-work-for-a-custom-archive-format&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 17:00:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130026#M26746</guid>
      <dc:creator>Lowell</dc:creator>
      <dc:date>2020-09-28T17:00:01Z</dc:date>
    </item>
    <item>
      <title>Re: Hashing instead of masking at index time</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130027#M26747</link>
      <description>&lt;P&gt;I just want to point out that the ELK stack can do this!&lt;/P&gt;

&lt;P&gt;So my answer is:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Deploy Logstash&lt;/LI&gt;
&lt;LI&gt;Configure it to read in the log&lt;/LI&gt;
&lt;LI&gt;Configure the hashing transformation&lt;/LI&gt;
&lt;LI&gt;Dump the output to a new log file&lt;/LI&gt;
&lt;LI&gt;Ingest the new log file with Splunk (UF)&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Thu, 07 Jan 2016 23:14:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Hashing-instead-of-masking-at-index-time/m-p/130027#M26747</guid>
      <dc:creator>Lowell</dc:creator>
      <dc:date>2016-01-07T23:14:30Z</dc:date>
    </item>
  </channel>
</rss>

