<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to obtain valid value of a field among known bad ones in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-to-obtain-valid-value-of-a-field-among-known-bad-ones/m-p/555224#M157598</link>
    <description>&lt;P&gt;The reason why that will not work is because we don't know which events contain valid values for which field. (I probably oversimplified in sample simulation. &amp;nbsp;There are quite a number of bad fields kind of flip-flop in unpredictable ways.)&lt;/P&gt;</description>
    <pubDate>Thu, 10 Jun 2021 05:50:18 GMT</pubDate>
    <dc:creator>yuanliu</dc:creator>
    <dc:date>2021-06-10T05:50:18Z</dc:date>
    <item>
      <title>How to obtain valid value of a field among known bad ones</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-obtain-valid-value-of-a-field-among-known-bad-ones/m-p/555171#M157580</link>
      <description>&lt;P&gt;I have some data with flip-flop values akin to the following simulation&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| makeresults count=20
| eval id = "id" . (random() % 2 + 1), bad = case(random() % 3 == 0, "real " . id, random() % 5 == 0, "", true(), "fake")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What it does is to insert a known "fake" value, and zero-length string in some events. (Comically, the fake value is the string literal "null".) A sample data set could look like&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;_time&lt;/TD&gt;&lt;TD&gt;bad&lt;/TD&gt;&lt;TD&gt;id&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;fake&lt;/TD&gt;&lt;TD&gt;id2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;fake&lt;/TD&gt;&lt;TD&gt;id2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;fake&lt;/TD&gt;&lt;TD&gt;id1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;fake&lt;/TD&gt;&lt;TD&gt;id2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;fake&lt;/TD&gt;&lt;TD&gt;id1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;real id2&lt;/TD&gt;&lt;TD&gt;id2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;real id1&lt;/TD&gt;&lt;TD&gt;id1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;fake&lt;/TD&gt;&lt;TD&gt;id2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;real id1&lt;/TD&gt;&lt;TD&gt;id1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;fake&lt;/TD&gt;&lt;TD&gt;id2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;real id2&lt;/TD&gt;&lt;TD&gt;id2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;id2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;fake&lt;/TD&gt;&lt;TD&gt;id1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;fake&lt;/TD&gt;&lt;TD&gt;id1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;id2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;fake&lt;/TD&gt;&lt;TD&gt;id2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;fake&lt;/TD&gt;&lt;TD&gt;id1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;real id1&lt;/TD&gt;&lt;TD&gt;id1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;real id2&lt;/TD&gt;&lt;TD&gt;id2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-06-09 18:45:09&lt;/TD&gt;&lt;TD&gt;fake&lt;/TD&gt;&lt;TD&gt;id2&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;For my purposes, all I care is that any field whose value is not "fake" or "" be extracted. &amp;nbsp;The challenge is, of course, that some records may have bad value for "legitimate" reasons (e.g., data entry or machine errors) so all their values are "fake". &amp;nbsp;I also cannot predict whether all fields are having "fake" in the same record, i.e., "real" value for different fields may be contained in different events.&lt;/P&gt;&lt;P&gt;One method I come up with is to mvfilter() from bad, like so&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| dedup id bad
| eventstats values(bad) as bad by id
| eval realbad = mvfilter(NOT match(bad, "fake"))&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;(dedup is added just to reduce load for eventstats. &amp;nbsp;Not sure if that makes a difference.) Lucky for me, values() seems to have dropped those zero-length values. (For my purposes, it is OK if this field becomes null, in case all values are zero-length or fake.) Otherwise, I'll just add another test in mvfilter().&lt;/P&gt;&lt;P&gt;Whereas this produces desired results, I feel that eventstats+mvfilter() could still be expensive. &amp;nbsp;Any standard/recommended methods?&lt;/P&gt;&lt;P&gt;(Clarification: Whereas sample simulation illustrates one 'bad' field, there are multiple bad fields that flip-flop seemingly independent of each other, so globally filtering out all events containing 'invalid' values is not an option because the use case requires valid value of as many fields as possible.)&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jun 2021 17:50:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-obtain-valid-value-of-a-field-among-known-bad-ones/m-p/555171#M157580</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2021-06-10T17:50:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to obtain valid value of a field among known bad ones</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-obtain-valid-value-of-a-field-among-known-bad-ones/m-p/555192#M157585</link>
      <description>&lt;P&gt;I am not sure I understand what you are trying to do - can you not just&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| where bad!="fake" AND bad!=""&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 09 Jun 2021 22:25:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-obtain-valid-value-of-a-field-among-known-bad-ones/m-p/555192#M157585</guid>
      <dc:creator>ITWhisperer</dc:creator>
      <dc:date>2021-06-09T22:25:21Z</dc:date>
    </item>
    <item>
      <title>Re: How to obtain valid value of a field among known bad ones</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-obtain-valid-value-of-a-field-among-known-bad-ones/m-p/555224#M157598</link>
      <description>&lt;P&gt;The reason why that will not work is because we don't know which events contain valid values for which field. (I probably oversimplified in sample simulation. &amp;nbsp;There are quite a number of bad fields kind of flip-flop in unpredictable ways.)&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jun 2021 05:50:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-obtain-valid-value-of-a-field-among-known-bad-ones/m-p/555224#M157598</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2021-06-10T05:50:18Z</dc:date>
    </item>
    <item>
      <title>Re: How to obtain valid value of a field among known bad ones</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-obtain-valid-value-of-a-field-among-known-bad-ones/m-p/555231#M157601</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;Lucky for me, values() seems to have dropped those zero-length values. (For my purposes, it is OK if this field becomes null, in case all values are zero-length or fake.)&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Two problems:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;values() drops empty string values only when non-empty values exist. &amp;nbsp;When all values are empty, the result is still an empty string.&amp;nbsp;&lt;/LI&gt;&lt;LI&gt;If all values are "fake", the result is still "fake".&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Both empty string and "fake" are more difficult for downstream processing; a mixture of two values to represent bad states is worse for my purposes.&lt;/P&gt;&lt;P&gt;Additionally, eventstats is useless for this illustration and not used in my use case. &amp;nbsp;I also realize that match() is totally unnecessary when string identity will do better. &amp;nbsp;It is almost as if brute force is better.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| dedup bad id
| eval bad = if(bad IN ("", "fake"), null(), bad)
| stats values(*) as * by id&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As mentioned in another comment, I have multiple "bad" fields.&lt;/P&gt;&lt;P&gt;With this workaround, each of them still needs to eval separately, but at least values() can use wildcard.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jun 2021 02:07:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-obtain-valid-value-of-a-field-among-known-bad-ones/m-p/555231#M157601</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2021-06-16T02:07:30Z</dc:date>
    </item>
  </channel>
</rss>

