<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic makemv: Reducing a multivalued field down to a single value based on a lookup. in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/makemv-Reducing-a-multivalued-field-down-to-a-single-value-based/m-p/82412#M20932</link>
    <description>&lt;P&gt;Hi Splunkers/Splunkettes,&lt;/P&gt;

&lt;P&gt;To begin, I'm sorry about the length of the question.&lt;/P&gt;

&lt;H2&gt;&lt;STRONG&gt;Scenario&lt;/STRONG&gt;&lt;/H2&gt;

&lt;P&gt;I have a &lt;EM&gt;large&lt;/EM&gt; amount of BlueCoat proxy logs that need to be reported on by the category that has been assigned to them by the Bluecoat. Example log from the Bluecoat app datagen:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;2011-06-15 10:59:31.252088 13 10.0.0.1 sneezy FTW - OBSERVED "News/Media;Reference" - - 200 TCP_HIT GET text/html - "www.associatedbank.com" - "/N/K9USERE07E/CIPCWM03" - aspx Firefox/3.6.3 125.17.14.100 12960 1071 -
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In this instance, the &lt;CODE&gt;category&lt;/CODE&gt; value is &lt;STRONG&gt;News/Media;Reference&lt;/STRONG&gt;. So there are two categories: &lt;STRONG&gt;News/Media&lt;/STRONG&gt; and &lt;STRONG&gt;Reference&lt;/STRONG&gt;.&lt;/P&gt;

&lt;P&gt;The Bluecoat app handles this by applying a makemv command to the &lt;CODE&gt;category&lt;/CODE&gt; value, which effectively counts the usage for this record (1071 bytes) twice for reporting purposes... once in the News/Media category, and once in the Reference category.&lt;/P&gt;

&lt;H2&gt;&lt;STRONG&gt;End Goal&lt;/STRONG&gt;&lt;/H2&gt;

&lt;P&gt;What I would like to do is redefine the category according to a priority lookup table, where the usage is only counted once in the category with the highest priority.&lt;/P&gt;

&lt;P&gt;Given the below lookup table (&lt;CODE&gt;category_priority.csv&lt;/CODE&gt;):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;category,   priority
---------------------
News/Media,        1
Reference,         2
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Running the search over the above event should give me a single event in the 'News/Media' category with 1071 bytes against it.&lt;/P&gt;

&lt;P&gt;The problem is, I have got this working... kinda...&lt;/P&gt;

&lt;H2&gt;&lt;STRONG&gt;What I've Tried&lt;/STRONG&gt;&lt;/H2&gt;

&lt;P&gt;This search splits the category field into its component categories and applies a priority.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;eventtype=bcoat_request | makemv delim=";" allowempty=t category | lookup category_priority.csv category | table dest_host, category, priority, sc_bytes
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;&lt;IMG src="http://splunk-base.splunk.com//storage/Screen_Shot_2012-10-07_at_1.03.38_PM.png" alt="alt text" /&gt;&lt;/P&gt;

&lt;P&gt;Including a &lt;CODE&gt;mvexpand&lt;/CODE&gt; command will break out the event into two identical events (with the exception of the &lt;CODE&gt;category&lt;/CODE&gt; field), so a &lt;CODE&gt;sort&lt;/CODE&gt; &amp;amp; a &lt;CODE&gt;dedup&lt;/CODE&gt; here will give me what I'm after.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;eventtype=bcoat_request | makemv delim=";" allowempty=t category | mvexpand category | lookup cs_category_summary.csv category | sort priority | dedup dest_host | table dest_host, category, priority
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;&lt;IMG src="http://splunk-base.splunk.com//storage/Screen_Shot_2012-10-07_at_1.06.01_PM.png" alt="alt text" /&gt;&lt;/P&gt;

&lt;P&gt;But...&lt;/P&gt;

&lt;H2&gt;&lt;STRONG&gt;Issues with this approach&lt;/STRONG&gt;&lt;/H2&gt;

&lt;OL&gt;
&lt;LI&gt;I have a &lt;STRONG&gt;&lt;EM&gt;LOT&lt;/EM&gt;&lt;/STRONG&gt; of data (~2.4 billion records a month), so &lt;CODE&gt;dedup&lt;/CODE&gt; isn't really an option or best practice, and;&lt;/LI&gt;
&lt;LI&gt;Even with timestamps in the microseconds, I have identical (not duplicate) events that would be filtered out with a &lt;CODE&gt;dedup&lt;/CODE&gt; if I used one. Adding more fields as &lt;CODE&gt;dedup&lt;/CODE&gt; parameters is only going to make the search more expensive in terms of compute, and there is still no guarantee that I won't be filtering out valid use.&lt;/LI&gt;
&lt;/OL&gt;

&lt;H2&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;&lt;/H2&gt;

&lt;P&gt;Is there a way to do this purely on a per-event basis using &lt;CODE&gt;eval&lt;/CODE&gt; statements? I tried applying a &lt;CODE&gt;sort&lt;/CODE&gt; to the &lt;CODE&gt;category&lt;/CODE&gt; field after applying the &lt;CODE&gt;makemv&lt;/CODE&gt; command but before the &lt;CODE&gt;mvexpand&lt;/CODE&gt; command, but that didn't take.&lt;/P&gt;

&lt;P&gt;Sorry for the length of the question &lt;span class="lia-unicode-emoji" title=":face_with_tongue:"&gt;😛&lt;/span&gt; Hoping someone can help!&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;TL;DR:&lt;/STRONG&gt; Need to reduce a multivalued field down to a single value based on a lookup.&lt;/P&gt;</description>
    <pubDate>Sun, 07 Oct 2012 02:17:58 GMT</pubDate>
    <dc:creator>rturk</dc:creator>
    <dc:date>2012-10-07T02:17:58Z</dc:date>
    <item>
      <title>makemv: Reducing a multivalued field down to a single value based on a lookup.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/makemv-Reducing-a-multivalued-field-down-to-a-single-value-based/m-p/82412#M20932</link>
      <description>&lt;P&gt;Hi Splunkers/Splunkettes,&lt;/P&gt;

&lt;P&gt;To begin, I'm sorry about the length of the question.&lt;/P&gt;

&lt;H2&gt;&lt;STRONG&gt;Scenario&lt;/STRONG&gt;&lt;/H2&gt;

&lt;P&gt;I have a &lt;EM&gt;large&lt;/EM&gt; amount of BlueCoat proxy logs that need to be reported on by the category that has been assigned to them by the Bluecoat. Example log from the Bluecoat app datagen:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;2011-06-15 10:59:31.252088 13 10.0.0.1 sneezy FTW - OBSERVED "News/Media;Reference" - - 200 TCP_HIT GET text/html - "www.associatedbank.com" - "/N/K9USERE07E/CIPCWM03" - aspx Firefox/3.6.3 125.17.14.100 12960 1071 -
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In this instance, the &lt;CODE&gt;category&lt;/CODE&gt; value is &lt;STRONG&gt;News/Media;Reference&lt;/STRONG&gt;. So there are two categories: &lt;STRONG&gt;News/Media&lt;/STRONG&gt; and &lt;STRONG&gt;Reference&lt;/STRONG&gt;.&lt;/P&gt;

&lt;P&gt;The Bluecoat app handles this by applying a makemv command to the &lt;CODE&gt;category&lt;/CODE&gt; value, which effectively counts the usage for this record (1071 bytes) twice for reporting purposes... once in the News/Media category, and once in the Reference category.&lt;/P&gt;

&lt;H2&gt;&lt;STRONG&gt;End Goal&lt;/STRONG&gt;&lt;/H2&gt;

&lt;P&gt;What I would like to do is redefine the category according to a priority lookup table, where the usage is only counted once in the category with the highest priority.&lt;/P&gt;

&lt;P&gt;Given the below lookup table (&lt;CODE&gt;category_priority.csv&lt;/CODE&gt;):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;category,   priority
---------------------
News/Media,        1
Reference,         2
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Running the search over the above event should give me a single event in the 'News/Media' category with 1071 bytes against it.&lt;/P&gt;
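
&lt;P&gt;To make the goal concrete, here is a minimal Python sketch of the per-event reduction (not SPL, and not something the Bluecoat app provides). The &lt;CODE&gt;CATEGORY_PRIORITY&lt;/CODE&gt; dict and the &lt;CODE&gt;pick_category&lt;/CODE&gt; function are hypothetical names standing in for the lookup table. The sketch assumes a lower number means a higher priority, drops empty values, and ranks categories missing from the table last.&lt;/P&gt;

```python
# Hypothetical stand-in for category_priority.csv (lower number = higher priority).
CATEGORY_PRIORITY = {"News/Media": 1, "Reference": 2}


def pick_category(raw, priorities=CATEGORY_PRIORITY, delim=";"):
    """Return the single highest-priority category from a delimited field."""
    # Split the raw field on the delimiter and drop empty values (an assumption).
    values = [v for v in raw.split(delim) if v]
    if not values:
        return None
    # Categories missing from the table rank last; ties keep the original order.
    return min(values, key=lambda v: priorities.get(v, float("inf")))


# The example event from above, reduced to the two fields that matter here.
event = {"category": "News/Media;Reference", "sc_bytes": 1071}
event["category"] = pick_category(event["category"])
print(event)
```

&lt;P&gt;Each event is reduced on its own, so identical events are all kept and the 1071 bytes are counted once.&lt;/P&gt;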

&lt;P&gt;The problem is, I have got this working... kinda...&lt;/P&gt;

&lt;H2&gt;&lt;STRONG&gt;What I've Tried&lt;/STRONG&gt;&lt;/H2&gt;

&lt;P&gt;This search splits the category field into its component categories and applies a priority.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;eventtype=bcoat_request | makemv delim=";" allowempty=t category | lookup category_priority.csv category | table dest_host, category, priority, sc_bytes
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;&lt;IMG src="http://splunk-base.splunk.com//storage/Screen_Shot_2012-10-07_at_1.03.38_PM.png" alt="alt text" /&gt;&lt;/P&gt;

&lt;P&gt;Including a &lt;CODE&gt;mvexpand&lt;/CODE&gt; command will break out the event into two identical events (with the exception of the &lt;CODE&gt;category&lt;/CODE&gt; field), so a &lt;CODE&gt;sort&lt;/CODE&gt; &amp;amp; a &lt;CODE&gt;dedup&lt;/CODE&gt; here will give me what I'm after.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;eventtype=bcoat_request | makemv delim=";" allowempty=t category | mvexpand category | lookup cs_category_summary.csv category | sort priority | dedup dest_host | table dest_host, category, priority
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;&lt;IMG src="http://splunk-base.splunk.com//storage/Screen_Shot_2012-10-07_at_1.06.01_PM.png" alt="alt text" /&gt;&lt;/P&gt;

&lt;P&gt;But...&lt;/P&gt;

&lt;H2&gt;&lt;STRONG&gt;Issues with this approach&lt;/STRONG&gt;&lt;/H2&gt;

&lt;OL&gt;
&lt;LI&gt;I have a &lt;STRONG&gt;&lt;EM&gt;LOT&lt;/EM&gt;&lt;/STRONG&gt; of data (~2.4 billion records a month), so &lt;CODE&gt;dedup&lt;/CODE&gt; isn't really an option or best practice, and;&lt;/LI&gt;
&lt;LI&gt;Even with timestamps in the microseconds, I have identical (not duplicate) events that would be filtered out with a &lt;CODE&gt;dedup&lt;/CODE&gt; if I used one. Adding more fields as &lt;CODE&gt;dedup&lt;/CODE&gt; parameters is only going to make the search more expensive in terms of compute, and there is still no guarantee that I won't be filtering out valid use.&lt;/LI&gt;
&lt;/OL&gt;

&lt;H2&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;&lt;/H2&gt;

&lt;P&gt;Is there a way to do this purely on a per-event basis using &lt;CODE&gt;eval&lt;/CODE&gt; statements? I tried applying a &lt;CODE&gt;sort&lt;/CODE&gt; to the &lt;CODE&gt;category&lt;/CODE&gt; field after applying the &lt;CODE&gt;makemv&lt;/CODE&gt; command but before the &lt;CODE&gt;mvexpand&lt;/CODE&gt; command, but that didn't take.&lt;/P&gt;

&lt;P&gt;Sorry for the length of the question &lt;span class="lia-unicode-emoji" title=":face_with_tongue:"&gt;😛&lt;/span&gt; Hoping someone can help!&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;TL;DR:&lt;/STRONG&gt; Need to reduce a multivalued field down to a single value based on a lookup.&lt;/P&gt;</description>
      <pubDate>Sun, 07 Oct 2012 02:17:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/makemv-Reducing-a-multivalued-field-down-to-a-single-value-based/m-p/82412#M20932</guid>
      <dc:creator>rturk</dc:creator>
      <dc:date>2012-10-07T02:17:58Z</dc:date>
    </item>
    <item>
      <title>Re: makemv: Reducing a multivalued field down to a single value based on a lookup.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/makemv-Reducing-a-multivalued-field-down-to-a-single-value-based/m-p/82413#M20933</link>
      <description>&lt;P&gt;How about something like this?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;eventtype=bcoat_request | makemv delim=";" allowempty=t category | lookup category_priority.csv category | sort priority | eval new_category=mvindex(category,0) | table dest_host, new_category, priority, sc_bytes
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 28 Sep 2020 12:34:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/makemv-Reducing-a-multivalued-field-down-to-a-single-value-based/m-p/82413#M20933</guid>
      <dc:creator>Lucas_K</dc:creator>
      <dc:date>2020-09-28T12:34:59Z</dc:date>
    </item>
  </channel>
</rss>

