<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Exactly which bytes count as license usage? in Monitoring Splunk</title>
    <link>https://community.splunk.com/t5/Monitoring-Splunk/Exactly-which-bytes-count-as-license-usage/m-p/265130#M2502</link>
    <description>&lt;P&gt;OK, len(_raw) works because each ASCII character is stored as 8 bits = 1 byte on disk, so the word "four" is 4 bytes, the word "OMG" is 3 bytes, the number 456 in string format is 3 bytes, the string "hey 1234" is 8 bytes, and so on.&lt;/P&gt;

&lt;P&gt;That's why the length of the raw field equates to bytes. But that only holds if you're working with ASCII encoding; in a multibyte encoding such as UTF-8, a single character can take more than one byte:&lt;BR /&gt;
&lt;A href="http://stackoverflow.com/questions/1049139/do-certain-characters-take-more-bytes-than-others"&gt;http://stackoverflow.com/questions/1049139/do-certain-characters-take-more-bytes-than-others&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;{"time":"2016-05-20 09:00:00.000","myfield":"myvalue"}\r\n &amp;lt;- Splunk would drop the \r\n. The time is stored in epoch in the index (IF EXTRAPOLATED correctly), but your event would still count at this total length for license usage.&lt;/P&gt;

&lt;P&gt;{"time":1463734800,"event":{"myfield":"myvalue"}} &amp;lt;- time is stored in epoch in the index (IF EXTRAPOLATED correctly), and you would save on your license because the timestamp is smaller&lt;/P&gt;

&lt;P&gt;Further savings would come from this:&lt;BR /&gt;
{"time":1463734800,"event1":{"myfield":"myvalue"},"event2":{"myfield":"myvalue"},"event3":{"myfield":"myvalue"}}  &lt;/P&gt;

&lt;P&gt;That way there would be only one time stamp for 3 events. JSON is not much fun to play with though... see this post &lt;A href="https://answers.splunk.com/answering/401972/view.html"&gt;https://answers.splunk.com/answering/401972/view.html&lt;/A&gt; where I recently learned the horrors of "nested json".&lt;/P&gt;</description>
    <pubDate>Fri, 20 May 2016 07:18:50 GMT</pubDate>
    <dc:creator>jkat54</dc:creator>
    <dc:date>2016-05-20T07:18:50Z</dc:date>
    <item>
      <title>Exactly which bytes count as license usage?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Exactly-which-bytes-count-as-license-usage/m-p/265129#M2501</link>
      <description>&lt;P&gt;I've read various topics on license usage, but I'm still confused about the basic calculation: exactly which bytes count as license usage?&lt;/P&gt;

&lt;P&gt;A possible answer might be: the number of bytes in the &lt;CODE&gt;_raw&lt;/CODE&gt; field. But I recognize that might be simplistic, or at least incomplete.&lt;/P&gt;

&lt;P&gt;My own - possibly faulty - experiments indicate that "number of &lt;EM&gt;bytes&lt;/EM&gt;" &lt;EM&gt;is&lt;/EM&gt; simplistic, at least in the following regard: &lt;CODE&gt;len()&lt;/CODE&gt; appears to count each multibyte UTF-8 character as 1, as I'd hope. So the answer might instead be "number of &lt;EM&gt;characters&lt;/EM&gt;", which depends on the character set encoding that Splunk uses to interpret the string.&lt;/P&gt;

&lt;P&gt;The recent Splunk blog post "&lt;A href="http://blogs.splunk.com/2016/05/06/what-size-should-my-splunk-license-be/"&gt;What size should my Splunk license be?&lt;/A&gt;" includes the following command in a search:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;eval evt_bytes = len(_raw)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The naming of that field - specifically, the trailing term &lt;CODE&gt;_bytes&lt;/CODE&gt; - makes me think that I might be wrong about how &lt;CODE&gt;len()&lt;/CODE&gt; treats multibyte characters.&lt;/P&gt;

&lt;P&gt;However, I'm unsure whether the &lt;CODE&gt;b&lt;/CODE&gt; field from &lt;CODE&gt;index=_internal source=*license_usage.log type=Usage&lt;/CODE&gt; is simply a total of &lt;CODE&gt;evt_bytes&lt;/CODE&gt;, or includes other bytes, or is not based on &lt;CODE&gt;len(_raw)&lt;/CODE&gt; at all. With apologies to the blog post author if I've missed it, the blog post doesn't say.&lt;/P&gt;

&lt;P&gt;For example, if I send Splunk the following JSON-formatted event via TCP:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{"time":"2016-05-20 09:00:00.000","myfield":"myvalue"}\r\n
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;(where &lt;CODE&gt;\r\n&lt;/CODE&gt; represents two bytes: a "carriage return/linefeed pair")&lt;/P&gt;

&lt;P&gt;consisting of 56 bytes (if you include the trailing &lt;CODE&gt;\r\n&lt;/CODE&gt;)&lt;/P&gt;

&lt;P&gt;then what exactly is this event's contribution to license usage? 56 bytes? Or 54 bytes (if the &lt;CODE&gt;\r\n&lt;/CODE&gt; is not included)? Or a higher number, to account for Splunk internal field values associated with this event?&lt;/P&gt;
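
&lt;P&gt;As a sanity check on those two figures, here is a minimal Python sketch that counts the UTF-8 bytes of the example event with and without the trailing &lt;CODE&gt;\r\n&lt;/CODE&gt;. Python and the helper name &lt;CODE&gt;utf8_size&lt;/CODE&gt; are my own choices for illustration; this only measures the string itself and says nothing about how Splunk actually meters license usage.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# The example event from this question, without the line terminator.
event = '{"time":"2016-05-20 09:00:00.000","myfield":"myvalue"}'


def utf8_size(text):
    """Return the number of bytes in text when encoded as UTF-8."""
    return len(text.encode("utf-8"))


size_without_crlf = utf8_size(event)         # 54
size_with_crlf = utf8_size(event + "\r\n")   # 56
print(size_without_crlf, size_with_crlf)
&lt;/CODE&gt;&lt;/PRE&gt;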

&lt;P&gt;While I'm asking (with apologies if readers think this should be a separate question)... if I send the same event via the HTTP Event Collector:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{"time":1463734800,"event":{"myfield":"myvalue"}}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;then do I save on license usage by having the time stamp as metadata, rather than in the event data (that becomes the &lt;CODE&gt;_raw&lt;/CODE&gt; field)?&lt;/P&gt;

&lt;P&gt;Before asking this question, I considered performing my own tests, indexing single events (via TCP and HEC) into brand new indexes, and then looking at the corresponding &lt;CODE&gt;b&lt;/CODE&gt; field values in the log file. I might still do that, but I have limited time, and anyway, I'd like to know what the figures &lt;EM&gt;should&lt;/EM&gt; show, so that, if I do these tests, I can confirm or deny that (or, more likely, figure out where I've gone wrong in my testing &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; ).&lt;/P&gt;</description>
      <pubDate>Fri, 20 May 2016 05:14:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Exactly-which-bytes-count-as-license-usage/m-p/265129#M2501</guid>
      <dc:creator>Graham_Hanningt</dc:creator>
      <dc:date>2016-05-20T05:14:38Z</dc:date>
    </item>
    <item>
      <title>Re: Exactly which bytes count as license usage?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Exactly-which-bytes-count-as-license-usage/m-p/265130#M2502</link>
      <description>&lt;P&gt;OK, len(_raw) works because each ASCII character is stored as 8 bits = 1 byte on disk, so the word "four" is 4 bytes, the word "OMG" is 3 bytes, the number 456 in string format is 3 bytes, the string "hey 1234" is 8 bytes, and so on.&lt;/P&gt;

&lt;P&gt;That's why the length of the raw field equates to bytes. But that only holds if you're working with ASCII encoding; in a multibyte encoding such as UTF-8, a single character can take more than one byte:&lt;BR /&gt;
&lt;A href="http://stackoverflow.com/questions/1049139/do-certain-characters-take-more-bytes-than-others"&gt;http://stackoverflow.com/questions/1049139/do-certain-characters-take-more-bytes-than-others&lt;/A&gt;&lt;/P&gt;
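
&lt;P&gt;A quick Python sketch of characters versus bytes, in case it helps. Python is only used here for illustration, and the helper name &lt;CODE&gt;char_and_byte_counts&lt;/CODE&gt; is made up; Python's &lt;CODE&gt;len&lt;/CODE&gt; counts characters, and this does not show how Splunk's own &lt;CODE&gt;len()&lt;/CODE&gt; behaves.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;def char_and_byte_counts(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# Pure ASCII: one byte per character, so both counts agree.
ascii_counts = char_and_byte_counts("hey 1234")       # (8, 8)

# "caf" plus U+00E9 (e with acute accent), which takes 2 bytes in UTF-8.
accented_counts = char_and_byte_counts("caf\u00e9")   # (4, 5)

print(ascii_counts, accented_counts)
&lt;/CODE&gt;&lt;/PRE&gt;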

&lt;P&gt;{"time":"2016-05-20 09:00:00.000","myfield":"myvalue"}\r\n &amp;lt;- Splunk would drop the \r\n. The time is stored in epoch in the index (IF EXTRAPOLATED correctly), but your event would still count at this total length for license usage.&lt;/P&gt;

&lt;P&gt;{"time":1463734800,"event":{"myfield":"myvalue"}} &amp;lt;- time is stored in epoch in the index (IF EXTRAPOLATED correctly), and you would save on your license because the timestamp is smaller&lt;/P&gt;

&lt;P&gt;Further savings would come from this:&lt;BR /&gt;
{"time":1463734800,"event1":{"myfield":"myvalue"},"event2":{"myfield":"myvalue"},"event3":{"myfield":"myvalue"}}  &lt;/P&gt;

&lt;P&gt;That way there would be only one time stamp for 3 events. JSON is not much fun to play with though... see this post &lt;A href="https://answers.splunk.com/answering/401972/view.html"&gt;https://answers.splunk.com/answering/401972/view.html&lt;/A&gt; where I recently learned the horrors of "nested json".&lt;/P&gt;</description>
      <pubDate>Fri, 20 May 2016 07:18:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Exactly-which-bytes-count-as-license-usage/m-p/265130#M2502</guid>
      <dc:creator>jkat54</dc:creator>
      <dc:date>2016-05-20T07:18:50Z</dc:date>
    </item>
    <item>
      <title>Re: Exactly which bytes count as license usage?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Exactly-which-bytes-count-as-license-usage/m-p/265131#M2503</link>
      <description>&lt;P&gt;@jkat54, thanks for your answer.&lt;/P&gt;

&lt;P&gt;Re:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;the timestamp is smaller&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;I don't understand. Smaller? How?&lt;/P&gt;

&lt;P&gt;In my TCP example, the event data contains a &lt;CODE&gt;time&lt;/CODE&gt; field that Splunk uses to set the internal &lt;CODE&gt;_time&lt;/CODE&gt; field according to &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; and &lt;CODE&gt;TIME_FORMAT&lt;/CODE&gt; settings in &lt;CODE&gt;props.conf&lt;/CODE&gt;. And that &lt;CODE&gt;time&lt;/CODE&gt; field value also appears with the rest of the event data in the &lt;CODE&gt;_raw&lt;/CODE&gt; field.&lt;/P&gt;

&lt;P&gt;In my HEC example, the &lt;CODE&gt;event&lt;/CODE&gt; key does not contain a &lt;CODE&gt;time&lt;/CODE&gt; field. Instead, the event time stamp is specified in the &lt;EM&gt;metadata&lt;/EM&gt; &lt;CODE&gt;time&lt;/CODE&gt; key that Splunk uses to set the &lt;CODE&gt;_time&lt;/CODE&gt; field.&lt;/P&gt;

&lt;P&gt;In both examples, the indexed event has an internal &lt;CODE&gt;_time&lt;/CODE&gt; field (Unix Epoch time value).&lt;/P&gt;

&lt;P&gt;However, only the &lt;CODE&gt;_raw&lt;/CODE&gt; field for the event received via TCP contains a &lt;CODE&gt;time&lt;/CODE&gt; field. The &lt;CODE&gt;_raw&lt;/CODE&gt; field for the event received via HEC does not contain a time stamp value. That is where I see the potential saving in license usage: the &lt;EM&gt;absence&lt;/EM&gt; of a time stamp value from the &lt;CODE&gt;_raw&lt;/CODE&gt; field. Is that what you meant by "smaller"?&lt;/P&gt;
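
&lt;P&gt;To put rough numbers on that potential saving, here is a small Python sketch. It assumes that the &lt;CODE&gt;_raw&lt;/CODE&gt; field for the HEC event holds the compact JSON serialization of the &lt;CODE&gt;event&lt;/CODE&gt; value, which I have not confirmed; the variable names are mine, and the byte counts are for the strings only, not measured license usage.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import json

# Event as sent over TCP: the time stamp is part of the event data.
tcp_raw = '{"time":"2016-05-20 09:00:00.000","myfield":"myvalue"}'

# HEC request body: the time stamp is metadata, outside the "event" key.
hec_body = {"time": 1463734800, "event": {"myfield": "myvalue"}}

# Assumption: _raw is the compact JSON serialization of the "event" value.
hec_raw = json.dumps(hec_body["event"], separators=(",", ":"))

tcp_raw_bytes = len(tcp_raw.encode("utf-8"))   # 54
hec_raw_bytes = len(hec_raw.encode("utf-8"))   # 21
print(tcp_raw_bytes, hec_raw_bytes, tcp_raw_bytes - hec_raw_bytes)
&lt;/CODE&gt;&lt;/PRE&gt;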

&lt;P&gt;(And when you wrote "extrapolated", did you mean "extracted"?)&lt;/P&gt;</description>
      <pubDate>Fri, 20 May 2016 09:16:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Exactly-which-bytes-count-as-license-usage/m-p/265131#M2503</guid>
      <dc:creator>Graham_Hanningt</dc:creator>
      <dc:date>2016-05-20T09:16:12Z</dc:date>
    </item>
  </channel>
</rss>

