<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Difference in Size Between Events in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184013#M36821</link>
    <description>&lt;P&gt;There are more field extractions occurring in the heavier events. So that could possibly be the case. &lt;/P&gt;</description>
    <pubDate>Fri, 08 May 2015 15:39:34 GMT</pubDate>
    <dc:creator>ConnorG</dc:creator>
    <dc:date>2015-05-08T15:39:34Z</dc:date>
    <item>
      <title>Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184009#M36817</link>
      <description>&lt;P&gt;I have two indexes that contain different sets of events. &lt;/P&gt;

&lt;P&gt;Index 1&lt;BR /&gt;
                Event Count – 23,952&lt;BR /&gt;
                Current Size – 19&lt;/P&gt;

&lt;P&gt;Index 2 &lt;BR /&gt;
                Event Count – 431,026&lt;BR /&gt;
                Current Size – 20&lt;/P&gt;

&lt;P&gt;The size is the same, but the number of events is drastically different.  This would make sense except that the events in both indexes are generally the same length. Any explanation for the difference in size here? &lt;/P&gt;

&lt;P&gt;Index 1 - Event Example&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;    {"time":"Fri Apr 03 17:57:08 CDT 2015","web_request_response_time":"0.45356011390686035","application":"node_count":"1","DataType":"PurepathData","state":"OK","cpu":"0.448837012052536","System Profile":"c_prodissue","breakdown":"CPU: 0.449 ms, Sync: -, Wait: -, Suspension: -","agent":"_JavaApp06_sin@sin:1547","root_path_thread_name":"http-apr-169.97.17.67-11000-exec-2","time":"Fri Apr 03 17:57:08 CDT 2015","response_time":"0.45356011390686035","execsum":"0.45356011390686035","name":"/SUI/monitoring","exec":"0.45361328125"}

     {"time":"Fri Apr 03 17:57:03 CDT 2015","web_request_response_time":"0.5128860473632812","application":"applic","node_count":"1","DataType":"PurepathData","state":"OK","cpu":"0.5083289742469788","System Profile":"_uat_prodissue","breakdown":"CPU: 0.508 ms, Sync: -, Wait: -, Suspension: -","agent":"UAT_JavaApp05_sin@sin:28893","root_path_thread_name":"http-apr-169.97.17.62-11000-exec-17","time":"Fri Apr 03 17:57:03 CDT 2015","response_time":"0.5128860473632812","execsum":"0.5128860473632812","name":"/UI/monitoring","exec":"0.512939453125"}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Index 2 - Event Example&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;            System_Profile=Monitoring #document dynatrace version=6.1.0.8054 systemprofile capture=true modifiedby=E745984 repositoryaccess=true incidentrules incidentrule flags=1 id=Host Disk Unhealthy incidentdashboardname=Incident Zero Conf Dashboard timeframe=10 actions actionref bundleversion=0.0.0 execution=begin key=com.dynatrace.diagnostics.plugins.EmailNotification refaction=com.dynatrace.diagnostics.plugins.EmailNotification rolekey=com.dynatrace.diagnostics.plugins.EmailNotificationAction roletype=1 severity=informational smartalert=false type=Email Notification property key=from typeid=string value= 

            System_Profile=Monitoring #document dynatrace version=6.1.0.8054 systemprofile capture=true modifiedby=E745984 repositoryaccess=true incidentrules incidentrule flags=1 id=Host Network Unhealthy incidentdashboardname=Incident Zero Conf Dashboard timeframe=10 actions actionref bundleversion=0.0.0 execution=begin key=com.dynatrace.diagnostics.plugins.EmailNotification refaction=com.dynatrace.diagnostics.plugins.EmailNotification rolekey=com.dynatrace.diagnostics.plugins.EmailNotificationAction roletype=1 severity=informational smartalert=false type=Email Notification property key=bcc typeid=string value= 
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 08 May 2015 15:05:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184009#M36817</guid>
      <dc:creator>ConnorG</dc:creator>
      <dc:date>2015-05-08T15:05:27Z</dc:date>
    </item>
    <item>
      <title>Re: Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184010#M36818</link>
      <description>&lt;P&gt;How are you calculating "size"?&lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2015 15:10:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184010#M36818</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2015-05-08T15:10:21Z</dc:date>
    </item>
    <item>
      <title>Re: Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184011#M36819</link>
      <description>&lt;P&gt;That is coming from the Indexes view in the Splunk Settings. "Current size in MB"&lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2015 15:11:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184011#M36819</guid>
      <dc:creator>ConnorG</dc:creator>
      <dc:date>2015-05-08T15:11:57Z</dc:date>
    </item>
    <item>
      <title>Re: Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184012#M36820</link>
      <description>&lt;P&gt;Is it possible there are one or two rogue &lt;EM&gt;gigantic&lt;/EM&gt; events in Index 1? I've never used it personally, but I've read of people using "eval esize" to check this kind of thing.&lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2015 15:21:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184012#M36820</guid>
      <dc:creator>j4adam</dc:creator>
      <dc:date>2015-05-08T15:21:29Z</dc:date>
    </item>
    <item>
      <title>Re: Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184013#M36821</link>
      <description>&lt;P&gt;There are more field extractions occurring in the heavier events. So that could possibly be the case. &lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2015 15:39:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184013#M36821</guid>
      <dc:creator>ConnorG</dc:creator>
      <dc:date>2015-05-08T15:39:34Z</dc:date>
    </item>
    <item>
      <title>Re: Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184014#M36822</link>
      <description>&lt;P&gt;I believe there is a character limit for events. So even if there were a handful of rogue events, that still couldn't account for the tenfold size increase.&lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2015 15:41:09 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184014#M36822</guid>
      <dc:creator>ConnorG</dc:creator>
      <dc:date>2015-05-08T15:41:09Z</dc:date>
    </item>
    <item>
      <title>Re: Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184015#M36823</link>
      <description>&lt;P&gt;I'm going to guess that your data in index 1 has &lt;CODE&gt;INDEXED_EXTRACTIONS=json&lt;/CODE&gt; activated in props.conf. More space used in that case is expected behaviour: that space is traded for speed when using those fields, especially in &lt;CODE&gt;tstats&lt;/CODE&gt; situations.&lt;/P&gt;
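
&lt;P&gt;For reference, a minimal sketch of what such a props.conf stanza might look like. The sourcetype name is a placeholder made up for illustration, not taken from your setup:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Hypothetical sourcetype name; replace with the one assigned to the script's output
[my_json_sourcetype]
# Extract the JSON fields at index time instead of at search time
INDEXED_EXTRACTIONS = json
&lt;/CODE&gt;&lt;/PRE&gt;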

&lt;P&gt;To further investigate, run these two searches:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| dbinspect index=index1 | eval rawSizeMB = rawSize / 1048576 | table id eventCount rawSizeMB sizeOnDiskMB

| dbinspect index=index2 | eval rawSizeMB = rawSize / 1048576 | table id eventCount rawSizeMB sizeOnDiskMB
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That'll give you the event count, the raw size ingested into each bucket for that index, and how much space each bucket occupies on disk. If you have single huge rogue events, you should see one bucket behaving differently from the others; if my JSON guess is correct, all buckets for an index should look fairly similar.&lt;/P&gt;
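
&lt;P&gt;If you'd rather check event lengths directly, something along these lines should work as a rough sketch. It assumes the indexes are literally named index1 and index2 as in the searches above; &lt;CODE&gt;esize&lt;/CODE&gt;, &lt;CODE&gt;avgChars&lt;/CODE&gt; and &lt;CODE&gt;maxChars&lt;/CODE&gt; are just field names picked for illustration:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=index1 OR index=index2 | eval esize = len(_raw) | stats count avg(esize) AS avgChars max(esize) AS maxChars by index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;A maximum far above the average would point at rogue events. Similar averages across both indexes would suggest the raw event length is not what drives the size difference.&lt;/P&gt;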

&lt;P&gt;As for your events themselves, it seems the data in index 1 has more unique tokens - for example, those high-precision numbers. Lots of unique tokens will increase the size of dictionaries, and hence Splunk's index structures. The index 2 sample events seem to have lots of repeating tokens in the field values, not a lot of unique ones.&lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2015 15:42:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184015#M36823</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2015-05-08T15:42:53Z</dc:date>
    </item>
    <item>
      <title>Re: Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184016#M36824</link>
      <description>&lt;P&gt;Your assumption is correct. So you're saying that the data in index1 can be searched faster? &lt;/P&gt;

&lt;P&gt;This data is coming from a custom-made script. If the trade-off for larger file size is quicker results, then I will leave the formatting as is. Otherwise, if there are no pros to having the events formatted this way, I would change it to be simpler.&lt;/P&gt;

&lt;P&gt;Thanks for the heads up. Are there any reference docs available related to this?&lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2015 15:48:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184016#M36824</guid>
      <dc:creator>ConnorG</dc:creator>
      <dc:date>2015-05-08T15:48:08Z</dc:date>
    </item>
    <item>
      <title>Re: Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184017#M36825</link>
      <description>&lt;P&gt;The configuration reference is here: &lt;A href="http://docs.splunk.com/Documentation/Splunk/6.2.3/Admin/Propsconf"&gt;http://docs.splunk.com/Documentation/Splunk/6.2.3/Admin/Propsconf&lt;/A&gt; (search for &lt;CODE&gt;INDEXED_EXTRACTIONS&lt;/CODE&gt;)&lt;BR /&gt;
There's a bit of human-readable documentation here: http://docs.splunk.com/Documentation/Splunk/6.2.3/Data/Extractfieldsfromfileheadersatindextime&lt;/P&gt;

&lt;P&gt;Regular searches should run at similar speeds. What benefits the most is stuff like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats avg(cpu) avg(web_request_response_time) where index=index1 by _time span=auto prestats=t | timechart avg(cpu) avg(web_request_response_time)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That should be &lt;EM&gt;massively&lt;/EM&gt; faster than trying to pry the &lt;CODE&gt;cpu&lt;/CODE&gt; and &lt;CODE&gt;web_request_response_time&lt;/CODE&gt; fields from the JSON at search time.&lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2015 16:06:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184017#M36825</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2015-05-08T16:06:53Z</dc:date>
    </item>
    <item>
      <title>Re: Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184018#M36826</link>
      <description>&lt;P&gt;Ah, I didn't know that actually. &lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2015 16:13:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184018#M36826</guid>
      <dc:creator>j4adam</dc:creator>
      <dc:date>2015-05-08T16:13:43Z</dc:date>
    </item>
    <item>
      <title>Re: Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184019#M36827</link>
      <description>&lt;P&gt;By default, Splunk truncates any line longer than 10000 bytes. You can modify that in props.conf using the &lt;CODE&gt;TRUNCATE&lt;/CODE&gt; setting. In the same spirit, the default will break an event after 256 lines; see &lt;CODE&gt;MAX_EVENTS&lt;/CODE&gt; in props.conf.&lt;/P&gt;
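
&lt;P&gt;As a sketch, setting both limits explicitly in props.conf might look like this. The stanza name is a placeholder, and the values shown are simply the defaults mentioned above:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Hypothetical sourcetype name; replace with your own
[my_sourcetype]
# Maximum line length in bytes
TRUNCATE = 10000
# Maximum number of input lines added to a single event
MAX_EVENTS = 256
&lt;/CODE&gt;&lt;/PRE&gt;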

&lt;P&gt;These default limits are there to mitigate either wrong configurations or systems throwing unexpected log data.&lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2015 16:28:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184019#M36827</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2015-05-08T16:28:12Z</dc:date>
    </item>
    <item>
      <title>Re: Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184020#M36828</link>
      <description>&lt;P&gt;There's some more info in this post here:&lt;/P&gt;

&lt;P&gt;&lt;A href="http://answers.splunk.com/answers/4162/size-limit-for-an-event.html"&gt;http://answers.splunk.com/answers/4162/size-limit-for-an-event.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2015 17:13:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184020#M36828</guid>
      <dc:creator>ConnorG</dc:creator>
      <dc:date>2015-05-08T17:13:44Z</dc:date>
    </item>
    <item>
      <title>Re: Difference in Size Between Events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184021#M36829</link>
      <description>&lt;P&gt;Yeah, I immediately looked into that as soon as you mentioned it. That post exactly, actually. Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2015 17:30:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Difference-in-Size-Between-Events/m-p/184021#M36829</guid>
      <dc:creator>j4adam</dc:creator>
      <dc:date>2015-05-08T17:30:36Z</dc:date>
    </item>
  </channel>
</rss>

