<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Bucket/Timechart and Dedup in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Bucket-Timechart-and-Dedup/m-p/103071#M26634</link>
    <description>&lt;P&gt;I'm trying to plot total load-avg vs number of processors in a cluster (i.e. how loaded is the system). The following basically works:&lt;/P&gt;

&lt;P&gt;&lt;TT&gt;numproc OR loadshort | transaction xid | bucket span=10m _time | timechart span=10m sum(numproc) sum(loadshort)&lt;/TT&gt;&lt;/P&gt;

&lt;P&gt;Except we occasionally see multiple data items posted in a given 10min window -- e.g. {host,load1,numproc,time1} and {host,load2,numproc,time2} both land in the same time-bucket. The above command then adds in the numproc value twice, which obscures the real load on the system.&lt;/P&gt;

&lt;P&gt;Is there a way to dedup the data after it's been bucketed? Or, said another way, to dedup the data within a single bucket?&lt;/P&gt;</description>
    <pubDate>Tue, 17 May 2011 00:33:49 GMT</pubDate>
    <dc:creator>jbp4444</dc:creator>
    <dc:date>2011-05-17T00:33:49Z</dc:date>
    <item>
      <title>Bucket/Timechart and Dedup</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Bucket-Timechart-and-Dedup/m-p/103071#M26634</link>
      <description>&lt;P&gt;I'm trying to plot total load-avg vs number of processors in a cluster (i.e. how loaded is the system). The following basically works:&lt;/P&gt;

&lt;P&gt;&lt;TT&gt;numproc OR loadshort | transaction xid | bucket span=10m _time | timechart span=10m sum(numproc) sum(loadshort)&lt;/TT&gt;&lt;/P&gt;

&lt;P&gt;Except we occasionally see multiple data items posted in a given 10min window -- e.g. {host,load1,numproc,time1} and {host,load2,numproc,time2} both land in the same time-bucket. The above command then adds in the numproc value twice, which obscures the real load on the system.&lt;/P&gt;

&lt;P&gt;Is there a way to dedup the data after it's been bucketed? Or, said another way, to dedup the data within a single bucket?&lt;/P&gt;</description>
      <pubDate>Tue, 17 May 2011 00:33:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Bucket-Timechart-and-Dedup/m-p/103071#M26634</guid>
      <dc:creator>jbp4444</dc:creator>
      <dc:date>2011-05-17T00:33:49Z</dc:date>
    </item>
    <item>
      <title>Re: Bucket/Timechart and Dedup</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Bucket-Timechart-and-Dedup/m-p/103072#M26635</link>
      <description>&lt;P&gt;Might be easiest to just use &lt;CODE&gt;first()&lt;/CODE&gt; instead, which will give you the most recent &lt;CODE&gt;numproc&lt;/CODE&gt; in each bucket. I'll assume &lt;CODE&gt;numproc&lt;/CODE&gt; doesn't change, though if it did, you might just use &lt;CODE&gt;avg()&lt;/CODE&gt;:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;numproc OR loadshort | transaction xid | bucket span=10m _time | timechart span=10m first(numproc) sum(loadshort)
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 17 May 2011 02:15:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Bucket-Timechart-and-Dedup/m-p/103072#M26635</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2011-05-17T02:15:24Z</dc:date>
    </item>
    <item>
      <title>Re: Bucket/Timechart and Dedup</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Bucket-Timechart-and-Dedup/m-p/103073#M26636</link>
      <description>&lt;P&gt;Thanks for the quick reply gkanapathy -- that's definitely in the right direction. But there are multiple hosts producing the {numproc,loadshort} data, and I want to sum each item across all those machines.  My understanding is that 'first' would give only one value from one host.&lt;/P&gt;

&lt;P&gt;Maybe some combination of 'first .. by host' then a separate summation command?&lt;/P&gt;</description>
      <pubDate>Tue, 17 May 2011 13:04:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Bucket-Timechart-and-Dedup/m-p/103073#M26636</guid>
      <dc:creator>jbp4444</dc:creator>
      <dc:date>2011-05-17T13:04:33Z</dc:date>
    </item>
    <item>
      <title>Re: Bucket/Timechart and Dedup</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Bucket-Timechart-and-Dedup/m-p/103074#M26637</link>
      <description>&lt;P&gt;Looks like dedup might work after all -- I didn't realize you could dedup based on more than one field:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;numproc OR loadshort | transaction xid | bucket span=10m _time | dedup host _time | timechart span=10m sum(loadshort) sum(numproc)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Since bucket discretizes the timestamps, events from the same host in the same 10-minute window share an identical {host,_time} pair, so dedup treats them as duplicates and keeps only one of them.&lt;/P&gt;</description>
      <pubDate>Tue, 17 May 2011 13:28:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Bucket-Timechart-and-Dedup/m-p/103074#M26637</guid>
      <dc:creator>jbp4444</dc:creator>
      <dc:date>2011-05-17T13:28:18Z</dc:date>
    </item>
  </channel>
</rss>

