<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Can you help me using dedup and count? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425821#M122059</link>
    <description>&lt;P&gt;It's going to take a little massaging on your end. Remove the search commands one line at a time to identify what's breaking the search. I don't have your query in front of me, so I can't do it for you.&lt;/P&gt;</description>
    <pubDate>Wed, 23 Jan 2019 15:24:02 GMT</pubDate>
    <dc:creator>skoelpin</dc:creator>
    <dc:date>2019-01-23T15:24:02Z</dc:date>
    <item>
      <title>Can you help me using dedup and count?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425815#M122053</link>
      <description>&lt;P&gt;I have the following search based on F5 logs that count the HTTP POSTs by src in a five-minute bucket:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=f5 action!=blocked http_method=POST
| bucket _time span=5m
| stats count by _time, src, website
| sort -count
| stats dc(website) as distinct_website, list(website) as Website, list(count) as count, sum(count) as Total by src 
| where distinct_website &amp;gt;= 3
| sort -Total
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The problem is, I get an output that lists the same websites multiple times:&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/6440i59ECA7FDE54A392B/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;If I add '| dedup Website' before the 'where' command, I still see the duplicate websites.&lt;/P&gt;

&lt;P&gt;If I add '| dedup website' after the '| stats count by _time, src, website' command, the websites are deduped, but the output contains a different set of src values and websites altogether:&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/6441i5B2F7B431D31B37A/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Do I need the dedup command at all for this search?&lt;/P&gt;

&lt;P&gt;Thx&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jan 2019 14:33:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425815#M122053</guid>
      <dc:creator>jwalzerpitt</dc:creator>
      <dc:date>2019-01-23T14:33:22Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me using dedup and count?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425816#M122054</link>
      <description>&lt;P&gt;Try this &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=f5 action!=blocked http_method=POST
 | bucket _time span=5m
 | stats count by _time, src, website
 | sort -count
 | stats dc(website) as distinct_website, list(count) as count, sum(count) as Total by src, website
 | where distinct_website &amp;gt;= 3
 | sort -Total
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 23 Jan 2019 14:43:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425816#M122054</guid>
      <dc:creator>skoelpin</dc:creator>
      <dc:date>2019-01-23T14:43:26Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me using dedup and count?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425817#M122055</link>
      <description>&lt;P&gt;Hi @jwalzerpitt&lt;/P&gt;

&lt;P&gt;Try with  &lt;CODE&gt;values(website)&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jan 2019 14:51:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425817#M122055</guid>
      <dc:creator>vnravikumar</dc:creator>
      <dc:date>2019-01-23T14:51:49Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me using dedup and count?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425818#M122056</link>
      <description>&lt;P&gt;Thx for the reply and search. Using your suggested search returns the same output as my original search with dupes of websites listed.&lt;/P&gt;

&lt;P&gt;As I think this through, is adding the Total potentially causing an issue? Maybe I need to sum the count of individual POSTs against each website first to get rid of the dupes, so that &lt;A href="http://www.abc.com"&gt;www.abc.com&lt;/A&gt; has a total count of 45 POSTs, &lt;A href="http://www.def.com"&gt;www.def.com&lt;/A&gt; has a total count of 34 POSTs, and so on, with the Total being the aggregate number of POSTs across all websites?&lt;/P&gt;

&lt;P&gt;Or perhaps, because I'm running the search against a 24-hour time period with five-minute buckets, it's listing the same website multiple times because POSTs to that website are scattered among the five-minute buckets throughout the 24 hours?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jan 2019 14:55:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425818#M122056</guid>
      <dc:creator>jwalzerpitt</dc:creator>
      <dc:date>2019-01-23T14:55:16Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me using dedup and count?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425819#M122057</link>
      <description>&lt;P&gt;I've edited my original answer. Try that one.&lt;/P&gt;

&lt;P&gt;It's difficult without testing this on my end, but in a nutshell, we are counting by website. In your second &lt;CODE&gt;stats&lt;/CODE&gt;, you should have the &lt;CODE&gt;by website&lt;/CODE&gt; clause.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jan 2019 15:04:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425819#M122057</guid>
      <dc:creator>skoelpin</dc:creator>
      <dc:date>2019-01-23T15:04:37Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me using dedup and count?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425820#M122058</link>
      <description>&lt;P&gt;The edited search returns no results.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jan 2019 15:18:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425820#M122058</guid>
      <dc:creator>jwalzerpitt</dc:creator>
      <dc:date>2019-01-23T15:18:56Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me using dedup and count?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425821#M122059</link>
      <description>&lt;P&gt;It's going to take a little massaging on your end. Remove the search commands one line at a time to identify what's breaking the search. I don't have your query in front of me, so I can't do it for you.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jan 2019 15:24:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425821#M122059</guid>
      <dc:creator>skoelpin</dc:creator>
      <dc:date>2019-01-23T15:24:02Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me using dedup and count?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425822#M122060</link>
      <description>&lt;P&gt;It's not that the search is broken; I'm just wondering whether it is the most efficient way to find src IPs with a lot of POSTs in a short amount of time.&lt;/P&gt;

&lt;P&gt;I believe the line you suggested, "| stats count values(website) AS website by _time, src", provides a more efficient search. That brings me back to a previous answer of mine, in which I speculated that I might not be able to get rid of the duplicate websites because the POSTs are happening in different five-minute buckets.&lt;/P&gt;

&lt;P&gt;I'm just trying to figure out how to total the count per website &lt;/P&gt;

&lt;P&gt;Thx for the help&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jan 2019 15:30:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425822#M122060</guid>
      <dc:creator>jwalzerpitt</dc:creator>
      <dc:date>2019-01-23T15:30:15Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me using dedup and count?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425823#M122061</link>
      <description>&lt;P&gt;Each field added after the by clause will decrease efficiency. You're correct about better efficiency with &lt;CODE&gt;values(website)&lt;/CODE&gt;; it will provide multiple websites by &lt;CODE&gt;src&lt;/CODE&gt;. The &lt;CODE&gt;src&lt;/CODE&gt; would be deduped while the website field could be multi-valued. If you were to add that &lt;CODE&gt;website&lt;/CODE&gt; field after the by clause on your first stats, then website would be deduped.&lt;/P&gt;
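
&lt;P&gt;A rough, untested sketch of that idea, assuming the five-minute breakdown isn't needed in the final table: skip the &lt;CODE&gt;bucket&lt;/CODE&gt; and count by &lt;CODE&gt;src, website&lt;/CODE&gt; first, so each website shows up once per &lt;CODE&gt;src&lt;/CODE&gt; with its own total. The field names come from the original search; dropping &lt;CODE&gt;_time&lt;/CODE&gt; and the &lt;CODE&gt;sort 0 src, -count&lt;/CODE&gt; ordering are assumptions on my part, not something tested against your data.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=f5 action!=blocked http_method=POST
| stats count by src, website
| sort 0 src, -count
| stats dc(website) as distinct_website, list(website) as Website, list(count) as count, sum(count) as Total by src
| where distinct_website &amp;gt;= 3
| sort -Total
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Keeping &lt;CODE&gt;list()&lt;/CODE&gt; for both fields should keep each website lined up with its count, since &lt;CODE&gt;list()&lt;/CODE&gt; preserves the incoming row order, whereas &lt;CODE&gt;values()&lt;/CODE&gt; dedupes and sorts its values independently. Note that &lt;CODE&gt;list()&lt;/CODE&gt; only keeps the first 100 values by default.&lt;/P&gt;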

&lt;P&gt;I'm confused: you said the search isn't broken, but you also said it wasn't returning results.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jan 2019 15:38:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425823#M122061</guid>
      <dc:creator>skoelpin</dc:creator>
      <dc:date>2019-01-23T15:38:31Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me using dedup and count?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425824#M122062</link>
      <description>&lt;P&gt;My original post showed the results I was getting that had the duplicate websites and I asked if there was a way to get rid of them using the dedup command. The edited search returns no results but rolling back to your first search below returns results:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;  | bucket _time span=5m
  | stats count values(website) AS website by _time, src
  | sort -count
  | stats dc(website) as distinct_website, list(website) as Website, list(count) as count, sum(count) as Total by src 
  | where distinct_website &amp;gt;= 3
  | sort -Total
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 23 Jan 2019 15:44:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425824#M122062</guid>
      <dc:creator>jwalzerpitt</dc:creator>
      <dc:date>2019-01-23T15:44:44Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me using dedup and count?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425825#M122063</link>
      <description>&lt;P&gt;&lt;CODE&gt;index=f5 action!=blocked http_method=POST&lt;BR /&gt;
 | bucket _time span=5m&lt;BR /&gt;
 | stats count by _time, src, website&lt;BR /&gt;
 | sort -count&lt;BR /&gt;
 | stats dc(website) as distinct_website, values(website) as Website, list(count) as count, sum(count) as Total by src &lt;BR /&gt;
 | where distinct_website &amp;gt;= 3&lt;BR /&gt;
 | sort -Total&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jan 2019 22:38:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-you-help-me-using-dedup-and-count/m-p/425825#M122063</guid>
      <dc:creator>janderson19</dc:creator>
      <dc:date>2019-01-23T22:38:39Z</dc:date>
    </item>
  </channel>
</rss>

