<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why is stats count producing incorrect results on joined data? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137782#M37789</link>
    <description>&lt;P&gt;I don't know if this is your problem, but please note that count(DEVICE_ID) does not count the number of results which have the field DEVICE_ID.  Instead it counts the number of occurrences (or values, if you prefer) of the field DEVICE_ID, which can disagree in the case of multivalue fields.  You may wish to check if you get the same number for &lt;CODE&gt;|where DEVICE_ID="*" | stats count by date&lt;/CODE&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 28 Sep 2020 17:40:50 GMT</pubDate>
    <dc:creator>jrodman</dc:creator>
    <dc:date>2020-09-28T17:40:50Z</dc:date>
    <item>
      <title>Why is stats count producing incorrect results on joined data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137781#M37788</link>
      <description>&lt;P&gt;Hi&lt;/P&gt;

&lt;P&gt;I'm trying to join data from two sourcetypes and make some simple statistics based on joined data. But when I'm using stats I have some incorrect results. Let me explain in details:&lt;/P&gt;

&lt;P&gt;In sourcetype "ATM" I have field DEVICE_ID and in sourcetype "Zabbix"  I have fields JNAME and AVAILABLE. Fields DEVICE_ID and JNAME contains the same sort of data from two different applications and fields AVAILABLE contain information for which I need some statistics. In sourcetype "ATM" I have events for more DEVICE_ID's than in sourcetype "Zabbix"&lt;/P&gt;

&lt;P&gt;Basically my task is to count for how much DEVICE_ID's I don't have information in "Zabbix" every day. &lt;BR /&gt;
I used following query to get the result:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=ATM | eval date=strftime(_time,"%d-%m-%Y") | dedup DEVICE_ID, date
| join type=left DEVICE_ID, data [search sourcetype=Zabbix | eval date=strftime(_time,"%d-%m-%Y") | dedup JNAME, date | rename JNAME AS DEVICE_ID]
| fillnull value="No data" AVAILABLE
| search AVAILABLE="No data"
| stats count(DEVICE_ID) by date
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;When I'm executing this search for 7 days, I get correct information for latest 5 and fully incorrect for two first. When I executed this search for last 30 days I got correct result for latest 11 days and incorrect for all other. Like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;date | count
11-09-2014 | 2994
12-09-2014 | 2989
13-09-2014 | 347
14-09-2014 | 328
15-09-2014 | 341
16-09-2014 | 349
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Please help me to make this search working properly. Thank you in advance.&lt;/P&gt;

&lt;P&gt;P.S. I'm using Splunk Free if it matter&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 17:40:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137781#M37788</guid>
      <dc:creator>ArsenyKapralov</dc:creator>
      <dc:date>2020-09-28T17:40:33Z</dc:date>
    </item>
    <item>
      <title>Re: Why is stats count producing incorrect results on joined data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137782#M37789</link>
      <description>&lt;P&gt;I don't know if this is your problem, but please note that count(DEVICE_ID) does not count the number of results which have the field DEVICE_ID.  Instead it counts the number of occurrences (or values, if you prefer) of the field DEVICE_ID, which can disagree in the case of multivalue fields.  You may wish to check if you get the same number for &lt;CODE&gt;|where DEVICE_ID="*" | stats count by date&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 17:40:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137782#M37789</guid>
      <dc:creator>jrodman</dc:creator>
      <dc:date>2020-09-28T17:40:50Z</dc:date>
    </item>
    <item>
      <title>Re: Why is stats count producing incorrect results on joined data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137783#M37790</link>
      <description>&lt;P&gt;Clerical error?&lt;BR /&gt;
 join type=left DEVICE_ID, &lt;STRONG&gt;data&lt;/STRONG&gt; &lt;BR /&gt;
-&amp;gt; join type=left DEVICE_ID, &lt;STRONG&gt;date&lt;/STRONG&gt; &lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 17:40:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137783#M37790</guid>
      <dc:creator>HiroshiSatoh</dc:creator>
      <dc:date>2020-09-28T17:40:52Z</dc:date>
    </item>
    <item>
      <title>Re: Why is stats count producing incorrect results on joined data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137784#M37791</link>
      <description>&lt;P&gt;&lt;STRONG&gt;jrodman&lt;/STRONG&gt;&lt;BR /&gt;
Thank you for your comment. But |where DEVICE_ID="*" produce no results. Also I'm sure that numbers provided by my search are correct for some dates, so I think this doesn't affect my case&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;HiroshiSatoh&lt;/STRONG&gt;&lt;BR /&gt;
Yes, just and error. It doesn't affect my case&lt;/P&gt;

&lt;P&gt;I also found that subsearch in join was limited by time. When I tried to change limits (in limits.conf) it starts to produce correct results for more dates. So, I think this is the case and I need to understand how I can speedup search and avoid incorrect results in general.&lt;/P&gt;

&lt;P&gt;Any ideas?&lt;/P&gt;</description>
      <pubDate>Wed, 24 Sep 2014 11:59:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137784#M37791</guid>
      <dc:creator>ArsenyKapralov</dc:creator>
      <dc:date>2014-09-24T11:59:41Z</dc:date>
    </item>
    <item>
      <title>Re: Why is stats count producing incorrect results on joined data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137785#M37792</link>
      <description>&lt;P&gt;A couple of things:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Keep in mind that subsearches return a maximum of 10,500 results, so some of your results could be truncated.&lt;/LI&gt;
&lt;LI&gt;&lt;P&gt;You might be able to refactor the search such that a join is not needed, which could greatly increase the efficiency of the search.  Would a search like this accomplish what you need?&lt;/P&gt;

&lt;P&gt;sourcetype=ATM  OR sourcetype=Zabbix &lt;BR /&gt;
| eval date=strftime(_time,"%d-%m-%Y") &lt;BR /&gt;
| rename JNAME AS DEVICE_ID &lt;BR /&gt;
| fillnull value="No data" AVAILABLE &lt;BR /&gt;
| fields DEVICE_ID sourcetype date&lt;BR /&gt;
| search AVAILABLE="No data"&lt;BR /&gt;
| stats count(Device_ID) as DeviceCount dc(sourcetype) as SourceTypeCount by date &lt;BR /&gt;
| where SourceTypeCount&amp;gt;1&lt;/P&gt;&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Mon, 28 Sep 2020 17:40:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137785#M37792</guid>
      <dc:creator>wpreston</dc:creator>
      <dc:date>2020-09-28T17:40:59Z</dc:date>
    </item>
    <item>
      <title>Re: Why is stats count producing incorrect results on joined data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137786#M37793</link>
      <description>&lt;OL&gt;
&lt;LI&gt;Thanks for this data. As I understand when I'm using "join" by default Splunk returns 50k results and I can manage this in limits.conf&lt;/LI&gt;
&lt;LI&gt;I tested your query but I get only zeros in DeviceCount if I'm trying to search without "where" and get "No results found" error if I'm searching with "where" clause. Any ideas how to fix this?&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Wed, 24 Sep 2014 13:38:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137786#M37793</guid>
      <dc:creator>ArsenyKapralov</dc:creator>
      <dc:date>2014-09-24T13:38:10Z</dc:date>
    </item>
    <item>
      <title>Re: Why is stats count producing incorrect results on joined data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137787#M37794</link>
      <description>&lt;P&gt;Ah, that's because I didn't include it in my &lt;CODE&gt;stats&lt;/CODE&gt; command, sorry about that.  Here's the edited search:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=ATM  OR sourcetype=Zabbix  
| eval date=strftime(_time,"%d-%m-%Y")  
| rename JNAME AS DEVICE_ID  
| fillnull value="No data" AVAILABLE  
| fields DEVICE_ID sourcetype date 
| search AVAILABLE="No data" 
| stats count(Device_ID) as DeviceCount dc(sourcetype) as SourceTypeCount by date DEVICE_ID
| where SourceTypeCount&amp;gt;1
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 24 Sep 2014 14:38:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137787#M37794</guid>
      <dc:creator>wpreston</dc:creator>
      <dc:date>2014-09-24T14:38:38Z</dc:date>
    </item>
    <item>
      <title>Re: Why is stats count producing incorrect results on joined data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137788#M37795</link>
      <description>&lt;P&gt;Thanks for your help, but it's still not working. I tried to analyze search results without &lt;CODE&gt;stats&lt;/CODE&gt; and &lt;CODE&gt;where&lt;/CODE&gt; and found following:&lt;BR /&gt;
1) after &lt;CODE&gt;| search AVAILABLE="No data"&lt;/CODE&gt; I have events only of one sourcetype - ATM&lt;BR /&gt;
2) I see DEVICE_ID field in fields list but when I'm trying to do simple stats  &lt;CODE&gt;| stats count(Device_ID) by date&lt;/CODE&gt; search provides zero results each day&lt;/P&gt;

&lt;P&gt;Any ideas why this can happen and what to do?&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 17:41:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-is-stats-count-producing-incorrect-results-on-joined-data/m-p/137788#M37795</guid>
      <dc:creator>ArsenyKapralov</dc:creator>
      <dc:date>2020-09-28T17:41:54Z</dc:date>
    </item>
  </channel>
</rss>

