<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Subsearches comparing datasets in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Subsearches-compairing-datasets/m-p/284555#M86060</link>
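    <description>&lt;P&gt;I believe we can eliminate the first stats (&lt;CODE&gt;| stats count...&lt;/CODE&gt;) altogether, since &lt;CODE&gt;stats max(daysago)&lt;/CODE&gt; by the same fields works directly on the raw events. Also, because the earliest boundary for daysago=0, i.e. @d, is inclusive of events exactly at @d, the comparison operator in &lt;CODE&gt;_time&amp;gt;relative_time(now(),"@d")&lt;/CODE&gt; should be &lt;CODE&gt;&amp;gt;=&lt;/CODE&gt;. (There is also a typo in the original: &lt;CODE&gt;reltime(now,"@d")&lt;/CODE&gt; should be &lt;CODE&gt;relative_time(now(),"@d")&lt;/CODE&gt;.)&lt;/P&gt;</description>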
    <pubDate>Fri, 10 Feb 2017 16:59:25 GMT</pubDate>
    <dc:creator>somesoni2</dc:creator>
    <dc:date>2017-02-10T16:59:25Z</dc:date>
    <item>
      <title>Subsearches comparing datasets</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Subsearches-compairing-datasets/m-p/284553#M86058</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;

&lt;P&gt;I have a search technique I've been using to compare smaller sets of data and find the difference; however, I'm running into the subsearch limit with a new set of data. I'm hoping someone has a good idea for a different way to perform the search that doesn't run into subsearch limits. Here's the situation:&lt;/P&gt;

&lt;P&gt;Each night a system drops a *.csv log into a directory that Splunk monitors and indexes. The CSV is approximately 50k lines, and therefore approximately 50k events are indexed by Splunk. I'm being asked to report each morning on events that exist in today's dump but didn't exist in the previous day's dump. I've tried my typical routine below, but I'm hitting the 10k subsearch limit. I'm assuming I could raise the limit, but I'd rather have a more efficient search, if possible.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| set union
[search index=&amp;lt;index&amp;gt; sourcetype=&amp;lt;sourcetype&amp;gt; earliest=@d-1d latest=@d | eval daysago=1 | stats count by &amp;lt;field1&amp;gt; &amp;lt;field2&amp;gt; &amp;lt;field3&amp;gt; daysago | fields - count]
[search index=&amp;lt;index&amp;gt; sourcetype=&amp;lt;sourcetype&amp;gt; earliest=@d latest=@d+1d | eval daysago=0 | stats count by &amp;lt;field1&amp;gt; &amp;lt;field2&amp;gt; &amp;lt;field3&amp;gt; daysago | fields - count]
| stats max(daysago) as daysago by &amp;lt;field1&amp;gt; &amp;lt;field2&amp;gt; &amp;lt;field3&amp;gt; | where daysago=0
| eval Details="Has been added in the past day."
| table Details &amp;lt;field1&amp;gt; &amp;lt;field2&amp;gt; &amp;lt;field3&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I know the logic is sound (I use it for other things), but here the subsearches are just too big.&lt;/P&gt;
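
&lt;P&gt;For reference, here is a minimal offline sketch of what the &lt;CODE&gt;stats max(daysago)&lt;/CODE&gt; and &lt;CODE&gt;where daysago=0&lt;/CODE&gt; steps compute. It is plain Python, not Splunk code. The function name &lt;CODE&gt;added_today&lt;/CODE&gt; and the sample rows are hypothetical. Any key that also appears in yesterday's dump ends up with a maximum daysago of 1 and is filtered out, so only rows that are new in today's dump remain.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Offline Python model of the stats/where logic; not Splunk code.
# Each row is (field1, field2, field3, daysago), as produced by the two
# subsearches: daysago=1 for yesterday's dump, 0 for today's.
def added_today(rows):
    latest = {}
    for *key, daysago in rows:
        key = tuple(key)
        # stats max(daysago) as daysago by field1 field2 field3
        latest[key] = max(latest.get(key, daysago), daysago)
    # where daysago=0: the key never appeared in yesterday's dump
    return sorted(k for k, d in latest.items() if d == 0)


rows = [
    ("a", "x", "1", 1),
    ("a", "x", "1", 0),  # present both days, so max is 1
    ("b", "y", "2", 0),  # only today, so max is 0
    ("c", "z", "3", 1),  # only yesterday, so max is 1
]
print(added_today(rows))  # prints [('b', 'y', '2')]
&lt;/CODE&gt;&lt;/PRE&gt;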

&lt;P&gt;Any advice is welcome! Thank you.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2017 16:29:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Subsearches-compairing-datasets/m-p/284553#M86058</guid>
      <dc:creator>adamsmith47</dc:creator>
      <dc:date>2017-02-10T16:29:21Z</dc:date>
    </item>
    <item>
      <title>Re: Subsearches comparing datasets</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Subsearches-compairing-datasets/m-p/284554#M86059</link>
      <description>&lt;P&gt;You can only raise the limit to 10,499, so that isn't going to help. The following technique uses no subsearches, so it isn't subject to that limit, and it will run &lt;EM&gt;much&lt;/EM&gt; faster:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;search index=&amp;lt;index&amp;gt; sourcetype=&amp;lt;sourcetype&amp;gt; earliest=-1d@d
| eval daysago=if(_time&amp;gt;reltime(now,"@d"),daysago=0,daysago=1)
| stats count by &amp;lt;field1&amp;gt; &amp;lt;field2&amp;gt; &amp;lt;field3&amp;gt; daysago 
| fields - count
| stats max(daysago) as daysago by &amp;lt;field1&amp;gt; &amp;lt;field2&amp;gt; &amp;lt;field3&amp;gt; 
| where daysago=0
| eval Details="Has been added in the past day."
| table Details &amp;lt;field1&amp;gt; &amp;lt;field2&amp;gt; &amp;lt;field3&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;
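
&lt;P&gt;Here is a minimal offline sketch of the same single-pass idea in plain Python, not Splunk code. The function name, the sample events, and the use of UTC day boundaries are assumptions for illustration. Each event gets a daysago value from its timestamp. The keys are then grouped, and only those whose maximum daysago is 0 are kept. An event stamped exactly at midnight (@d) belongs to today, so the day boundary has to be inclusive.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Offline Python model of the single-pass search; not Splunk code.
# Assumption: epoch-second timestamps bucketed at UTC midnight, whereas
# Splunk's @d snaps to midnight in the search's time zone.
DAY = 86400


def added_today_single_pass(events, now):
    """events: (epoch_seconds, key) pairs from yesterday and today."""
    today = now // DAY
    latest = {}
    for ts, key in events:
        # Integer division puts an event stamped exactly at midnight (@d)
        # into today's bucket, i.e. the boundary is inclusive.
        daysago = today - ts // DAY
        # stats max(daysago) as daysago by key
        latest[key] = max(latest.get(key, daysago), daysago)
    # where daysago=0
    return sorted(k for k, d in latest.items() if d == 0)


now = 10 * DAY + 3600  # 01:00 on day 10
events = [
    (9 * DAY + 500, "old"),    # yesterday only
    (9 * DAY + 600, "both"),   # yesterday ...
    (10 * DAY + 60, "both"),   # ... and today
    (10 * DAY, "midnight"),    # exactly at @d: counts as today
]
print(added_today_single_pass(events, now))  # prints ['midnight']
&lt;/CODE&gt;&lt;/PRE&gt;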

&lt;P&gt;This technique searches the data set only once, then categorizes the results before comparing them.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2017 16:51:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Subsearches-compairing-datasets/m-p/284554#M86059</guid>
      <dc:creator>lguinn2</dc:creator>
      <dc:date>2017-02-10T16:51:57Z</dc:date>
    </item>
    <item>
      <title>Re: Subsearches comparing datasets</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Subsearches-compairing-datasets/m-p/284555#M86060</link>
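      <description>&lt;P&gt;I believe we can eliminate the first stats (&lt;CODE&gt;| stats count...&lt;/CODE&gt;) altogether, since &lt;CODE&gt;stats max(daysago)&lt;/CODE&gt; by the same fields works directly on the raw events. Also, because the earliest boundary for daysago=0, i.e. @d, is inclusive of events exactly at @d, the comparison operator in &lt;CODE&gt;_time&amp;gt;relative_time(now(),"@d")&lt;/CODE&gt; should be &lt;CODE&gt;&amp;gt;=&lt;/CODE&gt;. (There is also a typo in the original: &lt;CODE&gt;reltime(now,"@d")&lt;/CODE&gt; should be &lt;CODE&gt;relative_time(now(),"@d")&lt;/CODE&gt;.)&lt;/P&gt;</description>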
      <pubDate>Fri, 10 Feb 2017 16:59:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Subsearches-compairing-datasets/m-p/284555#M86060</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2017-02-10T16:59:25Z</dc:date>
    </item>
    <item>
      <title>Re: Subsearches comparing datasets</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Subsearches-compairing-datasets/m-p/284556#M86061</link>
      <description>&lt;P&gt;How about this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; index=&amp;lt;index&amp;gt; sourcetype=&amp;lt;sourcetype&amp;gt; earliest=-1d@d
| bin _time span=1d
| stats min(_time) as mintime max(_time) as maxtime by &amp;lt;field1&amp;gt; &amp;lt;field2&amp;gt; &amp;lt;field3&amp;gt;
| eventstats min(mintime) as yesterdayepoch max(maxtime) as todayepoch
| where mintime=maxtime
| eval myflag=case(mintime==todayepoch,"Added Record",maxtime==yesterdayepoch,"Deleted Record", true(),"Nonesuch Record")  
| eval _time = mintime 
| table _time &amp;lt;field1&amp;gt; &amp;lt;field2&amp;gt; &amp;lt;field3&amp;gt; myflag
&lt;/CODE&gt;&lt;/PRE&gt;
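
&lt;P&gt;To show what each step of this pipeline does, here is a rough offline model in plain Python, not Splunk code. The name &lt;CODE&gt;diff_days&lt;/CODE&gt;, the sample events, and the UTC day buckets are assumptions for illustration. Keys seen on both days are dropped by the &lt;CODE&gt;mintime=maxtime&lt;/CODE&gt; test. Each remaining key is flagged according to whether its only day is the latest or the earliest day in the data.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Offline Python model of the bin/stats/eventstats/case pipeline;
# not Splunk code. Timestamps are epoch seconds; days are UTC buckets.
DAY = 86400


def diff_days(events):
    """events: (epoch_seconds, key) pairs. Returns {key: flag}."""
    first, last = {}, {}
    for ts, key in events:
        day = ts - ts % DAY  # bin _time span=1d
        # stats min(_time) as mintime max(_time) as maxtime by key
        first[key] = min(first.get(key, day), day)
        last[key] = max(last.get(key, day), day)
    if not first:
        return {}
    # eventstats min(mintime) as yesterdayepoch max(maxtime) as todayepoch
    yesterday = min(first.values())
    today = max(last.values())
    flags = {}
    for key in first:
        if first[key] != last[key]:
            continue  # where mintime=maxtime: drop keys seen on both days
        if first[key] == today:
            flags[key] = "Added Record"
        elif last[key] == yesterday:
            flags[key] = "Deleted Record"
        else:
            flags[key] = "Nonesuch Record"
    return flags


events = [
    (9 * DAY + 100, "gone"),
    (9 * DAY + 200, "kept"),
    (10 * DAY + 300, "kept"),
    (10 * DAY + 400, "new"),
]
print(diff_days(events))  # prints {'gone': 'Deleted Record', 'new': 'Added Record'}
&lt;/CODE&gt;&lt;/PRE&gt;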

&lt;HR /&gt;

&lt;P&gt;updated case tests to use == rather than =&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2017 17:47:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Subsearches-compairing-datasets/m-p/284556#M86061</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-02-10T17:47:32Z</dc:date>
    </item>
    <item>
      <title>Re: Subsearches comparing datasets</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Subsearches-compairing-datasets/m-p/284557#M86062</link>
      <description>&lt;P&gt;Thank you, lguinn and somesoni; it's working well!&lt;/P&gt;

&lt;P&gt;The form I've ultimately gone with is:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=&amp;lt;index&amp;gt; sourcetype=&amp;lt;sourcetype&amp;gt; earliest=-1d@d
| eval daysago=if(_time&amp;gt;=relative_time(now(),"@d"),0,1)
| stats max(daysago) as daysago by &amp;lt;field1&amp;gt; &amp;lt;field2&amp;gt; &amp;lt;field3&amp;gt;
| where daysago=0
| eval Details="Has been added in the past day."
| table Details &amp;lt;field1&amp;gt; &amp;lt;field2&amp;gt; &amp;lt;field3&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 10 Feb 2017 19:06:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Subsearches-compairing-datasets/m-p/284557#M86062</guid>
      <dc:creator>adamsmith47</dc:creator>
      <dc:date>2017-02-10T19:06:29Z</dc:date>
    </item>
  </channel>
</rss>

