<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Filtering on multiple fields based on a stats subsearch in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Filtering-on-multiple-fields-based-on-a-stats-subsearch/m-p/325344#M97018</link>
    <description>&lt;P&gt;I have multiple log sources that are appended on a daily basis. All rows in one refresh have same epoch time. I would like to select all values from each log source based on last epoch time. &lt;BR /&gt;
I get the last collection epoch by following stats&lt;BR /&gt;
         sourcetype=my_sourcetype | fields epochtime, source | stats max(epochtime) as lasttime  by source &lt;/P&gt;

&lt;P&gt;This gives me a list of epoch value with each source. &lt;BR /&gt;
Next, I am trying to filter based on above as follows&lt;BR /&gt;
sourcetype=my_sourcetype |rename epochtime as lasttime | join type=inner lasttime, source[ search sourcetype=my_sourcetype | fields epochtime, source | stats max(epochtime) as lasttime  by source ]&lt;/P&gt;

&lt;P&gt;Expecting that it would filter like SQL filter where a.lasttime=b.lasttime and a.source=b.source but it seems like I get for each epoch, multiple events from the main query. &lt;/P&gt;

&lt;P&gt;How do I change this query to filter based on the results in stats?&lt;/P&gt;</description>
    <pubDate>Tue, 29 Sep 2020 16:24:52 GMT</pubDate>
    <dc:creator>saboobaker</dc:creator>
    <dc:date>2020-09-29T16:24:52Z</dc:date>
    <item>
      <title>Filtering on multiple fields based on a stats subsearch</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Filtering-on-multiple-fields-based-on-a-stats-subsearch/m-p/325344#M97018</link>
      <description>&lt;P&gt;I have multiple log sources that are appended on a daily basis. All rows in one refresh have same epoch time. I would like to select all values from each log source based on last epoch time. &lt;BR /&gt;
I get the last collection epoch by following stats&lt;BR /&gt;
         sourcetype=my_sourcetype | fields epochtime, source | stats max(epochtime) as lasttime  by source &lt;/P&gt;

&lt;P&gt;This gives me a list of epoch value with each source. &lt;BR /&gt;
Next, I am trying to filter based on above as follows&lt;BR /&gt;
sourcetype=my_sourcetype |rename epochtime as lasttime | join type=inner lasttime, source[ search sourcetype=my_sourcetype | fields epochtime, source | stats max(epochtime) as lasttime  by source ]&lt;/P&gt;

&lt;P&gt;Expecting that it would filter like SQL filter where a.lasttime=b.lasttime and a.source=b.source but it seems like I get for each epoch, multiple events from the main query. &lt;/P&gt;

&lt;P&gt;How do I change this query to filter based on the results in stats?&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 16:24:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Filtering-on-multiple-fields-based-on-a-stats-subsearch/m-p/325344#M97018</guid>
      <dc:creator>saboobaker</dc:creator>
      <dc:date>2020-09-29T16:24:52Z</dc:date>
    </item>
    <item>
      <title>Re: Filtering on multiple fields based on a stats subsearch</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Filtering-on-multiple-fields-based-on-a-stats-subsearch/m-p/325345#M97019</link>
      <description>&lt;P&gt;Okay, so first, go read this, so you understand how to think about Splunk queries.  Thinking in terms of SQL will result in extremely inefficient code in Splunk, every time.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://answers.splunk.com/answers/561130/sql-analogy-for-my-log-search-join-two-tables-wher.html"&gt;https://answers.splunk.com/answers/561130/sql-analogy-for-my-log-search-join-two-tables-wher.html&lt;/A&gt;&lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;Second, if you just want the last event of each source, then use this and you are done...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=my_sourcetype 
| dedup source
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Explanation - &lt;CODE&gt;dedup&lt;/CODE&gt; picks the first record of each combination of keys.  In this case, the only key given is &lt;CODE&gt;source&lt;/CODE&gt;, so it picks the first record of each source.&lt;/P&gt;

&lt;P&gt;By default, Splunk returns the most recent records first.  Thus, the most recent of each will be chosen.&lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;Third, if there is a chance that some source has stopped being sent, and the most recent date will not have any of that source, and you want to get rid of that source because the date doesn't match, then do this...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=my_sourcetype 
| dedup source
| eventstats max(epochtime) as lasttime 
| where epochtime = lasttime
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Explanation: &lt;CODE&gt;eventstats&lt;/CODE&gt; is like &lt;CODE&gt;stats&lt;/CODE&gt;, except it does not delete the underlying events, it just copies the results onto every event that matches the keys.  In this case, we don't need any key, because all the events from the most recent load have the same &lt;CODE&gt;epochtime&lt;/CODE&gt; value.  If any remaining event does NOT have the same value, then it means it is a &lt;CODE&gt;source&lt;/CODE&gt; that is left over from a prior load.&lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;Fourth, here's a free bonus method that will sometimes (or even often) be more efficient than &lt;CODE&gt;dedup&lt;/CODE&gt;.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=my_sourcetype 
| stats latest(epochtime) as epochtime latest(someotherfield) as someotherfield by source
| eventstats max(epochtime) as lasttime 
| where epochtime = lasttime
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Explanation: within a &lt;CODE&gt;stats&lt;/CODE&gt;-type command, &lt;CODE&gt;latest(x)&lt;/CODE&gt; will pick the value of x from the record with the highest value for &lt;CODE&gt;_time&lt;/CODE&gt;.  &lt;CODE&gt;Stats&lt;/CODE&gt; can be more efficient in some cases because it can be partially calculated at the indexes, limiting the amount of data that gets transferred to the search head.  &lt;/P&gt;

&lt;P&gt;Test this version against the third version and inspect the results to see which one is more efficient in your particular situation. &lt;/P&gt;</description>
      <pubDate>Wed, 25 Oct 2017 00:55:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Filtering-on-multiple-fields-based-on-a-stats-subsearch/m-p/325345#M97019</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-10-25T00:55:25Z</dc:date>
    </item>
    <item>
      <title>Re: Filtering on multiple fields based on a stats subsearch</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Filtering-on-multiple-fields-based-on-a-stats-subsearch/m-p/325346#M97020</link>
      <description>&lt;P&gt;Thank you for the answer with an alternate solution. &lt;/P&gt;

&lt;P&gt;However, I was looking for help on Join and why does the join stated in question does not work. Is the join with more than one fields not supported? Or is it something else incorrect in the way I am joining the results.&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Oct 2017 18:19:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Filtering-on-multiple-fields-based-on-a-stats-subsearch/m-p/325346#M97020</guid>
      <dc:creator>saboobaker</dc:creator>
      <dc:date>2017-10-31T18:19:16Z</dc:date>
    </item>
    <item>
      <title>Re: Filtering on multiple fields based on a stats subsearch</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Filtering-on-multiple-fields-based-on-a-stats-subsearch/m-p/325347#M97021</link>
      <description>&lt;P&gt;@saboobaker - I gave you all those because this structure is extremely inefficient.  I don't see any issues with the theory, though.  &lt;/P&gt;

&lt;P&gt;First, put white space between the &lt;CODE&gt;source[&lt;/CODE&gt; just in case the subsearch does not return a starting space.  If that works, you are done.&lt;/P&gt;

&lt;P&gt;Second, check to see whether you are running out of time or space.  That should not be the case, since the subsearch is only returning two fields.  Nonetheless, it is a theoretical possibility.  Look at the job inspector and look for language about the job being complete or incomplete, and how many events were scanned and how many returned.  If that looks normal, then check the search log to see what happened there.  &lt;/P&gt;</description>
      <pubDate>Tue, 31 Oct 2017 21:11:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Filtering-on-multiple-fields-based-on-a-stats-subsearch/m-p/325347#M97021</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-10-31T21:11:39Z</dc:date>
    </item>
  </channel>
</rss>

