<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Is it possible to do a conditional stats values command in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409782#M118207</link>
    <description>&lt;P&gt;Hi there!&lt;/P&gt;

&lt;P&gt;I edited my original question to clarify - but field_A and field_B are in different events, so I don't think the above will work? I've tried a few different ways of putting the eval within the values command, but I think that ultimately is comparing fields within an event - is that the correct way to think about using eval inside of stats?&lt;/P&gt;</description>
    <pubDate>Tue, 29 Sep 2020 20:55:34 GMT</pubDate>
    <dc:creator>twhite_splunk</dc:creator>
    <dc:date>2020-09-29T20:55:34Z</dc:date>
    <item>
      <title>Is it possible to do a conditional stats values command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409779#M118204</link>
      <description>&lt;P&gt;A common usecase I run into is I want to join two sources of data together only if fields meet certain criteria. The common pattern is this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source="A" OR source="B"
| stats values(field_from_A) as field_from_A values(field_from_B) as field_from_B by common_id
| mvexpand field_from_A
| mvexpand field_from_B
| where field_from_A &amp;gt; field_from_B
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The nature of this data is that the stats output has very large mv fields, but the where filter removes  most if them. Because of that, it'd be great if I could have that where filter done by the indexers.&lt;/P&gt;

&lt;P&gt;Is there a way to conditionally do stats values across events?&lt;/P&gt;

&lt;P&gt;Edit adding more details:&lt;/P&gt;

&lt;P&gt;The problem I'm trying to solve is taking normalized event streams and denormalizing them. Below is an example of the kind of flow I'm working with:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourceA - {time: 2018-08-01, username: jim, email: jim@email.com, userid: 582}
sourceB - {time: 2018-08-02, action: purchase, item: pen, userid: 582}
sourceB - {time: 2018-08-03, action: purchase, item: paper, userid: 582}
sourceA - {time: 2018-08-10, username: jim, email: james@email.com, userid: 582}
sourceB - {time: 2018-08-15, action: purchase, item: paper, userid: 582}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Then a question that would be asked is to display email and items purchased over the past 30 days.&lt;/P&gt;

&lt;P&gt;The above is a simplified example, and in the real data there are a lot more fields in both the A and B streams, and they both get updated independently of each other (plus also more streams).&lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 01:05:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409779#M118204</guid>
      <dc:creator>twhite_splunk</dc:creator>
      <dc:date>2018-08-16T01:05:31Z</dc:date>
    </item>
    <item>
      <title>Re: Is it possible to do a conditional stats values command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409780#M118205</link>
      <description>&lt;P&gt;hello there,&lt;BR /&gt;
what is the use case? from your query, seems like you are dealing with numerical fields only (&lt;CODE&gt;&amp;gt;&lt;/CODE&gt;).&lt;BR /&gt;
maybe you can first &lt;CODE&gt;|bin&lt;/CODE&gt; and then add the &lt;CODE&gt;_time&lt;/CODE&gt; field after the &lt;CODE&gt;by&lt;/CODE&gt; clause. that will reduce the &lt;CODE&gt;mv&lt;/CODE&gt; plenty.&lt;BR /&gt;
something like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;   (index = "A" source="A") OR (index = "B" source="B")
     | bin span=5m _time
     | stats values(field_A) as field_A values(field_B) as B by common_id _time
     | mvexpand field_A
     | mvexpand field_B
     | where field_A &amp;gt; field_B
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;also, maybe use &lt;CODE&gt;search&lt;/CODE&gt; instead of &lt;CODE&gt;where&lt;/CODE&gt;&lt;BR /&gt;
this answer explains the difference:&lt;BR /&gt;
&lt;A href="https://answers.splunk.com/answers/50659/whats-the-difference-between-where-and-search-in-the-pipeline.html"&gt;https://answers.splunk.com/answers/50659/whats-the-difference-between-where-and-search-in-the-pipeline.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 03:22:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409780#M118205</guid>
      <dc:creator>adonio</dc:creator>
      <dc:date>2018-08-16T03:22:16Z</dc:date>
    </item>
    <item>
      <title>Re: Is it possible to do a conditional stats values command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409781#M118206</link>
      <description>&lt;P&gt;How about this?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | stats values(eval(if(field_A&amp;gt;field_B,field_A,""))) as field_A_gt_field_B by common_id
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Or &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | stats values(eval(if(field_A&amp;gt;field_B,field_A,null()))) as field_A_gt_field_B by common_id
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 16 Aug 2018 08:59:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409781#M118206</guid>
      <dc:creator>jkat54</dc:creator>
      <dc:date>2018-08-16T08:59:26Z</dc:date>
    </item>
    <item>
      <title>Re: Is it possible to do a conditional stats values command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409782#M118207</link>
      <description>&lt;P&gt;Hi there!&lt;/P&gt;

&lt;P&gt;I edited my original question to clarify - but field_A and field_B are in different events, so I don't think the above will work? I've tried a few different ways of putting the eval within the values command, but I think that ultimately is comparing fields within an event - is that the correct way to think about using eval inside of stats?&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 20:55:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409782#M118207</guid>
      <dc:creator>twhite_splunk</dc:creator>
      <dc:date>2020-09-29T20:55:34Z</dc:date>
    </item>
    <item>
      <title>Re: Is it possible to do a conditional stats values command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409783#M118208</link>
      <description>&lt;P&gt;Thank you for the suggestion about &lt;CODE&gt;| bin&lt;/CODE&gt; !&lt;/P&gt;

&lt;P&gt;Unfortunately, the sources won't necessarily have a reasonable time overlap to assume. The problem I'm trying to solve is taking normalized event streams and denormalizing them. Below is an example of the kind of flow I'm working with:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourceA - {time: 2018-08-01, username: jim, email: jim@email.com, userid: 582}
sourceB - {time: 2018-08-02, action: purchase, item: pen, userid: 582}
sourceB - {time: 2018-08-03, action: purchase, item: paper, userid: 582}
sourceA - {time: 2018-08-10, username: jim, email: james@email.com, userid: 582}
sourceB - {time: 2018-08-15, action: purchase, item: paper, userid: 582}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Then a question that would be asked is to display email and items purchased over the past 30 days.&lt;/P&gt;

&lt;P&gt;The above is a simplified example, and in the real data there are a lot more fields in both the A and B streams, and they both get updated independently of each other (plus also more streams).&lt;/P&gt;

&lt;P&gt;Thank you again!&lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 13:00:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409783#M118208</guid>
      <dc:creator>twhite_splunk</dc:creator>
      <dc:date>2018-08-16T13:00:33Z</dc:date>
    </item>
    <item>
      <title>Re: Is it possible to do a conditional stats values command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409784#M118209</link>
      <description>&lt;P&gt;so, couple of things here&lt;BR /&gt;
looking at very large sets of data for long periods of time can be heavy on resources regardless.&lt;BR /&gt;
for this particular usecase, although you can accomplish with &lt;CODE&gt;|join&lt;/CODE&gt; command or like you approach it, &lt;CODE&gt;| stats values&lt;/CODE&gt; i will suggest summary index, have it run every day and then youll have very little event to consider when looking back 30 days&lt;BR /&gt;
another way to simplify your search will be with a lookup.&lt;BR /&gt;
i would probably run something like this: &lt;CODE&gt;... source = sourceA ... | fields ... username userid email ...| stats values(username) as username values(email) as email values(userid) as userid valus(...) as ...&lt;BR /&gt;
| outputlookup all_my_clients.csv&lt;/CODE&gt;&lt;BR /&gt;
now you will run a simple search &lt;BR /&gt;
&lt;CODE&gt;... source = sourceB ... userid = * | lookup .... OUTPUT ... | stats values(item)  by email&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;hope it helps&lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 14:19:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409784#M118209</guid>
      <dc:creator>adonio</dc:creator>
      <dc:date>2018-08-16T14:19:14Z</dc:date>
    </item>
    <item>
      <title>Re: Is it possible to do a conditional stats values command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409785#M118210</link>
      <description>&lt;P&gt;Ah, thank you for the follow up. Summary indexing "chunks" has been my general approach, but I was hopeful I had missed a way to compare events before they've been grouped.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 16:11:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409785#M118210</guid>
      <dc:creator>twhite_splunk</dc:creator>
      <dc:date>2018-08-16T16:11:10Z</dc:date>
    </item>
    <item>
      <title>Re: Is it possible to do a conditional stats values command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409786#M118211</link>
      <description>&lt;P&gt;TL;DR&lt;/P&gt;

&lt;P&gt;@twhite - now that I've fully read your example use case, there is a better option.  This is a job for a simple &lt;CODE&gt;streamstats&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;So, assuming that you want the username and email that are the most recent prior ones for any given purchase, we have this..&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; source="A" OR source="B"
| fields userid email username time action item
| sort 0 userid _time source

| rename COMMENT as "Roll the data from the user record to the purchase record"   
| streamstats  last(username) as username last(email) as email by userid

| rename COMMENT as "Now keep only the purchase records"   
| where  source="B"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That's it.  Make sure the field names are spelled and capitalized correctly, and that the _time field exists.  Also, make sure that if the times are simultaneous, that the purchase record will end up second, not first.&lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;Okay, we believe that if you clarified your use case, that we'd be able to help you develop a much better way of getting what you want.  &lt;/P&gt;

&lt;P&gt;Here is our statement of what your code would do - &lt;/P&gt;

&lt;P&gt;For each value of &lt;CODE&gt;field_from_b&lt;/CODE&gt; in &lt;CODE&gt;index=b&lt;/CODE&gt;, find all records in &lt;CODE&gt;index=a&lt;/CODE&gt; with a matching &lt;CODE&gt;common_id&lt;/CODE&gt; that have values of &lt;CODE&gt;field_from_a&lt;/CODE&gt; that are greater than the &lt;CODE&gt;field_from_b&lt;/CODE&gt;, and report those values.&lt;/P&gt;

&lt;P&gt;The following code would achieve that with less RAM than the multiple-mvexpand implementation...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; source="A" OR source="B"
| eval compare_value=case(index="A",field_from_a, index="B",field_from_b)
| sort 0 common_id - compare_value 
| streamstats values(eval(case(index="A",field_from_a))) as greater_as by common_id
| where index="B" and isnotnull(greater_as)
| rename greater_as as field_from_a
| mvexpand field_from_a
| where field_from_a &amp;gt; field_from_b
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Brief explanation:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;Streamstats&lt;/CODE&gt; sees only those records that have gone before, so we sort in descending &lt;CODE&gt;compare_value&lt;/CODE&gt; order.  Records which have come before are greater than or equal to the current record.  &lt;/P&gt;

&lt;P&gt;We copy the values from &lt;CODE&gt;index="a"&lt;/CODE&gt; that are greater (ie before) the current value from &lt;CODE&gt;index="b"&lt;/CODE&gt;.  (Incidentally, if it happens to be a record in &lt;CODE&gt;index="a"&lt;/CODE&gt;, it gets all the prior values as well, but we are only keeping the &lt;CODE&gt;index="b"&lt;/CODE&gt; record after the next test, and we are only keeping them if they had at least one &lt;CODE&gt;index="a"&lt;/CODE&gt; record that was greater. &lt;/P&gt;

&lt;P&gt;The final test is just in case a record has an equal value.  Your use case was strictly greater, so we enforce that then.&lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;updated &lt;CODE&gt;sort&lt;/CODE&gt; to &lt;CODE&gt;sort 0&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Aug 2018 16:21:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-do-a-conditional-stats-values-command/m-p/409786#M118211</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2018-08-20T16:21:12Z</dc:date>
    </item>
  </channel>
</rss>

