<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Selecting log entry having smallest field value in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236982#M70436</link>
    <description>&lt;P&gt;The toughest problem is knowing when to reset, that is, knowing when the car has passed a stop so that you can start tracking the next pass. In your example, how do you know that these three values are from the "same time it passed by", rather than the first being "the closest on the first time", the middle one "the closest on the second time", and the third "the closest on the third time"?&lt;/P&gt;</description>
    <pubDate>Wed, 24 Aug 2016 05:40:29 GMT</pubDate>
    <dc:creator>mIliofotou_splu</dc:creator>
    <dc:date>2016-08-24T05:40:29Z</dc:date>
    <item>
      <title>Selecting log entry having smallest field value</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236981#M70435</link>
      <description>&lt;P&gt;Suppose I have log data like this:&lt;/P&gt;

&lt;PRE&gt;2016-08-24 03:46:15 GMT vehicle_id="1075" vehicle_distance=145 stop_tag="5687"
...
2016-08-24 03:46:52 GMT vehicle_id="1075" vehicle_distance=19 stop_tag="5687"
...
2016-08-24 03:47:38 GMT vehicle_id="1075" vehicle_distance=47 stop_tag="5687"&lt;/PRE&gt;

&lt;P&gt;For a given vehicle, it shows its distance to the closest stop over time as data is transmitted occasionally by the vehicle. I want to select only those log entries where, for a particular vehicle/stop pair, the distance is the smallest. For this sample data, I'd want &lt;EM&gt;only&lt;/EM&gt; the middle entry because it has a distance of 19 that is the smallest among 19, 47, and 145.&lt;/P&gt;

&lt;P&gt;Note that the "..." above means that there are other log entries intermixed having different vehicles and their closest stops.&lt;/P&gt;

&lt;P&gt;I want this "smallest distance" log entry &lt;EM&gt;every&lt;/EM&gt; time the vehicle passes by the stop. For example, if the vehicle passed by the stop again two hours later (because it's done a complete loop of its route), I &lt;EM&gt;still&lt;/EM&gt; want the "smallest distance" from the first time it passed by &lt;EM&gt;and&lt;/EM&gt; the "smallest distance" from the second time it passed by --- and so on for every time it passes by the stop. The multiple "clusters" of vehicle/stop pairs should be considered independently when finding each "cluster's" smallest distance.&lt;/P&gt;

&lt;P&gt;BTW: a "cluster" of log entries for a given vehicle/stop pair could have &lt;EM&gt;any&lt;/EM&gt; number of entries (but probably small, i.e., less than, say, 5). If there's only one entry, obviously that's the one with the smallest distance.&lt;/P&gt;
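
&lt;P&gt;To make the "cluster" idea concrete, here is a minimal Python sketch of the result I'm after. It rests on an assumption that is not part of the data: a new cluster starts whenever the same vehicle/stop pair has been silent for longer than some gap. The 30-minute threshold, the function name, the tuple layout, and the fourth sample entry (a second pass two hours later) are all made up for illustration.&lt;/P&gt;

```python
from datetime import datetime, timedelta

# Hypothetical threshold: a silence longer than this starts a new cluster.
MAX_GAP = timedelta(minutes=30)

def closest_per_cluster(entries, max_gap=MAX_GAP):
    """entries: iterable of (timestamp, vehicle_id, stop_tag, distance) tuples.

    Returns the closest entry of every cluster, ordered by time.
    """
    best = {}       # (vehicle_id, stop_tag) -> closest entry of the open cluster
    last_seen = {}  # (vehicle_id, stop_tag) -> timestamp of the latest entry
    result = []
    for entry in sorted(entries):
        ts, vid, stop, dist = entry
        key = (vid, stop)
        if key in best and ts - last_seen[key] > max_gap:
            result.append(best.pop(key))  # gap exceeded: close the old cluster
        if key not in best or best[key][3] > dist:
            best[key] = entry             # first or closer entry of the cluster
        last_seen[key] = ts
    result.extend(best.values())          # close clusters still open at the end
    return sorted(result)

def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

sample = [
    (ts("2016-08-24 03:46:15"), "1075", "5687", 145),
    (ts("2016-08-24 03:46:52"), "1075", "5687", 19),
    (ts("2016-08-24 03:47:38"), "1075", "5687", 47),
    (ts("2016-08-24 05:47:02"), "1075", "5687", 33),  # made-up second pass
]
for entry in closest_per_cluster(sample):
    print(entry)
```

&lt;P&gt;With this sample, the sketch keeps the entry with distance 19 for the first pass and the made-up entry with distance 33 for the second pass.&lt;/P&gt;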

&lt;P&gt;How can I do this?&lt;/P&gt;</description>
      <pubDate>Wed, 24 Aug 2016 04:15:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236981#M70435</guid>
      <dc:creator>plucas_splunk</dc:creator>
      <dc:date>2016-08-24T04:15:11Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting log entry having smallest field value</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236982#M70436</link>
      <description>&lt;P&gt;The toughest problem is knowing when to reset, that is, knowing when the car has passed a stop so that you can start tracking the next pass. In your example, how do you know that these three values are from the "same time it passed by", rather than the first being "the closest on the first time", the middle one "the closest on the second time", and the third "the closest on the third time"?&lt;/P&gt;</description>
      <pubDate>Wed, 24 Aug 2016 05:40:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236982#M70436</guid>
      <dc:creator>mIliofotou_splu</dc:creator>
      <dc:date>2016-08-24T05:40:29Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting log entry having smallest field value</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236983#M70437</link>
      <description>&lt;P&gt;Here's a rough approach:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | streamstats current=f global=f window=2 first(vehicle_distance) as vehicle_distance_1 last(vehicle_distance) as vehicle_distance_2 by vehicle_id stop_tag | where vehicle_distance_2 &amp;lt;= vehicle_distance_1 AND vehicle_distance_2 &amp;lt;= vehicle_distance
&lt;/CODE&gt;&lt;/PRE&gt;
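
&lt;P&gt;For comparison, the neighbour test behind that search can be sketched outside Splunk in Python. This is only an illustration of the idea, not a description of how streamstats works internally; the function name and the tuple layout are assumptions. Unlike the search, it returns the trough reading itself rather than the event next to it.&lt;/P&gt;

```python
from collections import defaultdict

def troughs(entries):
    """entries: time-ordered iterable of (vehicle_id, stop_tag, distance).

    Yields (vehicle_id, stop_tag, distance) for every reading that is no
    farther away than both of its neighbours within the same vehicle/stop pair.
    """
    by_pair = defaultdict(list)
    for vid, stop, dist in entries:
        by_pair[(vid, stop)].append(dist)
    for (vid, stop), dists in by_pair.items():
        # Slide a three-reading window over each pair's distances.
        for prev, mid, nxt in zip(dists, dists[1:], dists[2:]):
            if prev >= mid and nxt >= mid:
                yield vid, stop, mid

sample = [("1075", "5687", 145), ("1075", "5687", 19), ("1075", "5687", 47)]
print(list(troughs(sample)))
```

&lt;P&gt;A pair with fewer than three readings yields nothing here, which is the same limitation the search has.&lt;/P&gt;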

&lt;P&gt;That's assuming each vehicle-stop-instance triple has a distinct trough of distance. This will copy over two distances from two adjacent events for the vehicle-stop pair, and only keep those with that trough. The event you're actually looking for is the one next to this event, so you may need to copy over additional values such as time if you need more than vehicle, stop, and shortest distance.&lt;BR /&gt;
Getting the exact event is more work for Splunk because you'd have to do &lt;CODE&gt;| streamstats ... | reverse | streamstats ...&lt;/CODE&gt; to copy over one value from neighbouring events on both sides.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Aug 2016 05:52:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236983#M70437</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2016-08-24T05:52:40Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting log entry having smallest field value</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236984#M70438</link>
      <description>&lt;P&gt;There are a couple of ways: (1) the log would contain the vehicle at other stops in the interim; (2) the timestamps of separate clusters would be far apart.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Aug 2016 19:09:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236984#M70438</guid>
      <dc:creator>plucas_splunk</dc:creator>
      <dc:date>2016-08-24T19:09:37Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting log entry having smallest field value</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236985#M70439</link>
      <description>&lt;P&gt;If by "trough" you mean a distance pattern of I, J, K where J is less than either I or K, then no. There could be any (small) number of log entries for a vehicle/stop pair. I've updated my original question to reflect this. Will your solution still work?&lt;/P&gt;</description>
      <pubDate>Wed, 24 Aug 2016 20:15:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236985#M70439</guid>
      <dc:creator>plucas_splunk</dc:creator>
      <dc:date>2016-08-24T20:15:20Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting log entry having smallest field value</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236986#M70440</link>
      <description>&lt;P&gt;If the assumption doesn't hold then all things relying on the assumption won't work.&lt;/P&gt;

&lt;P&gt;I'd question the data's usefulness then though. In my mind, the minimum amount of data to determine a closest distance would be:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;vehicle is some distance away&lt;/LI&gt;
&lt;LI&gt;vehicle is even closer&lt;/LI&gt;
&lt;LI&gt;vehicle is further away again&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;The second would be the trough of distance, and the one you'd be looking for. If you don't even have this minimum amount of data you should probably post up an anonymized full sample set.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Aug 2016 20:18:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236986#M70440</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2016-08-24T20:18:58Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting log entry having smallest field value</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236987#M70441</link>
      <description>&lt;P&gt;The data for your #3 &lt;EM&gt;may&lt;/EM&gt; not be there because the vehicles report their GPS location only occasionally. It could easily be the case that the next time the vehicle reports its location, it's actually closer to its next stop.&lt;/P&gt;

&lt;P&gt;It could also happen the other way around, i.e., the first time the vehicle reports its closest stop, it could be, say, 20 feet away; but the next time it reports its closest stop, it could be, say, 55 feet away.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Aug 2016 02:31:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236987#M70441</guid>
      <dc:creator>plucas_splunk</dc:creator>
      <dc:date>2016-08-25T02:31:22Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting log entry having smallest field value</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236988#M70442</link>
      <description>&lt;P&gt;BTW: I &lt;EM&gt;could&lt;/EM&gt; write an external program to filter the results. The algorithm in Python is:&lt;/P&gt;

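&lt;PRE&gt;&lt;CODE&gt;log_d = { }  # closest entry seen so far for each vehicle's current stop
for line in args.in_file:
    new_d = parse( line ) # parse into timestamp &amp;amp; key/value pairs
    vid = new_d[ K_VID ]
    if vid not in log_d:
        log_d[ vid ] = new_d
    else:
        old_d = log_d[ vid ]
        old_stop = old_d[ K_STAG ]
        new_stop = new_d[ K_STAG ]
        if new_stop == old_stop:
            # same stop: keep whichever entry is closer
            if new_d[ K_VDISTANCE ] &amp;lt; old_d[ K_VDISTANCE ]:
                log_d[ vid ] = new_d
        else:
            # vehicle moved on to another stop: emit the closest entry for the old one
            log_vehicle_at_stop( old_d, args.out_file )
            log_d[ vid ] = new_d
# emit the entry still pending for each vehicle at end of input
for old_d in log_d.values():
    log_vehicle_at_stop( old_d, args.out_file )
&lt;/CODE&gt;&lt;/PRE&gt;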
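
&lt;P&gt;Since the algorithm above leans on helpers that aren't shown, here is a self-contained sketch of the same idea that can be run as-is. The regular expression, the function names, and the returned list are assumptions made for illustration, and the fourth sample line (the vehicle reporting a different stop) is made up so that the first stop's entry gets emitted. It also emits whatever is still pending at end of input.&lt;/P&gt;

```python
import re

# Matches key=value and key="value" pairs in a log line.
PAIR_RE = re.compile(r'(\w+)="?([^"\s]+)"?')

def parse(line):
    """Return the key/value pairs of one log line, with the distance as an int."""
    d = dict(PAIR_RE.findall(line))
    d["vehicle_distance"] = int(d["vehicle_distance"])
    d["raw"] = line
    return d

def closest_entries(lines):
    """Return the closest entry of each run of same-stop entries per vehicle."""
    current = {}  # vehicle_id -> closest entry for that vehicle's current stop
    out = []
    for line in lines:
        new = parse(line)
        vid = new["vehicle_id"]
        old = current.get(vid)
        if old is None:
            current[vid] = new              # first entry for this vehicle
        elif new["stop_tag"] != old["stop_tag"]:
            out.append(old)                 # stop changed: emit the old stop's best
            current[vid] = new
        elif old["vehicle_distance"] > new["vehicle_distance"]:
            current[vid] = new              # same stop, closer reading
    out.extend(current.values())            # emit what is pending at end of input
    return out

sample_lines = [
    '2016-08-24 03:46:15 GMT vehicle_id="1075" vehicle_distance=145 stop_tag="5687"',
    '2016-08-24 03:46:52 GMT vehicle_id="1075" vehicle_distance=19 stop_tag="5687"',
    '2016-08-24 03:47:38 GMT vehicle_id="1075" vehicle_distance=47 stop_tag="5687"',
    '2016-08-24 03:49:10 GMT vehicle_id="1075" vehicle_distance=88 stop_tag="5688"',  # made up
]
for entry in closest_entries(sample_lines):
    print(entry["raw"])
```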

&lt;P&gt;But I'd rather have &lt;EM&gt;all&lt;/EM&gt; the data inside Splunk and use Splunk to filter it if possible.&lt;/P&gt;

&lt;P&gt;Or I could take the Python program and make a custom streaming search command out of it. But doing it "native" would probably still be better.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Aug 2016 14:40:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236988#M70442</guid>
      <dc:creator>plucas_splunk</dc:creator>
      <dc:date>2016-08-25T14:40:29Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting log entry having smallest field value</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236989#M70443</link>
      <description>&lt;P&gt;Native doesn't necessarily mean better. Do go down the custom command route if you can express your problem in Python.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Aug 2016 16:31:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236989#M70443</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2016-08-25T16:31:24Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting log entry having smallest field value</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236990#M70444</link>
      <description>&lt;P&gt;Again, I &lt;EM&gt;could&lt;/EM&gt;; but, as I said, then I wouldn't have all the original data indexed just in case I ever need it for anything.&lt;/P&gt;

&lt;P&gt;Also, it would be nice to learn more SPL to know how to do this in Splunk directly.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Aug 2016 16:36:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236990#M70444</guid>
      <dc:creator>plucas_splunk</dc:creator>
      <dc:date>2016-08-25T16:36:06Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting log entry having smallest field value</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236991#M70445</link>
      <description>&lt;P&gt;Index all the data, then run your custom Python command over it at search time.&lt;/P&gt;

&lt;P&gt;I'm sure there are plenty of Splunky ways to do this natively, e.g. streamstats with some fancy resetting, but I don't understand enough of your requirements to continue.&lt;BR /&gt;
For example, you mentioned a timeout kinda thing... yet, I don't see anything about that in your pseudocode.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Aug 2016 17:40:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Selecting-log-entry-having-smallest-field-value/m-p/236991#M70445</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2016-08-25T17:40:03Z</dc:date>
    </item>
  </channel>
</rss>

