My dataset has multiple events for a single _time. Batches get loaded whenever it's sent by a 3rd party. I have 25 unique sites that gets data sent. I have a query which finds the most recent _time for the data by site_code:
index=tgem sourcetype=tgem | stats first(_time) as _time by site_code
It works fine, and returns the expected value, which is 2014-01-12 for the site_code=BR. So far so good, so now I want all matching records for that _time, so I glue it into a subquery like this:
index=tgem sourcetype=tgem [search index=tgem sourcetype=tgem | stats first(_time) as _time by site_code | return 25 _time] | search site_code=BR
It returns all records for 2014-01-04, which is the earliest _time in my period (month to date). Any ideas? I've been pulling my hair out all day.
My requirement is to do some stats (min, max, avg) of a field in the dataset, by site_code, for wahtever the most recent set of data is for that site_code
So in the end two things happened.
The format sting I used was
format "(" "(" "AND" ")" "OR" ")"
Thanks to everyone for your replies.
The first
and last
functions are relative to the order of events - when you are dealing with time, earliest
and latest
are usually better. Also, you need to return the site code as well as the time; you don't need the return
comand.
index=tgem sourcetype=tgem [search index=tgem sourcetype=tgem | stats latest(_time) as _time by site_code ]
Use the search job inspector to see how the subsearch values are returned.
For debugging subsearches you can take a look at the job inspector. That will show you what the subsearch evaluated to, giving you a chance to narrow the problem to either the subsearch or the outer search.
A different thought, merge the site_code filter into the first search command for potentially greater performance - unrelated to the results though.
Use max(_time) instead of first(_time)