
stats first() behaving differently in a dashboard than in a search: is this a bug?

beano500
Engager

Since upgrading from Splunk 5 to Splunk 6, one of my dashboards has started behaving "strangely", and I have distilled the problem down to this.

Take a dashboard that uses "stats" with "first":

<dashboard>
  <label>TimeTest</label>
  <description>TimeTest</description>
  <row>
    <table>
      <searchString> index=_internal
        |stats first(_time) as f, last(_time)  as l by sourcetype
        |eval d = f - l
        |fieldformat f = strftime(f, "%c")
        |fieldformat l = strftime(l, "%c")
        |table sourcetype f l d
      </searchString>
      <earliestTime>-2mon</earliestTime>
      <latestTime>now</latestTime>
    </table>
  </row>
</dashboard>

This produces some strange results for me: when I run it, I get cases where "first" is further in the past than "last", giving negative values for 'd'.

If I then click "Open in Search", the same results are shown (as expected), but if I then click the magnifying glass to run the search again, I get sensible values for every sourcetype, with positive values for 'd'.

The search above does take a while to run, but it is an example of what I am experiencing in a form that others can, hopefully, reproduce.

All help much appreciated.


martin_mueller
SplunkTrust

No Report Acceleration here. Firefox, Chrome, and IE all behave the same way. It appears to fill "today" first, i.e. the current hot bucket on my laptop, and then fills the six days before that from oldest to newest.
Looking at the network tab in Chrome, the first preview contains the daily span for Jan 17 and Jan 24. The second contains Jan 17, Jan 20, and Jan 24 (the 18th and 19th fall on a weekend, so there is no data for them; that's expected). The next previews add Jan 21, 22, and 23 in ascending order.

PS: This is a standalone development Splunk instance on my laptop, so there are no search peers racing.


mkinsley_splunk
Splunk Employee

This is an interesting observation! A few things could explain it. One is Report Acceleration: the value for, say, two days ago could already be calculated, so there is no reason to recalculate it, whereas today's data needs to be fully scanned to calculate its value. The UI also comes into play: JavaScript is single-threaded, and by the time an event is acted upon, there could be several buckets of data to draw. It would be interesting to look at the results from the preview endpoint in the Chrome or Firefox network console.


mkinsley_splunk
Splunk Employee

first() and last() are literally the first and last records processed by your Search Head.

There is no guarantee of order here. It's literally a race condition over what arrives first from the indexers. If you want to guarantee order, you have to apply a sort criterion, or use a function that does not depend on arrival order, such as min() or max().

The great thing about first() and last() is that they are easy to compute and require almost no memory. Once you introduce a sort, the search takes up more resources (something has to keep an ordered list).

The search you want here is actually the following. It is fast and requires no aggregation. It returns the earliest and latest times grouped by sourcetype. Enjoy!

| metadata type=sourcetypes index=_internal
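To make the race condition concrete, here is a toy Python model (not Splunk's actual implementation) of a search head consuming results from two indexers. Each indexer streams its own events newest to oldest, but the merged arrival order depends on which indexer answers first. The timestamps and the round-robin interleaving are invented for illustration. first() and last() change with arrival order, and d = f - l can go negative, while min() and max() stay the same.

```python
from itertools import zip_longest

# Hypothetical epoch timestamps. Each indexer streams its own events
# newest -> oldest, as described in the thread.
indexer_a = [1390600000, 1390500000, 1390400000]  # holds the newer data
indexer_b = [1390000000, 1389900000]              # holds the older data


def arrival_order(*streams):
    """Interleave the streams round-robin, starting with the first argument.

    This mimics one possible arrival order at the search head; which
    indexer answers first is a matter of timing, not a guarantee.
    """
    merged = []
    for batch in zip_longest(*streams):
        merged.extend(t for t in batch if t is not None)
    return merged


def stats_summary(times):
    """first/last = first/last record processed; min/max ignore order."""
    return {
        "first": times[0],
        "last": times[-1],
        "min": min(times),
        "max": max(times),
    }


a_wins = stats_summary(arrival_order(indexer_a, indexer_b))
b_wins = stats_summary(arrival_order(indexer_b, indexer_a))

# d = f - l, exactly as in the dashboard search
print("indexer_a answers first, d =", a_wins["first"] - a_wins["last"])  # 200000
print("indexer_b answers first, d =", b_wins["first"] - b_wins["last"])  # -400000
print(a_wins["min"] == b_wins["min"], a_wins["max"] == b_wins["max"])    # True True
```

Nothing in either stream is out of order locally; only the interleaving differs, which is why "newest to oldest" and "no guaranteed order" can both hold.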

mkinsley_splunk
Splunk Employee

This search does not support your argument. Timechart quantizes events into time bins, each representing a set large enough that last() will be greater than the event at the end of the bin. Events are processed in reverse time order, but because there may be tsidx fragments or multiple indexers, there is no strict local (small-scale) order without an explicit sort. This is exactly like the law of large numbers.

Check out these links:
http://wiki.splunk.com/Deploy:UnderstandingBuckets
http://docs.splunk.com/Documentation/Splunk/6.0.1/Indexer/HowSplunkstoresindexes


beano500
Engager

I will.

The splunk_app_for_nix code contains searches like
definition = os_index interfaces_sourcetype host=$host$ | streamstats current=f last(TXbytes) as lastTX, last(RXbytes) as lastRX by Name | eval time=_time | strcat Name "-" inetAddr "@" host Interface_Host | eval RX_Thruput_KB = (lastRX-RXbytes)/1024 | eval TX_Thruput_KB = (lastTX-TXbytes)/1024 | timechart eval(sum(TX_Thruput_KB)/dc(time)) by Interface_Host

As far as I can tell, this relies on the search returning events in a fixed time order (newest first, so that lastRX - RXbytes comes out positive), so if you can't rely on the order in which you receive events, then this is also broken.
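A toy Python model of the pattern in that macro, assuming that streamstats current=f last(RXbytes) hands each row the RXbytes value of the row processed just before it. The counter values are made up. With rows newest first, (lastRX - RXbytes)/1024 is positive; feed the same rows oldest first and every value flips sign, so the result is only as stable as the event order.

```python
def streamstats_prev(rows, field):
    """Attach the previous row's `field` value (None for the first row),
    mimicking streamstats current=f last(field)."""
    prev = None
    out = []
    for row in rows:
        out.append(dict(row, last=prev))
        prev = row[field]
    return out


def rx_kb(rows):
    """RX_Thruput_KB = (lastRX - RXbytes) / 1024, skipping the row with no previous value."""
    return [
        (r["last"] - r["RXbytes"]) / 1024
        for r in streamstats_prev(rows, "RXbytes")
        if r["last"] is not None
    ]


# A cumulative byte counter sampled three times (hypothetical values).
samples = [
    {"_time": 100, "RXbytes": 10240},
    {"_time": 200, "RXbytes": 20480},
    {"_time": 300, "RXbytes": 40960},
]

newest_first = sorted(samples, key=lambda r: r["_time"], reverse=True)
oldest_first = sorted(samples, key=lambda r: r["_time"])

print(rx_kb(newest_first))  # [20.0, 10.0]
print(rx_kb(oldest_first))  # [-10.0, -20.0]
```

Pinning the order with an explicit sort on _time ahead of streamstats would remove the dependence, at the extra cost of keeping an ordered list.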


mkinsley_splunk
Splunk Employee

If you think there is a bug, please open a case with support. Based on my understanding of the engine and your sample search, the system is behaving exactly as expected. Good luck!


beano500
Engager

But with one indexer, indexing events that are days apart (which is the case I was using), the documentation says that search results are ordered.
And not forgetting my original point: the behaviour is DIFFERENT in a dashboard compared to the search bar. This has to be a bug.


mkinsley_splunk
Splunk Employee

There is no guaranteed order here. Imagine you have three indexers feeding your search: whichever one is fastest to get its first result back to the search head (SH) wins the race.

Data is still processed from newest to oldest (within each indexer). These two things are not mutually exclusive.


beano500
Engager

In the 6.0.1 documentation, "About Retrieving Events", the third paragraph starts: "Events are retrieved from an index(es) in reverse time order. The results of a Splunk search are ordered from most recent to least recent by default."

So I would say that there is a guarantee in order here.


beano500
Engager

So streamstats now also comes to mind. I have seen many examples in code (e.g. the *nix app) that rely on the order of the initial search, because the code remembers the previous value of a field and computes a difference (e.g. Tx/Rx bytes to get throughput figures). If the order in which events are delivered is non-deterministic (and different between a dashboard and a search window), will this not cause issues with other searches?


martin_mueller
SplunkTrust

Here's a different use for first(): imagine you're merging several events into one pseudo-transaction with stats. Some of the events may contain a field, say a username, and you know it's always the same within one pseudo-transaction. Using first(username) is the fastest way to select that username; in such a case you don't care which event Splunk takes it from.
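A minimal Python model of that pattern, with hypothetical field names (session_id, username) and the assumption that events lacking the field are skipped, as Splunk's stats functions do. Because the username is constant within a group, first() gives the same answer whatever order the events arrive in.

```python
def stats_first_by(events, value_field, by_field):
    """Keep the first non-null value seen per group, like stats first(x) by y."""
    firsts = {}
    for e in events:
        key = e[by_field]
        if key not in firsts and e.get(value_field) is not None:
            firsts[key] = e[value_field]
    return firsts


# Hypothetical events; username is absent from some, but constant per session.
events = [
    {"session_id": "s1", "action": "login", "username": "alice"},
    {"session_id": "s1", "action": "click"},
    {"session_id": "s2", "action": "login", "username": "bob"},
    {"session_id": "s1", "action": "logout", "username": "alice"},
]

print(stats_first_by(events, "username", "session_id"))
print(stats_first_by(list(reversed(events)), "username", "session_id"))
# both: {'s1': 'alice', 's2': 'bob'}
```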

beano500
Engager

Thanks for the clarification, and on re-reading the first() documentation this makes sense.
Though I would say there seems to be a misconception that Splunk searches from newest to oldest. I am sure I read it somewhere (e.g. the accepted answers to http://answers.splunk.com/answers/42570/why-stats-last-and-first-are-inverted), and this was the case before 6.0. So first() and last() are only really useful when you know the order of your search results.


martin_mueller
SplunkTrust

Here's a thought that may be related: I've observed a change in how a 7-day dashboard is filled since upgrading to Splunk 6. In Splunk 5, it would fill sequentially from now back to 7 days ago. In Splunk 6, data from the past appears before the sequential fill has reached it.
That's with a standalone Splunk instance, so there is no remote search peer delivering data faster than another.

This may or may not affect how first() and last() are calculated.


beano500
Engager

The issue is not restricted to time; my original problem was with page counts on printers. I reproduced it above with time just to illustrate the issue.
As for the value I was getting: it was negative and, more importantly, different from the one I got when I opened the search in the search bar.


linu1988
Champion

What value are you getting? I had the same issue while doing the same sort of calculation. Splunk doesn't give you the correct time difference, as it always calculates in epoch time format.

I had some help with this earlier:
http://answers.splunk.com/answers/98294/epoch-time-to-conventional-time


beano500
Engager

I have worked around the issue using min/max, but that does not detract from the fact that it is an issue.
The point is that I should receive the same output from a dashboard as from a search window, and I don't.
Unless I am doing something fundamentally wrong, I guess it is a bug.


linu1988
Champion

If you bucket, stats will have much less data to compare. I have done this in some of my graphs where granular detail doesn't really affect the final outcome.

E.g. consider that on the 17th you may have 1000 events.
If you run stats, it will go through each timestamp to find out which is the min and max rather than the first and last. But if you bucket it to 1 hour, with 50 events per hour, the stats input becomes only 20 events, which is faster, right? That is my understanding, and in my dashboard it takes far less time when I compare monthly data.


martin_mueller
SplunkTrust
SplunkTrust

Slightly off topic, but I'm curious - how does bucket speed that up?

To add something to the topic, you can replace your eval with range(_time) in the stats.
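A small Python model of the min/max/range version of the original search (event values invented). range(_time) is max(_time) - min(_time), so the resulting d can never be negative and does not depend on the order in which events reach the search head.

```python
from collections import defaultdict


def stats_min_max_range(events):
    """Model of: stats min(_time) as f, max(_time) as l, range(_time) as d by sourcetype."""
    groups = defaultdict(list)
    for e in events:
        groups[e["sourcetype"]].append(e["_time"])
    return {
        st: {"f": min(ts), "l": max(ts), "d": max(ts) - min(ts)}
        for st, ts in groups.items()
    }


# Hypothetical events in one arrival order...
events = [
    {"sourcetype": "splunkd", "_time": 1390600000},
    {"sourcetype": "splunkd", "_time": 1390000000},
    {"sourcetype": "splunk_web_access", "_time": 1390300000},
    {"sourcetype": "splunkd", "_time": 1390400000},
]
# ...and the same events in another.
reordered = list(reversed(events))

print(stats_min_max_range(events)["splunkd"])
# {'f': 1390000000, 'l': 1390600000, 'd': 600000}
print(stats_min_max_range(events) == stats_min_max_range(reordered))  # True
```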


linu1988
Champion

And it should give a negative result per the query, since f < l. Moreover, when calculating over such a big range, you can bucket the time, which will give you the result a bit faster, as you are not looking for granular detail.

index=_internal|bucket _time span=6h|stats...
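A toy Python model of what the bucket step does to the values stats then sees. It assumes bins aligned to the Unix epoch in UTC (Splunk's actual alignment, e.g. to the local time zone, may differ), and it uses made-up data matching the 50-events-per-hour, 20-hour example. It shows the number of events stays at 1000 while the distinct _time values collapse to 20 (span=1h) or 4 (span=6h), and that max() then reports the start of the last bin rather than the last event. Whether that makes stats faster in practice is not something this model can show.

```python
def bucket(times, span):
    """Floor each timestamp to the start of its bin (epoch-aligned, UTC assumed)."""
    return [t - t % span for t in times]


# Hypothetical data: 50 events per hour for 20 hours.
start = 1390003200  # an epoch second divisible by 6h, chosen for illustration
times = [start + i * 72 for i in range(1000)]  # 3600 / 50 = 72 s apart

for span_hours in (1, 6):
    binned = bucket(times, span_hours * 3600)
    print(
        f"span={span_hours}h",
        "events:", len(binned),
        "distinct _time:", len(set(binned)),
        "max() shifted back by:", max(times) - max(binned), "s",
    )
# span=1h events: 1000 distinct _time: 20 max() shifted back by: 3528 s
# span=6h events: 1000 distinct _time: 4 max() shifted back by: 7128 s
```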


somesoni2
Revered Legend

Try replacing "first" with the "max" function and "last" with the "min" function.
