In the process of trying to verify some summary index data I've noticed that timechart does not seem to return expected results when using the earliest and latest functions.
Example data:
indextime _time Value
1438019839 2015-07-27 11:03:27 173755
1438019838 2015-07-27 11:03:10 173755
1438019838 2015-07-27 11:03:09 173755
1438019836 2015-07-27 11:03:05 173750
1438019838 2015-07-27 11:02:46 173750
1438019834 2015-07-27 11:02:29 173750
1438019833 2015-07-27 11:02:28 173750
1438019834 2015-07-27 11:02:24 173747
1438019834 2015-07-27 11:01:56 173747
1438019832 2015-07-27 11:01:39 173747
1438019834 2015-07-27 11:01:39 173747
1438019832 2015-07-27 11:01:33 173727
1438019832 2015-07-27 11:01:15 173727
1438019831 2015-07-27 11:00:58 173727
1438019832 2015-07-27 11:00:56 173727
1438019831 2015-07-27 11:00:52 173717
1438019831 2015-07-27 11:00:32 173717
1438019831 2015-07-27 11:00:14 173717
1438019831 2015-07-27 11:00:13 173717
1438019831 2015-07-27 11:00:09 173712
I've included indextime as I thought it might be relevant. But note that sorting by indextime does not change the earliest and latest values.
Running a timechart using earliest and latest against this data yields results which are clearly incorrect.
| timechart span=1d earliest(Value) as earliestValue, latest(Value) as latestValue, max(Value) as maxValue, min(Value) as minValue
_time earliestValue latestValue maxValue minValue
2015-07-27 173755 173755 173755 173712
While stats produces the correct output...
| stats earliest(Value) as earliestValue, latest(Value) as latestValue, max(Value) as maxValue, min(Value) as minValue
earliestValue latestValue maxValue minValue
173712 173755 173755 173712
Interestingly, using first and last in place of latest and earliest with timechart does produce the correct output.
| timechart span=1d last(Value) as earliestValue, first(Value) as latestValue, max(Value) as maxValue, min(Value) as minValue
_time earliestValue latestValue maxValue minValue
2015-07-27 173712 173755 173755 173712
I've searched through the docs and can't find any mention of why this could be occurring. I presume there is some internal reason why timechart functions this way, but it's very counterintuitive and not at all clear. Does anyone know why the earliest and latest functions work this way with timechart?
Running Splunk 6.2.4 on Oracle Enterprise Linux 6.5.
Update:
Results of | table _time Value per @somesoni2's request.
_time Value
2015-07-27 11:03:27 173755
2015-07-27 11:03:10 173755
2015-07-27 11:03:09 173755
2015-07-27 11:02:46 173750
2015-07-27 11:03:05 173750
2015-07-27 11:01:39 173747
2015-07-27 11:02:24 173747
2015-07-27 11:02:29 173750
2015-07-27 11:01:56 173747
2015-07-27 11:02:28 173750
2015-07-27 11:01:33 173727
2015-07-27 11:01:39 173747
2015-07-27 11:01:15 173727
2015-07-27 11:00:56 173727
2015-07-27 11:00:58 173727
2015-07-27 11:00:14 173717
2015-07-27 11:00:13 173717
2015-07-27 11:00:52 173717
2015-07-27 11:00:32 173717
2015-07-27 11:00:09 173712
It should help to consider how bucketing works for timechart (read the docs on bucket, AKA bin). When you tell timechart to bucket with span=1d, Splunk changes every event's _time value (for this search) from whatever it used to be to 0d@d, which is exactly midnight: 00:00:00.000. Once this has happened, it may be undefined or unpredictable how any version of Splunk will select a single "winner" for "earliest" when all events for "today" now have exactly the same timestamp. Ideally, timechart would calculate earliest and latest before it modifies _time, but perhaps there is a reason that it cannot. IMHO, the situation is either a code bug or a documentation bug (not mentioning this aspect), so I would open a support ticket.
But I have one caveat: if you are bucketing twice in a row (e.g. ... | bucket _time span=1h ... | timechart span=1h earliest(value) ...) then you absolutely cannot fault Splunk for being unable to get the right answer, because the bucket changes to _time mean that the timechart has no reliable reference point to break the ties correctly. Are you doing 2 bucketing commands like this?
The bucketing is an excellent thought and this seems likely to be the cause of the issue. In further testing, if I add a "| sort + Value" before the timechart the output changes...
_time earliestValue latestValue maxValue minValue
2015-07-27 173712 173712 173755 173712
If bucketing were not the issue (i.e. if the timestamps had not been modified before the earliest and latest functions run), then the sort would have no effect on the timechart output.
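To make the tie-breaking hypothesis concrete, here is a toy model in Python. It is only a sketch of the suspected mechanism, not Splunk's actual implementation: the helper names (snap_to_day, earliest_latest) are made up for illustration, and it assumes that ties on _time are resolved in favour of the first event encountered. It uses a subset of the example events, in the order Splunk returned them.

```python
from datetime import datetime

# (timestamp, Value) pairs: a subset of the example data, newest first,
# which is the order the events arrive in from the search.
EVENTS = [
    ("2015-07-27 11:03:27", 173755),
    ("2015-07-27 11:02:46", 173750),
    ("2015-07-27 11:01:39", 173747),
    ("2015-07-27 11:00:56", 173727),
    ("2015-07-27 11:00:14", 173717),
    ("2015-07-27 11:00:09", 173712),
]


def parse(events):
    """Turn the timestamp strings into datetime objects."""
    return [(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), v) for ts, v in events]


def snap_to_day(events):
    """Assumed model of span=1d bucketing: overwrite every time with midnight."""
    return [(t.replace(hour=0, minute=0, second=0, microsecond=0), v)
            for t, v in events]


def earliest_latest(events):
    """Return the Values at the minimum and maximum time.

    Python's min() and max() both return the first item encountered when
    several items tie, which stands in for the assumed tie-breaking rule.
    """
    earliest = min(events, key=lambda e: e[0])[1]
    latest = max(events, key=lambda e: e[0])[1]
    return earliest, latest


events = parse(EVENTS)

# Original timestamps: no ties, so the answer matches the stats output.
true_result = earliest_latest(events)

# Timestamps snapped to midnight: every event ties, input order decides.
bucketed_result = earliest_latest(snap_to_day(events))

# Same, but with the events sorted ascending by Value first.
sorted_result = earliest_latest(snap_to_day(sorted(events, key=lambda e: e[1])))

print(true_result)      # (173712, 173755)
print(bucketed_result)  # (173755, 173755)
print(sorted_result)    # (173712, 173712)
```

Under these assumptions the model reproduces all three observations: the stats result, the timechart result where both values are 173755, and the result after sorting where both values are 173712. That is consistent with the bucketing explanation, but it does not prove that this is what timechart does internally.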
Can you try putting a " | table _time Value" before the timechart and see the result?
I've added results per your request. I assume you wanted to see the table output as it's returned from Splunk (rather than being sorted by time). I also tested adding the table before the timechart, but this had no effect on the timechart output.