I'm not sure if I am misunderstanding the use case for the partial flag with timechart or if something else is going on. I thought that if I set partial to false, then any results over partial time spans (essentially the first and/or last) would be dropped from the timechart. But I'm seeing inconsistent results with it. Sometimes it seems to work, but other times it doesn't.
Is anyone else seeing a similar issue with timechart? Or am I doing something wrong? Or maybe I don't fully understand how it is supposed to work? We are on Splunk 6.3.4 running on RHEL 6.5.
The partial option in the timechart is not based on whether the data is "partial" - rather it is based on whether the time period for a result is partial. For example - a search of "the last 24 hours" actually searches the range "earliest=-24h@h latest=now".
If you pipe the search results into "| timechart span=1h" - the results will automatically be "binned" into 1-hour buckets. The earliest bucket will cover an entire hour, but the latest will not (unless you run the search exactly at the beginning of the hour!)
There is no way for Splunk to know whether the data that was collected between 01:00 and 02:00 is complete or not - perhaps nothing happened during that time period. But Splunk can definitely determine whether a time span represents a full hour, a full 5 minutes, or whatever span you have chosen.
Note that there is always a time span for a bin/bucket - if you don't specify one, the timechart command will use a default. And the span=X does not override the partial setting - the two settings work together.
If you want to test this, run a search on the internal index like this
index=_internal earliest=-4h@h latest=now | timechart span=1h count
Run the search a couple of times over a half hour. Note that the last data point is usually lower than the others - a lot lower at 10 minutes past the hour. Then run the search with the partial option:
index=_internal earliest=-4h@h latest=now | timechart span=1h partial=f count
Notice that the last bucket no longer appears. Finally, there is a more efficient way to do this - if you don't want a partial hour at the end of the search, why retrieve the data to begin with?
index=_internal earliest=-4h@h latest=@h | timechart span=1h count
The "latest" stops at the beginning of the hour. This will be more efficient than using the partial option and achieves the same result.
Thanks for the reply, lguinn. If you look at my original screenshot, though, would you expect to see the first and last buckets with partial set to false? The timeframe was from 6:51 to 7:51 with a span of 5m, but both the 6:50 and 7:50 buckets were returned.
We are now on 6.5.2, and I haven't had a chance to check whether I'm still seeing this sporadically odd behavior.
Based on your timerange (6:51:00.000 to 7:51:41.000) - I would not expect to see the bins for
6:50 - 6:55
7:50 - 7:55
Weird that they show up. It might be worth opening a support ticket to find out if this is a known problem for your version of Splunk.
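For what it's worth, the expectation above can be reproduced with a small model of fixed bins. This is only a sketch of the rule as described in this thread (a bin is partial when the search time range does not cover all of it), not Splunk's actual implementation. The date is arbitrary, the function names are made up for illustration, and time zones are ignored.

```python
from datetime import datetime, timedelta

# Naive reference point used to align bins; time zones are ignored (assumption).
EPOCH = datetime(1970, 1, 1)

def floor_to_span(t, span):
    """Snap t down to the start of the fixed bin that contains it."""
    return EPOCH + ((t - EPOCH) // span) * span

def bins_with_partial_flag(earliest, latest, span):
    """Return (bin_start, is_partial) for every bin touching [earliest, latest)."""
    out = []
    start = floor_to_span(earliest, span)
    while start < latest:
        end = start + span
        # Partial means the search range covers only part of this bin.
        is_partial = start < earliest or end > latest
        out.append((start, is_partial))
        start = end
    return out

# Time range from the question; the calendar date is an arbitrary placeholder.
earliest = datetime(2016, 6, 16, 6, 51, 0)
latest = datetime(2016, 6, 16, 7, 51, 41)
bins = bins_with_partial_flag(earliest, latest, timedelta(minutes=5))
partial = [start.strftime("%H:%M") for start, is_partial in bins if is_partial]
print(partial)  # ['06:50', '07:50']
```

Under this model only the 6:50 and 7:50 bins are flagged, which matches the two bins that partial=f would be expected to drop.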
Workaround: Only search for timeranges that you want to appear in the results. Then you won't need partial=f. In the timerange selector, you can choose "Advanced" and specify the exact timerange that you want. Or you can use earliest and latest as I did in one example.
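To work out which exact time range to enter for this workaround, the requested range can be snapped inward to the nearest bin boundaries. The following is a rough sketch with a made-up helper name (snap_inward), an arbitrary date, and no time zone handling; it only models the fixed bin boundaries and is not something Splunk provides.

```python
from datetime import datetime, timedelta

# Naive reference point used to align bins; time zones are ignored (assumption).
EPOCH = datetime(1970, 1, 1)

def snap_inward(earliest, latest, span):
    """Shrink [earliest, latest) so that it starts and ends on bin boundaries."""
    def floor(t):
        return EPOCH + ((t - EPOCH) // span) * span

    lo = floor(earliest)
    if lo < earliest:
        lo += span          # move the start forward to the next boundary
    hi = floor(latest)      # move the end back to the previous boundary
    return lo, hi

lo, hi = snap_inward(
    datetime(2016, 6, 16, 6, 51, 0),    # arbitrary date, times from the question
    datetime(2016, 6, 16, 7, 51, 41),
    timedelta(minutes=5),
)
print(lo.time(), hi.time())  # 06:55:00 07:50:00
```

Searching exactly 6:55:00 to 7:50:00 with span=5m should then leave no partial bins to drop.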
I apologize for not carefully checking the time range of the results in your answer. My first answer was from an environment where I couldn't really see the JPEG.
It seems my previous comment was lost, so I am repeating it. Sorry if it shows up twice.
lguinn, thanks for the answer. But does it mean that I can use «partial» only when I search with the «latest» time modifier in the search parameters?
In other words, why can't I use «partial» for a search like this:
source="*users.csv" | timechart span=month count as users
I run this search against a file which contains data from 2012-06-19 14:53:41 to 2016-06-16 05:39:34. So from a business point of view, the «2012-06» bin is a partial one.
At the same time timechart creates the first bin for the period from «2012-06-01 00:00:00» to «2012-07-01 00:00:00», and it seems that from timechart’s point of view this bin is not partial.
So again - is it possible to use the «partial» option in my situation?
Yes, see the previous comment. In my answer, I used earliest and latest to make my search time range explicit, since the timerange selector is a separate button on the search screen.
You can use partial=t regardless of whether you use earliest or latest or the timerange selector. But the actual effect of the partial=t depends on the bin that the timechart command uses and the overall timerange of the search.
lguinn, thanks for the answer, but I still don't understand whether I can use "partial" in my situation. Could you please comment on this:
I run this search:
source="*users.csv" | timechart span=day count as users
The file I work with contains data from 2012-06-19 14:53:41 to 2016-06-16 05:39:34. Timechart creates bins 1 day long, and the boundaries of every bin run from 00:00:00 of one day to 00:00:00 of the next day. E.g. the boundaries for the first bin are "2012-06-19 00:00:00 to 2012-06-20 00:00:00", according to the Splunk UI (please see the screenshot).
And these time boundaries are set regardless of the time of day at which I run the search.
The situation is the same for a month span and for an hour span (I mean that a bin starts and ends at the appropriate "zero level", e.g. for an hour span it is 00:00).
In my situation the intuitive behaviour of "partial" would be to just cut off the first and last bins. Based on the data in the file, I suspect that not all the data for that day is in the file, and for the month span I'm sure that is the case.
Yes, the bins in Splunk are fixed: the 5-minute bins break at :05, :10, :15, etc. For a span=1day, the bins break at midnight. You cannot change this "bin" behavior.
There is nothing that Splunk can do about missing data in the file.
Look carefully at the actual time range of your search - when you run the search, it appears in text above your results. See my last comment - yes, there is something weird about your results based on the timerange that you have highlighted...
I'm willing to bet that the "span=5m" is overriding the "partial=false". Instead of span=5m, try using bucket or bin prior to timechart, like this: ... | bin _time span=5m | ... Or, since the documentation says "partial time BINS", maybe you have to use bins instead of span in the timechart command.
Hope this helps!
As far as I understand:
1. "Use the bin command for only statistical operations that the chart and the timechart commands cannot process." - that's said in doc for "bin" command.
2. Bin command itself doesn't have partial option.
3. The bins option in the timechart command specifies only the number of resulting bins, nothing else.
I understood the principle of "partial" exactly as maciep did, and I don't see anything to the contrary in the documentation. The only suspicious thing is that the documentation says «Only the first and last bin can be partial.». As far as I understand, the first and last bins should always be considered partial. Otherwise the system has to have some logic to determine whether the first and/or last bins are partial or not. And IMHO this is impossible, because the software cannot tell whether e.g. the first period is incomplete, or whether there simply weren’t any events during the first part of this first bin.
There was a quite optimistic post in 2014 - https://answers.splunk.com/answers/143069/how-does-partial-true-affects-timechart-results.html.
But for me it doesn’t work.
The same issue is here - https://answers.splunk.com/answers/432520/why-is-timechart-partialfalse-still-returning-part.html
So it would be great to solve that issue.