I am working on some Splunk searches that rely heavily on the order in which the search command returns events.
I searched a lot of docs for a statement that I can rely on this order, but there seems to be no such guarantee.
So for every search, I need to pre-sort the results with
| sort 0 -_time to be confident about the order. This introduces a big performance penalty because the searches operate on large datasets. Is there a statement about this that I missed somewhere? And what does your experience tell you about relying on the order?
EDIT: The root of this question is the use of the streamstats command as a hack to replace the non-existent by clause of the delta command. So it looks something like
... | streamstats current=f window=1 global=f first(_time) as prev_time by object_id, from which I can calculate the time delta between consecutive events of each distinct object. It does not matter whether I do it forward or backward; I could build the query either way, but either way it is highly dependent on the order being correct. It does not matter whether the order is stable in this case.
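For readers following along, here is a minimal sketch of the per-object delta described above, with the explicit sort in place. It assumes the events carry an object_id field as in the question; the index name my_index and the output field delta are hypothetical placeholders, not taken from the original searches.

```
index=my_index
| sort 0 -_time
| streamstats current=f window=1 global=f first(_time) as prev_time by object_id
| eval delta = prev_time - _time
```

With newest-first ordering, prev_time holds the timestamp of the next newer event for the same object_id, so delta should come out non-negative. The first event seen for each object has no prev_time, so its delta stays empty. If the events arrive in any other order, the same pipeline runs without error but yields wrong deltas, which is why the order matters here.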
Based on DalJeanis's answer, it seems that the default order cannot be guaranteed, especially in clustered deployments. I would be very grateful if someone could provide a definitive answer.
When thinking about this issue, also remember that any general search might run across multiple indexes and multiple hosts. You are therefore going to pay some performance penalty as soon as you use a non-streaming command (any command that assumes all the data is together to be summed or compared), since all the data has to be collected and merged before that command can run.
If you posted some specific snippets from your searches that rely on the order, then we might be able to give you some other answer than, "Splunk is SUPPOSED TO..."
By default, Splunk displays your search results in reverse chronological order (from most recent to oldest). If multiple events have the same timestamp, the order of those events (and only those) can't be guaranteed; Splunk shows them in the order in which it received them from the search peers.
Can you share an example of events which are not being sorted reverse chronologically by default?
Ideally, Splunk gives you the latest indexed data as the first result (similar to what you achieve with the sort - _time command). If the volume of data is so high that indexing is delayed and event timestamps are incorrect, then you have the graver issue of non-synchronized data. You should look at index buckets (especially the hot bucket) to understand how Splunk data is indexed and retrieved during search, and why the latest data gets pulled up first while older records take longer to be fetched.
Thanks for your reply! It's not that I receive any out-of-order data; my question is rather whether, under certain (maybe very unlikely) circumstances, the order could be broken. I will look at your link. It looks familiar to me, but maybe I will be able to find the answer there. Thanks!
Well, if they are not out of order and you are ingesting data with proper timestamps, you can rely on the newest record being displayed first. Even if you perform sort - _time, your results will be in the same sequence.
You can export the results of two queries, one with sort - _time piped to the search and one without, and compare them; ideally they should be in the same sequence.
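As an alternative to exporting and comparing by hand, the check can be sketched inside Splunk itself. This is only an illustration: the index name my_index and the field names prev_time and out_of_order are hypothetical, and it assumes streamstats sees the events in the same order in which the search would return them.

```
index=my_index
| streamstats current=f window=1 last(_time) as prev_time
| where _time > prev_time
| stats count as out_of_order
```

Each event is compared with the one returned just before it. In strict newest-first order no event should be newer than its predecessor, so a count of 0 suggests the unsorted results were already in reverse chronological order for that particular run. A clean result on one run shows what happened in that run; it is not a guarantee about future runs or other deployments.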