Would someone kindly confirm if Splunk is expected to preserve the order of events as they are presented in the original log file during indexing? If it is not, is there a setting to force it to preserve order?
We are observing events being indexed out of original order when there is no sub-second resolution on the timestamp and hundreds/thousands of events are generated per second.
Events are persisted to disk in their original arrival order. Events are retrieved in inverse time order, with inverse arrival order used to break ties.
The "code" of the event, stored in the field _cd, holds the pair (bucket id, arrival address). You can sort by ascending arrival address to see events in arrival order.
Arrival order is in-file order, for any given source.
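To make the tie-breaking concrete, here is a minimal Python sketch of the ordering described above. It is not Splunk code. It assumes that _cd can be read as a string of the form "bucket_id:arrival_address" and that all events shown live in one bucket; the Event type, the helper names, and the sample values are hypothetical and only serve the illustration.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    time: int  # epoch seconds, no sub-second resolution
    cd: str    # ASSUMED form: "bucket_id:arrival_address"
    raw: str


def parse_cd(cd: str) -> Tuple[int, int]:
    """Split the assumed "bucket:address" string into two integers."""
    bucket, address = cd.split(":")
    return int(bucket), int(address)


def retrieval_order(events: List[Event]) -> List[Event]:
    """Inverse time order, with inverse arrival order breaking ties."""
    return sorted(events, key=lambda e: (e.time, parse_cd(e.cd)), reverse=True)


def arrival_order(events: List[Event]) -> List[Event]:
    """Ascending arrival address (meaningful within a single bucket)."""
    return sorted(events, key=lambda e: parse_cd(e.cd))


# Hypothetical sample: three events share the same one-second timestamp.
events = [
    Event(100, "7:30", "third"),
    Event(100, "7:10", "first"),
    Event(101, "7:40", "fourth"),
    Event(100, "7:20", "second"),
]

print([e.raw for e in retrieval_order(events)])
print([e.raw for e in arrival_order(events)])
```

In this sketch the three events that share timestamp 100 come back newest-arrival-first under retrieval order, and sorting on the arrival address alone restores the order in which they arrived.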
Thank you, Stephen. If the log file is being collected by a Splunk forwarder, what is the relation between arrival order and original file order? Can we at least expect these to be the same per source? A customer is reporting the original order is not preserved, but I am not yet able to reproduce on a standalone Splunk instance without a forwarder.
This may not directly answer your question, but it's related:
Thank you guys. In our case, I do not believe there are more than several thousand events per second--way less than the 100k limit.
It seemed to me like these topics were related, but perhaps they are not. It seems like Splunk generally does preserve ordering, so I guess I just figured that Splunk assigned sequential values to _cd or something, and that things start breaking after so many hundreds of thousands of events in a single second. But I could be way off; perhaps these are not related at all.
I think the issue is that there are several events (say just a couple of thousand) with the same timestamp (no subseconds) in the same file, and they want to know if Splunk will return results in the order in which they were encountered in the file. My guess is that there is no such guarantee.