Getting Data In

Is it possible to preserve original order of events?

hulahoop
Splunk Employee

Would someone kindly confirm if Splunk is expected to preserve the order of events as they are presented in the original log file during indexing? If it is not, is there a setting to force it to preserve order?

We are observing events being indexed out of original order when there is no sub-second resolution on the timestamp and hundreds/thousands of events are generated per second.

1 Solution

Stephen_Sorkin
Splunk Employee

Events are persisted to disk in their original arrival order. Events are retrieved in inverse time order, with inverse arrival order used to break ties.

The "code" of the event, stored in the field _cd, is the pair (bucket id, arrival address). You can sort by ascending arrival address to see events in arrival order.
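
To illustrate the two orderings described above, here is a minimal Python sketch. It is not Splunk code. It assumes _cd is rendered as a string of the form "bucket_id:arrival_address", and the Event type and helper names are hypothetical, made up for this example. Comparing arrival addresses is only meaningful within a single bucket, so the sample data stays in one bucket.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    time: int  # epoch seconds, no sub-second resolution
    cd: str    # assumed format: "bucket_id:arrival_address"
    raw: str


def parse_cd(cd: str) -> Tuple[int, int]:
    """Split the assumed "bucket:address" string into integers."""
    bucket_id, address = cd.split(":")
    return int(bucket_id), int(address)


def retrieval_order(events: List[Event]) -> List[Event]:
    """Inverse time order, with inverse arrival order breaking ties."""
    return sorted(events, key=lambda e: (e.time, parse_cd(e.cd)), reverse=True)


def arrival_order(events: List[Event]) -> List[Event]:
    """Ascending arrival address (within a bucket) gives arrival order."""
    return sorted(events, key=lambda e: parse_cd(e.cd))


# Three events share the same second; a fourth arrives one second later.
events = [
    Event(100, "7:0", "first line"),
    Event(100, "7:1", "second line"),
    Event(100, "7:2", "third line"),
    Event(101, "7:3", "fourth line"),
]

for e in retrieval_order(events):
    print("retrieved:", e.raw)
for e in arrival_order(events):
    print("arrived:  ", e.raw)
```

In this sketch, events that share a timestamp come back newest-arrival first, which can look like "reversed" file order in search results even though the arrival order is still recoverable from _cd.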

Stephen_Sorkin
Splunk Employee

Arrival order is in-file order, for any given source.

hulahoop
Splunk Employee

Thank you, Stephen. If the log file is being collected by a Splunk forwarder, what is the relation between arrival order and original file order? Can we at least expect these to be the same per source? A customer is reporting the original order is not preserved, but I am not yet able to reproduce on a standalone Splunk instance without a forwarder.

Lowell
Super Champion

This may not directly answer your question, but it's related:

hulahoop
Splunk Employee

Thank you guys. In our case, I do not believe there are more than several thousand events per second--way less than the 100k limit.

Lowell
Super Champion

It seemed to me like these topics were related, but perhaps they are not. It seems like Splunk generally does preserve ordering, so I guess I just figured that Splunk assigned sequential values to _cd or something, and that things start breaking after so many hundreds of thousands of events in a single second... but I could be way off. Perhaps these are not related at all.

gkanapathy
Splunk Employee

I think the issue is that there are several events (say just a couple of thousand) with the same timestamp (no subseconds) in the same file, and they want to know if Splunk will return results in the order in which they were encountered in the file. My guess is that there is no such guarantee.
