
Timestamp offset option for sourcetype that is behind by a few seconds?

Splunk Employee

If I have many sourcetypes being indexed in real time, but one of them is off by five or maybe 30 seconds, is there a setting or option within Splunk that would allow me to specify that offset and have the timestamp in the Splunk GUI adjust itself when I search across ALL sourcetypes, such that the time correlations are correctly synced?

If not, can this type of option or setting be included in a future version of Splunk?

1 Solution

Splunk Employee

So for the case where one of your event data sources has clock skew, can this be handled at search time?

Theoretically, you could adjust the _time field, which is Splunk's idea of when the event happened. However, there are a few caveats:

  • This would affect how the events are sorted and displayed, but would not affect the initial collection of data off disk. Thus if you had a source that was 5 minutes slow and you searched over the past 4 minutes, you wouldn't see anything from that host.
  • Splunk typically doesn't expect the _time field to get tweaked by search commands, and may do silly things if it's modified by, for example, eval.
  • At the least, you can create a search command, explicitly declare that it modifies _time, and do arbitrary modifications to the _time field based on your own criteria. You'd have to do your own design work to make it controllable from the search string. For example:

    Splunk> error sourcetype=apache | adjust_time host=hostname +5m

At this point you'd have to be comfortable with writing Python and with relatively simple programming in general; a rough sketch of such a command follows.
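
For concreteness, here's a minimal sketch of what such a command might look like, assuming the legacy splunk.Intersplunk scripted-command interface. The adjust_time name, the host= and offset= options, and the offset syntax are all invented for illustration; you'd also need a commands.conf stanza for the command, including telling Splunk that it changes time order.

    # adjust_time.py -- hypothetical command: shift _time for events from one host
    import splunk.Intersplunk as si

    def parse_offset(text):
        # Accept strings like "+5m", "-30s", "+2h" and return seconds (invented syntax)
        units = {"s": 1, "m": 60, "h": 3600}
        sign = -1 if text.startswith("-") else 1
        body = text.lstrip("+-")
        return sign * int(body[:-1]) * units[body[-1]]

    keywords, options = si.getKeywordsAndOptions()
    target_host = options.get("host")
    offset = parse_offset(options.get("offset", "+0s"))

    results = si.readResults()   # events piped in from the preceding search
    for event in results:
        if target_host is None or event.get("host") == target_host:
            # _time comes through as a UTC epoch value; shift it by the requested offset
            event["_time"] = float(event["_time"]) + offset

    si.outputResults(results)

With that in place the invocation would look like "splunk> error sourcetype=apache | adjust_time host=hostname offset=+5m" (this sketch uses an explicit offset= key rather than the bare +5m shown above).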

Where this gets very difficult, though, is that you probably do not want to keep your sources out of sync. Eventually the sources will either be corrected or drift further, and how do you decide how to do the conditional adjusting? You'd have to build a huge table in your Python program (something like the made-up one below), and it would quickly become unmanageable.
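
To make that concrete, the table would be a hand-maintained set of per-host offsets along these lines (hostnames and values are invented):

    # Hypothetical per-host clock offsets in seconds, measured and entered by hand.
    # Every clock fix or fresh bit of drift means editing this again, which is why
    # it quickly becomes unmanageable.
    HOST_OFFSETS = {
        "webserver01": +5,
        "dbserver02": -30,
    }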

Here are some counter-ideas:

  1. Sort the events by _indextime -- "splunk> your search | sort _indextime" -- as with any search that has to tweak _time, this is not going to perform well on lots of data.
  2. Clobber _time with _indextime and then sort: "splunk> your search | eval _time=_indextime | sort _time" ; this will show you the index time on the left side of the events, instead of the time claimed in the events.
  3. Simply run ntpd on your systems. You want this anyway for a host of other reasons.

Super Champion

It really seems like ntpd is still your best bet. The next best would be running ntpdate from a cron job. Synchronized clocks are a pretty foundational assumption (events are stored in reverse time order, after all). All of that aside, it still seems like the tricks jrodman pointed out should work for you if correcting your clock is not possible for whatever reason.


Splunk Employee
Splunk Employee

Sounds to me like a general case of wanting to "look around" an event for related events.


Splunk Employee
Splunk Employee

No. I mean I have two different sources streaming in real time into my Splunk indexer. One contains timestamps that are synced correctly in time, and the other contains timestamps that are out of sync by, say, five seconds. This implies that when I search across both sources, they will not be correlated correctly with respect to actual time.

Therefore, what I am asking is whether there is an option that would allow me to adjust source number two by five seconds (an offset), such that it syncs the events back up with respect to the correct timestamps contained in source number one.


Splunk Employee
Splunk Employee

Can you clarify? You mean the timestamps on the data are incorrect, or off relative to the rest of the data? Is "real time" just a red herring? Whether something is indexed in real time or is delayed is not going to make a difference in the search, assuming the timestamps are ultimately correct.

Is the fundamental issue that the data is timestamped incorrectly, or is the issue that you need to be able to correlate events across a wider time range because of natural delays in related activity across different logs?
