I'm seeing a number of messages like this in my internal
splunkd log. I'm specifically looking for an explanation on what's going on here or what the messages means exactly. (More specifically, I want to know if this message means that event(s) are being dropped by splunk?)
10-12-2010 12:53:32.619 WARN DateParserVerbose - The TIME_FORMAT specified is matching timestamps (Tue Oct 12 12:53:32 2010) outside of the acceptable time window. If this timestamp is correct, consider adjusting MAX_DAYS_AGO and MAX_DAYS_HENCE.
Please note the dates in this event. The timestamp splunk doesn't like is within the same second as this warning message's timestamp. There are quite a number of these messages in the
splunkd log, and a large number of them a rejecting events that are within a 10 second window or less. I found this on a test splunk instance, but started looking on our production system and found out that quite a number of hosts are generating this kind of message. (I also see some examples where splunk it legitimately dropping some bogus dates; which is correct.)
For the record, I only have a couple of sourcetype setup with either of the
MAX_DAYS_* settings. The most restrictive setting in the entire system is
MAX_DAYS_AGO = 180, and
MAX_DAYS_HENCE = 2. Even the most restrictive setting would still be in terms of "DAYS", right? Not seconds. And if somehow one of these was set to "0", then I would be seeing nearly all of my event being dropped, not just some of them, right?
I'm curious. Try running this search on your system and let me know how many results you get. In the last 7 days, I'm seeing 130 results and all but 5 of them have a
date_diff of 5 seconds or less. The other 5 are off by a few hours.
index="_internal" sourcetype="splunkd" DateParserVerbose MAX_DAYS_* "outside of the acceptable time window" | rex "matching timestamps \((?<ts>[^)]+)\)" | eval ts2=strptime(ts, "%a %b %d %T %Y") | eval time_diff=round((_time-ts2)/60,1) | where abs(time_diff)<=1440 | table _time, ts, time_diff, host
To get a total count, add
| bucket time_diff bins=10 | stats count by time_diff
Give this search a try and see if this is happening on your system to? Add your results to a comment.
Primary indexer: Ubuntu 8.04.4 (32-bit), Splunk 4.1.5
Forwarders: Mostly running 4.1.3 or higher, one or two still running 4.0.9ish
I forward all
_internal messages, so I can see on the indexer that this is happening on a variety of forwarders (all heavy weight). Most of the forwarders are running Windows 2k3, we have a couple on 2k8R2 servers, and a handful of other Linux systems. I originally found this on a Win2k8R2 splunk install that I'm still doing some testing on (I have not yet enabled forwarding, it's running as a standalone demo.)
All Unix systems are synced via NTP; and the windows hosts are either synced via NTP directly, or via the NTP mechanism within Active Directory. We've had some issues with clocks getting out of sync in the past (the domain controller was out of sync, Whoops!), so we've become much more proactive in monitoring this now. I have several alerts setup to monitor various time sync. If an NTP server gets off by more than 128ms, I get an alert. I also monitor forwarder activity on the
_internal index; this alert let me know when the most recent event is too far in the past (forwarder is down), or in the future (forwarder clock is out of sync). The bottom line is that I don't believe we have a time sync issue here.
The event would not have been dropped by splunk. However, it may have gotten indexed with an incorrect timestamp.
This is certainly a weird problem. A few questions:
What version of splunk are we talking about? 4.1.5?
What OS are you running on?
Do you have an idea what data source is causing this? It would be interesting if it was a file that had a large discontinuity in its timestamps or something like that.
Is the machine that generated this message receiving forwarded events from other splunk hosts? If so, are the clocks on those machines set approximately correctly?
Do the files that splunk is watching have modification times that look approximately correct? Another possible place we could get confused is if this was far out of range. For instance, This can happen if you're reading a file from NFS and the file server's clock is very inaccurate.
I've updated by question with basic OS versions and time-sync info. (3) I'm not sure which sourcetype, the splunkd message doesn't give you much to work with. (5) It's possible there are some random files with weird mtimes, but if that were the case, I don't think I would be seeing a fairly even distribution of these events over a period of weeks, would I? I would expect to see that kind of thing appear in clumps, and right after a new source was added, or something like that. As far as remote files, I have one CIFS share we monitor files from, everything else is strictly local files only.
I know this is quite old, but did you ever find a solution to this? I've got the exact same issue where Splunk is saying that the parsed timestamps are outside of the acceptable time range, even though they are within a couple of seconds of the current time. I've tried setting the TIME_FORMAT but no luck. Any ideas?