I'm working with some syslog data that is being pulled in from a gzip file. The data looks like this:
Apr 28 23:59:01 hostname systemd: Removed slice User Slice of pdw.
Apr 28 23:59:01 hostname systemd: Started Session 9904 of user pdw.
Apr 28 23:59:01 hostname systemd: Created slice User Slice of pdw.
Apr 28 23:58:01 hostname systemd: Removed slice User Slice of pdw.
Apr 28 23:58:01 hostname systemd: Started Session 9903 of user pdw.
Apr 28 23:58:01 hostname systemd: Created slice User Slice of pdw.
Apr 28 23:57:01 hostname systemd: Removed slice User Slice of pdw.
Apr 28 23:57:01 hostname systemd: Started Session 9902 of user pdw.
Apr 28 23:57:01 hostname systemd: Created slice User Slice of pdw.
Apr 28 23:56:01 hostname systemd: Removed slice User Slice of pdw.
The issue is that instead of seeing April 28 in _time, I'm seeing what appears to be the timestamp of the source file, source="/var/log/messages-20220501.gz". The 2,974,360 events in the gzip run from Aug 1 to May 3. Does Splunk not take the date from each event in the gzip, or did Splunk run up against a limitation and stop parsing timestamps because of the number of events?
Could the issue be that the events using the file time as _time have dates that are actually from last year, but carry no year, so they look like they are in the future? For example, this search:
index=os_nix sourcetype=syslog <user>
| rex field=_raw "(?<my_date>\w+\s+\d+\s+\d+\:\d+\:\d+)"
| eval reported_date = strptime(my_date, "%b %d %H:%M:%S")
| eval time_diff = _time - reported_date
| eval abs_time_diff = abs(time_diff)
| eval indexTime = strftime(_indextime, "%b %d %H:%M:%S")
| table _time, indexTime, my_date, time_diff, abs_time_diff
| search abs_time_diff > 7200
outputs information like this:
_time                indexTime        my_date          time_diff        abs_time_diff
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 18:10:51  15758352.000000  15758352.000000
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 18:10:51  15758352.000000  15758352.000000
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 18:00:59  15758944.000000  15758944.000000
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 18:00:59  15758944.000000  15758944.000000
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 17:11:04  15761939.000000  15761939.000000
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 17:11:04  15761939.000000  15761939.000000
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 17:01:32  15762511.000000  15762511.000000
In the above the _time is the file time, indexTime is when it was indexed, my_date is the date in the syslog event, time_diff is the difference per the search, and the abs_time_diff is the absolute diff per the search.
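As a sanity check on the numbers above, the reported time_diff can be reproduced outside Splunk. The minimal Python sketch below is not part of the original search. It assumes both timestamps share the same UTC offset, and the year 2021 for my_date is a hypothesis being tested, not something stated in the events.

```python
from datetime import datetime

# _time that Splunk assigned to the event (first row of the table above).
assigned_time = datetime(2022, 4, 25, 3, 30, 3)

# my_date from the same row, with the year ASSUMED to be 2021. The raw
# event carries no year, so this is the hypothesis being checked.
event_time_2021 = datetime.strptime("2021 Oct 24 18:10:51", "%Y %b %d %H:%M:%S")

diff_seconds = (assigned_time - event_time_2021).total_seconds()
print(diff_seconds)  # 15758352.0, the same value as time_diff in the first row
```

The value matches the table when my_date is placed in 2021, which is consistent with these being last year's events.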
Yes. If the events are that old and carry no year, Splunk will parse them as this year's October or similar.
And with the default MAX_DAYS_HENCE (just 2 days), a date that far in the future is discarded and the events are indexed with the current timestamp instead.
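To illustrate that reasoning, here is a rough Python sketch of what happens when a yearless syslog timestamp is read as the current year and then checked against a future-date window. This is not Splunk's actual timestamp processor. The names exceeds_days_hence and reference_time, and the 2-day window, are assumptions made for the illustration.

```python
from datetime import datetime, timedelta

def exceeds_days_hence(stamp, reference_time, max_days_hence=2):
    """Return True if a yearless syslog timestamp, read as belonging to
    reference_time's year, lies more than max_days_hence days ahead."""
    candidate = datetime.strptime(
        f"{reference_time.year} {stamp}", "%Y %b %d %H:%M:%S"
    )
    return candidate - reference_time > timedelta(days=max_days_hence)

# Hypothetical reference time: the _time value seen in the search results.
reference_time = datetime(2022, 4, 25, 3, 30, 3)

print(exceeds_days_hence("Oct 24 18:10:51", reference_time))  # True
print(exceeds_days_hence("Apr 24 03:00:00", reference_time))  # False
```

An October event read as October 2022 is roughly six months ahead of an April 2022 reference time, so it falls outside the window, while an event from the day before does not.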
@PickleRick Thanks. That is what I was looking for. Is there something in the documentation that I missed while searching for this answer?
Dunno. I've simply read the props.conf documentation so many times... 😄
The timestamp may simply not be recognized. If your sourcetype has a time format definition that is not consistent with the actual timestamp, the time parsing will fail and Splunk will resort to fallback methods.
Make sure your sourcetype has proper timestamp parsing settings (time format, max timestamp lookahead, timezone). See the props.conf documentation for details.
While it might be tempting to delete all those settings and let Splunk figure it out, it's a great performance boost to define the settings properly.
Thanks for answering, PickleRick. I wish it were that simple, but here is the syslog stanza.
This is what “syslog” looks like to the indexers:
/data/splunk/hot/apps/splunk/etc/slave-apps/Splunk_TA_cisco-asa/local/props.conf [syslog]
/data/splunk/hot/apps/splunk/etc/system/default/props.conf DATETIME_CONFIG = /etc/datetime.xml
/data/splunk/hot/apps/splunk/etc/system/default/props.conf MAX_TIMESTAMP_LOOKAHEAD = 32
/data/splunk/hot/apps/splunk/etc/system/default/props.conf TIME_FORMAT = %b %d %H:%M:%S
Since the TIME_FORMAT matches the events that are in the gzip file, I don't know why they wouldn't get processed.