Does Splunk not get the date from each event in the gzip?

jwhughes58
Contributor

I'm working with some syslog data that is being pulled in from a gzip file. The data looks like this:

 

 

Apr 28 23:59:01 hostname systemd: Removed slice User Slice of pdw.
Apr 28 23:59:01 hostname systemd: Started Session 9904 of user pdw.
Apr 28 23:59:01 hostname systemd: Created slice User Slice of pdw.
Apr 28 23:58:01 hostname systemd: Removed slice User Slice of pdw.
Apr 28 23:58:01 hostname systemd: Started Session 9903 of user pdw.
Apr 28 23:58:01 hostname systemd: Created slice User Slice of pdw.
Apr 28 23:57:01 hostname systemd: Removed slice User Slice of pdw.
Apr 28 23:57:01 hostname systemd: Started Session 9902 of user pdw.
Apr 28 23:57:01 hostname systemd: Created slice User Slice of pdw.
Apr 28 23:56:01 hostname systemd: Removed slice User Slice of pdw.

 

 

The issue is that instead of seeing April 28 in _time, I'm seeing what appears to be the timestamp of the file, source="/var/log/messages-20220501.gz". The 2,974,360 events in the gzip run from Aug 1 to May 3. Does Splunk not get the date from each event in the gzip, or did Splunk run up against a limitation and not process the dates because of the number of events?

jwhughes58
Contributor

Could the issue be that the events using the file time as _time have dates that are actually from last year but carry no year, so they look like they are in the future? For example, this search:

index=os_nix sourcetype=syslog <user>
| rex field=_raw "(?<my_date>\w+\s+\d+\s+\d+\:\d+\:\d+)" 
| eval reported_date = strptime(my_date, "%b %d %H:%M:%S") 
| eval time_diff = _time - reported_date 
| eval abs_time_diff = abs(time_diff) 
| eval indexTime = strftime(_indextime, "%b %d %H:%M:%S") 
| table _time, indexTime, my_date, time_diff, abs_time_diff
| search abs_time_diff > 7200

outputs information like this

_time                indexTime        my_date          time_diff        abs_time_diff
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 18:10:51  15758352.000000  15758352.000000
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 18:10:51  15758352.000000  15758352.000000
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 18:00:59  15758944.000000  15758944.000000
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 18:00:59  15758944.000000  15758944.000000
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 17:11:04  15761939.000000  15761939.000000
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 17:11:04  15761939.000000  15761939.000000
2022-04-25 03:30:03  Apr 24 03:35:46  Oct 24 17:01:32  15762511.000000  15762511.000000

In the output above, _time is the file time, indexTime is when the event was indexed, my_date is the date in the syslog event, time_diff is _time minus the date reported in the event, and abs_time_diff is the absolute value of that difference.
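As a sanity check, the time_diff arithmetic can be reproduced outside Splunk. This is a minimal sketch, assuming naive datetimes with no timezone or DST offset between the two values; `seconds_between` is a hypothetical helper, not anything Splunk provides. Assuming the year 2021 reproduces the first row's time_diff exactly, which suggests the search-time strptime placed the event in last October rather than in the coming one.

```python
from datetime import datetime

# Values copied from the first row of the search output.
file_time = datetime(2022, 4, 25, 3, 30, 3)   # _time (the file time)
event_text = "Oct 24 18:10:51"                # my_date, no year in the event


def seconds_between(year: int) -> float:
    """Return _time minus the event time, assuming the given year."""
    event_time = datetime.strptime(f"{year} {event_text}", "%Y %b %d %H:%M:%S")
    return (file_time - event_time).total_seconds()


diff_2021 = seconds_between(2021)  # event assumed to be from last year
diff_2022 = seconds_between(2022)  # event assumed to be from this year

print(diff_2021)  # 15758352.0, the time_diff shown in the first row
print(diff_2022)  # negative: with 2022 the event would lie in the future
```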


PickleRick
SplunkTrust

Yes, if your events are that old and carry no year information, Splunk will parse them as this year's October or so.

And with the default MAX_DAYS_HENCE (just 2 days), it will discard that timestamp as too far in the future and index the events with the current timestamp instead (for a file input, that is the file's modification time, which matches what you are seeing).
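The behaviour described above can be modelled roughly as follows. This is a simplified sketch and not Splunk's actual logic: `resolve_timestamp` is a hypothetical function, the limit is passed as a parameter (2 here, which I believe is the shipped default; check props.conf.spec for your version), and the fallback is assumed to be the reference time, taken here as the file time from the search output.

```python
from datetime import datetime, timedelta


def resolve_timestamp(event_text, reference, max_days_hence=2):
    """Simplified model, NOT Splunk's real implementation.

    Give a year-less timestamp the reference year, then reject it if it
    lies more than max_days_hence days after the reference time.
    """
    candidate = datetime.strptime(
        f"{reference.year} {event_text}", "%Y %b %d %H:%M:%S"
    )
    if candidate - reference > timedelta(days=max_days_hence):
        # Too far in the future: fall back to the reference time.
        return reference, "rejected"
    return candidate, "accepted"


file_time = datetime(2022, 4, 25, 3, 30, 3)  # assumed reference (file time)

old = resolve_timestamp("Oct 24 18:10:51", file_time)     # last year's event
recent = resolve_timestamp("Apr 24 23:59:01", file_time)  # yesterday's event

print(old)     # falls back to file_time, status "rejected"
print(recent)  # keeps the parsed Apr 24 timestamp, status "accepted"
```

Under this model only the old, year-less events collapse onto the file time, while recent events keep their own timestamps.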

jwhughes58
Contributor

@PickleRick Thanks. That is what I was looking for. Is there something in the documentation that I missed while searching for this answer?


PickleRick
SplunkTrust

Dunno. I've simply read the props.conf documentation so many times... 😄


PickleRick
SplunkTrust

The timestamp may simply not be recognized. If your sourcetype has a time format definition that is not consistent with the actual timestamp, the time parsing will fail and Splunk will resort to fallback methods.

Make sure your sourcetype has proper timestamp parsing settings (time format, max timestamp lookahead, timezone). See the props.conf documentation for details.

While it might be tempting to delete all those settings and let Splunk figure it out, it's a great performance boost to define the settings properly.


jwhughes58
Contributor

Thanks for answering, PickleRick. I wish it were that simple, but this is what the "syslog" stanza looks like to the indexers:

/data/splunk/hot/apps/splunk/etc/slave-apps/Splunk_TA_cisco-asa/local/props.conf                 [syslog]
/data/splunk/hot/apps/splunk/etc/system/default/props.conf                                       DATETIME_CONFIG = /etc/datetime.xml
/data/splunk/hot/apps/splunk/etc/system/default/props.conf                                       MAX_TIMESTAMP_LOOKAHEAD = 32
/data/splunk/hot/apps/splunk/etc/system/default/props.conf                                       TIME_FORMAT = %b %d %H:%M:%S

Since the TIME_FORMAT matches the events that are in the gzip file, I don't know why they wouldn't get processed.
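The claim that the format matches can be checked outside Splunk. This is a rough sketch only: Python's strptime is not Splunk's timestamp parser, `check_line` and `assumed_year` are hypothetical, and the regex simply mirrors the rex used in the search earlier in the thread. It shows the format does match a sample event, and also that a year has to come from somewhere else, because the event carries none.

```python
import re
from datetime import datetime

TIME_FORMAT = "%b %d %H:%M:%S"   # from the [syslog] stanza
MAX_TIMESTAMP_LOOKAHEAD = 32     # from the [syslog] stanza

# Same pattern as the rex in the search earlier in the thread.
STAMP = re.compile(r"\w+\s+\d+\s+\d+:\d+:\d+")


def check_line(line, assumed_year=2022):
    """Return the parsed datetime, or None if the format does not match."""
    match = STAMP.match(line[:MAX_TIMESTAMP_LOOKAHEAD])
    if match is None:
        return None
    try:
        # The year is prepended only so Python can build a complete date;
        # the event itself does not contain one.
        return datetime.strptime(
            f"{assumed_year} {match.group(0)}", "%Y " + TIME_FORMAT
        )
    except ValueError:
        return None


sample = "Apr 28 23:59:01 hostname systemd: Removed slice User Slice of pdw."
parsed = check_line(sample)
print(parsed)  # 2022-04-28 23:59:01
```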
