Hey, we are having some difficulty getting accurate timestamps on files with the same names that are being forwarded from multiple servers to a single indexer. The servers hold identically named files (stderr.log, stdout.log, etc.) whose timestamps are formatted differently from one server to the next. They are all being recognized as dates in the future (i.e. 7/1/11).
For example (each line comes from a differently formatted stdout.log located on a different machine):
<Wednesday January 5 17:15:20 EST 2011> : 05/01/2011 17:15:20 <DispatchThread> DEBUG [MarketImpl] Processing BI9 heartbeat
<Wednesday January 5 17:18:03 EST 2011> : <5/01/2011 05:18:03 PM EST> <Debug> <HTTP> <BEA-101147> <HttpServer(12101878,n……..
1/05/11 04:40:37 PM EST [INFO] [IntroscopeAgent] Activating JMX Data Collection
<5/01/2011 04:40:23 PM EST> <Notice> <Security> <BEA-090082> <Security initializing using security realm myrealm.>
I have a feeling that if I could ignore the “EST” string it would work: Splunk thinks EST is US EST, not Australian EST (where we are located). Do you have any ideas how I could accomplish this, given that all the logs have the same file name?
I noticed that there are options for a prefix and a lookahead; is there any way to do a postfix (i.e. a max position)?
Cheers
First, can you set a different sourcetype for each of these inputs? On the production servers, set the sourcetype for the inputs. For example, if you had the following in inputs.conf on a forwarder:
#inputs.conf
[monitor:///opt/myLogFolder]
Then Splunk would automatically monitor and forward data from all the files in myLogFolder. You could set the sourcetype for individual files in props.conf on the forwarder:
#props.conf
[source::/opt/myLogFolder/stdout.log]
sourcetype=mySourceType
(Do this for each individual file that needs a new sourcetype.)
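Since the files share a name across servers, the per-server split can live in each forwarder's own props.conf. A minimal sketch, assuming two servers and using mySourceType-1 and mySourceType-2 purely as placeholder names:
#props.conf on the forwarder of the first server
[source::/opt/myLogFolder/stdout.log]
sourcetype=mySourceType-1
#props.conf on the forwarder of the second server
[source::/opt/myLogFolder/stdout.log]
sourcetype=mySourceType-2
The source path is identical in both stanzas; only the forwarder it is deployed to differs, so each server's stdout.log arrives at the indexer with its own sourcetype.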
Now, on the indexer(s), where timestamps are actually parsed, you can specify timestamp processing based on the sourcetype. Again, use props.conf:
#props.conf
[mySourceType]
TIME_PREFIX=\<
TIME_FORMAT=%A %B %d %H:%M:%S
The way I specified this matches the first line of your example. It skips the < and then looks at the timestamp, but skips the timezone (and year). Splunk will take the current year if not specified in the timestamp, so this is probably okay if you are not loading older data. You would need to make a similar stanza for each sourcetype.
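As a rough sketch of what one of the other stanzas might look like, here is a guess at the WebLogic-style line from your examples (<5/01/2011 04:40:23 PM EST>). The sourcetype name mySourceType-2 is a placeholder, the day/month order is inferred from your sample, and TZ = Australia/Sydney is only an assumption about where your servers are, so substitute the zone that applies. MAX_TIMESTAMP_LOOKAHEAD is the closest thing to the "max position" you asked about: it limits how many characters past TIME_PREFIX Splunk reads when looking for the timestamp.
#props.conf on the indexer
[mySourceType-2]
# skip the leading < at the start of the event
TIME_PREFIX=^\<
# read at most 22 characters after the prefix, which ends before "EST"
MAX_TIMESTAMP_LOOKAHEAD=22
# day/month/year with a 12-hour clock and AM/PM, no timezone token
TIME_FORMAT=%d/%m/%Y %I:%M:%S %p
# assumed zone, applied because the parsed timestamp carries no timezone
TZ=Australia/Sydney
I have not tested this against your data, so check a few indexed events to confirm the day and month are not being swapped.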
Look at the manual page "Configure timestamp recognition" for more timestamp settings.
Finally, if all these inputs are actually variations of the same type of data, you could create individual sourcetypes, but name them similarly. For example, mySourceType-1, mySourceType-2, etc. Then you could easily search them all by specifying sourcetype=mySourceType* in your search string.