We have one syslog input where we receive log data from two different sources.
One runs on local time, i.e. CEST, and carries a distinct string "abc", while the other runs on UTC and carries "def".
For some unknown reason the UTC one doesn't carry "UTC" or "+00:00" with it; that information got stripped in transfer. Therefore its timestamps are currently off by two hours.
To fix that, I want to pass the "abc" through unchanged, and set "UTC" on the "def", so that it will be correctly displayed at search time.
My experiments with props.conf and transforms.conf (and datetime.xml) were not successful, since once the timezone is set at input time, it seems impossible to change it selectively for "def". Transforming the sourcetype is easy, but then it is too late, and the same applies to the host, so setting the TZ depending on a transformed parameter is not an option.
Any ideas, apart from a conversation with the people who send the broken data?
Thanks in advance
Off topic: I wonder who or what changed the subject of this thread from "One input, two TZ" to "Why when we have One input, we are receiving two TZ?", which has nothing to do with the Splunk-side problem at all. I think this is not exactly helpful.
@vgrote - Ideally, at least one of the metadata fields (host, source, or sourcetype) should differ when two different sources are sending data.
Whichever field differs, you write the timestamp extraction against it, separately for each type of log.
Set the required parameters in props.conf for each.
[<spec>]
TIME_PREFIX = <regular expression>
MAX_TIMESTAMP_LOOKAHEAD = <integer>
TIME_FORMAT = <strptime-style format>
TZ = <POSIX time zone string>
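As a rough sketch of what that could look like once the two streams can be told apart: the sourcetype names syslog_abc and syslog_def are hypothetical, the timestamp format assumes classic BSD-style syslog lines (e.g. "Oct  5 12:00:00"), and Europe/Berlin is only a guess at the zone behind "CEST". Adjust all of these to the real data.

```ini
# props.conf
# Stanza names are hypothetical sourcetypes, one per sender.

# Sender "abc": timestamps are in local time (assumed Europe/Berlin)
[syslog_abc]
TIME_PREFIX = ^
TIME_FORMAT = %b %d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 15
TZ = Europe/Berlin

# Sender "def": timestamps are in UTC, but carry no zone marker
[syslog_def]
TIME_PREFIX = ^
TIME_FORMAT = %b %d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 15
TZ = UTC
```

This only works if the sourcetype (or host/source) is already different at input time, not if it is rewritten later by a transform.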
I hope this helps!!!
@VatsalJagani: thanks for your suggestion. Sadly there is no chance to discriminate by host, source or sourcetype since the data arrives from a single syslog file input source.
The colleagues sending the data already signalled that splitting the data into two streams may be an option, so all will be well soon.
@vgrote - Even though it's coming through syslog, having common metadata for different types of data is definitely not recommended, as it will create problems not only with timestamps but also with parsing and search-time field extraction.
I would rethink the input stage and assign a proper sourcetype/source value to each stream instead of the common sourcetype syslog.
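If the senders do split the data into two files, the input stage might look something like this. The file paths and sourcetype names are made up for illustration; only the idea of one monitor stanza per stream matters.

```ini
# inputs.conf
# Paths and sourcetype names are hypothetical.

# Stream from the sender that logs in local time
[monitor:///var/log/remote/abc.log]
sourcetype = syslog_abc

# Stream from the sender that logs in UTC
[monitor:///var/log/remote/def.log]
sourcetype = syslog_def
```

With distinct sourcetypes assigned at input time, a per-sourcetype TZ setting in props.conf can then take effect.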
I hope this helps!!!
With syslog, especially with very spread-out infrastructures, it's relatively common to have "syslog forwarders" instead of setting up Splunk forwarders in each possible location. That's why I advise using a syslog-processing layer before sending the events to Splunk. I'm not very experienced with SC4S, but with rsyslog I can do wonders: filter, route, distinguish between different kinds of events from a single source, en/decapsulate the source IP for sending through a chain of syslog forwarders, and so on.
As you probably know, the timestamp extraction is done relatively early in the event processing pipeline so there's no way to overwrite source/sourcetype/host and use that value to select proper timestamp parsing rules.
You can use ingest-evals to "correct" the timestamp. Check the example https://conf.splunk.com/files/2020/slides/PLA1154C.pdf - slide 26 onwards.
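A minimal sketch of that idea, not taken from the slides: it assumes the input sourcetype is syslog, that the "def" events were parsed as CEST (UTC+2) although they are really UTC, and that the string "def" appears only in events from that sender. The stanza name fix_def_tz is made up.

```ini
# props.conf (on the indexer or heavy forwarder that parses the data)
# "syslog" is assumed to be the sourcetype assigned at the input.
[syslog]
TRANSFORMS-fix_def_tz = fix_def_tz

# transforms.conf
# fix_def_tz is a hypothetical stanza name.
[fix_def_tz]
# Events containing "def" were parsed as UTC+2 but are really UTC,
# so the extracted _time is two hours too early: add 7200 seconds.
# All other events keep their _time unchanged.
INGEST_EVAL = _time=if(match(_raw, "def"), _time + 7200, _time)
```

Note that a fixed 7200-second shift ignores daylight saving time: while the input is parsed as CET (UTC+1) in winter, the correction would have to be 3600 seconds instead, so this is a stopgap rather than a clean fix.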
Having said that - I'm always a preacher of the "leave syslog processing to the syslog daemon" approach, since Splunk's internal syslog handling capabilities are relatively limited, or at least far from obvious to use. Just use SC4S or rsyslog, adjust your events properly, send them to HEC and Bob's your uncle.