If I have a basic input which sets the sourcetype, configuring a timezone offset works great:
[monitor:///path/to/foo.log]
sourcetype = foo

[foo]
TZ = GMT
If I have to set up flexible sourcetyping, the above configuration does not work:
[source::...foo.log]
TRANSFORMS-abc = setFooSourcetype, setBarSourcetype

[foo]
TZ = GMT
I'm guessing the timezone is set before the new sourcetype is applied so that is why the TZ parameter is not honored. So I then tried to set the timezone offset using host, which also does not work (no matter the ordering of the stanzas):
[source::...foo.log]
TRANSFORMS-abc = setFooSourcetype

[host::foohost*]
TZ = GMT
What other options do I have? I'm not sure where in the indexing pipeline metadata is set.
The TZ parameter is used and set by the parsing pipeline (see http://www.splunk.com/wiki/Where_do_I_configure_my_Splunk_settings%3F), and I believe that timestamp processing occurs before any TRANSFORMS. Setting TZ with a source:: stanza should work (as it does with a sourcetype), so I'm not sure why it isn't working for you. Is there another stanza that might also be matching and overriding?
I think that is precisely the problem: "timestamp processing occurs before any TRANSFORMS." This data stream requires a sourcetype and host override. Is there a way to re-process TZ after the sourcetype and host override? Can I create my own processor for this or edit the pipeline order for the parsing queue?
More details: all events are streamed via TCP by a syslog server, except the events are not in syslog format. So we need to add index-time rules to assign sourcetype and host.
Oh, okay. Yes, if you're setting host in props/transforms, then no props stanza keyed on that overridden host is going to apply at timestamp time. Are all events from multiple hosts coming in from that same syslog server? Is it possible to split the hosts from different time zones to different syslog servers or different TCP ports?
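As a rough sketch of the "different TCP ports" idea: if each time zone's hosts were sent to their own port, TZ could be keyed on the source, which is known before any TRANSFORMS run. The port numbers and the second time zone below are made up for illustration, and the source:: patterns assume Splunk's default source naming for raw TCP inputs (tcp:<port>); check the source value on your indexed events before relying on it.

```ini
# inputs.conf -- one raw TCP input per time zone (ports are hypothetical)
[tcp://5140]

[tcp://5141]

# props.conf -- TZ keyed on the input's source, not on the overridden host
# Assumes the default source value for a TCP input is tcp:<port>
[source::tcp:5140]
TZ = GMT

[source::tcp:5141]
TZ = US/Eastern
```

The sourcetype and host TRANSFORMS can stay as they are; only the TZ lookup moves to a key that exists when timestamps are extracted.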
Thank you, G. We are still in the eval stage here so are reluctant to make changes like this in the production environment. The preference is to configure Splunk to handle this case if possible. Are you recommending we do not alter the parsing pipeline?
You really want to declare the TZ by host, ideally, since the logs are almost certainly generated by some system in either its localtime or in GMT.
However, sometimes life is more complicated than that, like syslog, where the host is identified via a transform, and the original host (used during timestamp extraction) is going to just be where we're acquiring the data.
In this case you're going to just have to use a source pattern to get reasonable behavior, with the hosts split out into files, for example by syslog-ng.
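A minimal sketch of that source-pattern approach, assuming syslog-ng has been set up to write one directory per originating host under a hypothetical /var/log/remote/ tree (the directory layout, host names, and the second time zone are illustrative only):

```ini
# inputs.conf -- monitor the per-host files written by syslog-ng
# (/var/log/remote/<host>/... is an assumed layout, not from the original setup)
[monitor:///var/log/remote]

# props.conf -- TZ matched on the file path, which is known at timestamp time
[source::/var/log/remote/foohost*/...]
TZ = GMT

[source::/var/log/remote/barhost*/...]
TZ = US/Pacific
```

Because the source is fixed when the file is read, these stanzas are matched before any host or sourcetype override takes effect.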
Of course if you have the option of simply altering the timestamp format to include the timezone, that's really ideal for ALL parties, not just Splunk.
If the timezone is declared at time of parsing, it is stored. However, the date parsing code essentially passes the buck for localtime to the system libc, so it doesn't know the timezone in that case. Getting the offset is some work. Getting the timezone is pretty hard. However, this sounds like work worth doing. It would be very helpful if you could file an enhancement request with support with one or two use cases to add color to the need.
In the current release of Splunk, in the exact scenario described above, it is not possible to apply TZ when using host/sourcetype overriding. The only alternative is to have the specific hosts forward directly to Splunk so as to create a dedicated source.