Getting Data In

What's the best way to handle multiple different time formats in the same source?

spl_aficionado
Path Finder

Hello Splunk Community,

My team is currently processing logs from a single source that can contain events with different timestamp formats. We are debating the best configuration approach and would like input from the community.

Option 1: Using props.conf with transforms (Current setup)

We are currently using a TRANSFORMS-split rule in our props.conf file to route events into separate source types based on which timestamp pattern each event matches, and then applying a single TIME_FORMAT within each resulting source type stanza. This involves creating several dedicated source types for essentially the same data stream.

Configuration Snippet Example:

# props.conf
[xxx:tomcat9:catalina1]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_FORMAT = %d-%b-%Y %H:%M:%S
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 20
TRANSFORMS-split = tomcat9stdout1, tomcat9stdout2

# transforms.conf
[tomcat9stdout1]
REGEX = \d+\-[a-zA-Z]+-\d+\s\d+\:\d+\:\d+
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::xxx:tomcat9:stdout1

[tomcat9stdout2]
REGEX = [0-9]{4}\-\d+\-\d+\s\d+\:\d+\:\d+
... (etc for other formats)

Option 2: Using datetime.xml

The alternative approach suggested is to use a single source type and configure a datetime.xml file. This XML file would contain multiple regular expressions, allowing Splunk to iterate through them and automatically identify the correct timestamp format for each individual event.
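For illustration, here is the rough shape I have in mind, modeled on the structure of the default $SPLUNK_HOME/etc/datetime.xml (the define names and regexes below are placeholders for our two formats, and I would still verify the extract field names, such as litmonth for a textual month, against the default file shipped with our Splunk version):

<datetime>
    <!-- e.g. 05-Jun-2024 -->
    <define name="tomcat_dmy_date" extract="day, litmonth, year">
        <text><![CDATA[(\d{2})-(\w{3})-(\d{4})]]></text>
    </define>
    <!-- e.g. 2024-06-05 -->
    <define name="tomcat_iso_date" extract="year, month, day">
        <text><![CDATA[(\d{4})-(\d{2})-(\d{2})]]></text>
    </define>
    <!-- e.g. 12:34:56 -->
    <define name="tomcat_hms_time" extract="hour, minute, second">
        <text><![CDATA[(\d{2}):(\d{2}):(\d{2})]]></text>
    </define>
    <datePatterns>
        <use name="tomcat_dmy_date"/>
        <use name="tomcat_iso_date"/>
    </datePatterns>
    <timePatterns>
        <use name="tomcat_hms_time"/>
    </timePatterns>
</datetime>

The single source type would then point at the custom file in props.conf (path relative to SPLUNK_HOME; the app name is just an example):

[xxx:tomcat9:catalina]
DATETIME_CONFIG = /etc/apps/my_tomcat_app/default/datetime.xml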

Question

Which approach is considered the industry best practice for handling this specific scenario? Is the datetime.xml method generally more robust and maintainable than splitting source types via transforms?

Thanks for your guidance!


spl_aficionado
Path Finder

Hi @PickleRick and @bowesmana, thank you for your insightful replies!

Looking at the official props.conf documentation, it seems clear that datetime.xml is the recommended approach for these situations. What do you think?

I also wondered whether the official Tomcat TA covers this, but after a brief look at the TA's props I don't see any date extraction, which is really strange.


PickleRick
SplunkTrust

To be fully honest, I've never seen datetime.xml fiddled with. It's a relatively narrow, borderline use case. I'm not sure there's much documentation on it either.


PickleRick
SplunkTrust

Neither will be intuitive (I disagree here a bit with @bowesmana). By definition, all events from a given sourcetype should share a common format, so if you have different time formats it would be natural for me to split them into separate sourcetypes. But the split itself is tricky: it won't work the way you're describing, because timestamp recognition happens at the very beginning of the ingestion pipeline, and even if you recast your sourcetype it won't happen again. Even if you CLONE_SOURCETYPE, your duplicated event will be reinjected into the queue after the timestamp recognition phase.

With syslog-provided events it's usually relatively easy because you can split your event stream into multiple sourcetypes before it hits Splunk. With files... it's gonna be tricky. @bowesmana's approach of dynamically overwriting the already extracted (or assigned, since extraction might not work properly for misformatted timestamps) timestamp is probably the way to go. But it's worth documenting extensively because it's not intuitive.

BTW, catalina.out is a mess

bowesmana
SplunkTrust

😂 yes, totally right that neither is intuitive - it seems to be a Catalina thing. I know we had the same issue with a custom Tomcat app that had multiple date formats; we pushed back to get them fixed, which got traction.


PickleRick
SplunkTrust

Yes. It's one of the common issues with tomcat (and java in general) logs. But be prepared for more "fun".

1. Java apps often produce multiline stack dumps. And if by any chance you're forwarding those logs via syslog to a remote machine, you'll end up with a single "logical event" split into several separate syslog entries. That's horrible to deal with. (For files read directly by Splunk, see the line-breaking sketch after this list.)

2. Developers tend to happily write to logs... just about anything. And in any format (or lack thereof) they can think of. I don't know why but java seems to be one of the cases where the devs are most prolific in coming up with several different ways of formatting events from the same application.

3. It might or might not be an issue, but rotating logs with log4j is (or at least used to be) painful. It's usually not directly an ingestion issue but it might cause problems if you want to keep the log dir tidy - you can't just use logrotate and send HUP to the app.
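For the file-monitoring case from point 1 (this doesn't help once syslog has already chopped the event up), a rough line-breaking sketch - the sourcetype name is a placeholder and the timestamp shapes in the lookahead just mirror the two formats mentioned in this thread:

# props.conf
[xxx:tomcat9:catalina]
SHOULD_LINEMERGE = false
# Break only before lines that start with one of the known timestamp shapes
# (e.g. 05-Jun-2024 12:34:56 or 2024-06-05 12:34:56), so stack trace lines
# stay attached to the event that produced them.
LINE_BREAKER = ([\r\n]+)(?=\d{2}-\w{3}-\d{4}\s|\d{4}-\d{2}-\d{2}\s)
MAX_TIMESTAMP_LOOKAHEAD = 25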


bowesmana
SplunkTrust

I'm not sure about best practice, but splitting the same stream into multiple sourcetypes just to handle different data formats seems non-intuitive.

What about using INGEST_EVAL to extract _time, trying each format with strptime(), e.g.

INGEST_EVAL = _time=coalesce(strptime(_raw, "%FT%T"), strptime(_raw, "%d-%b-%Y %H:%M:%S"), strptime(_raw, ...))
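Roughly wired up it could look something like this (a sketch only - the transform name, sourcetype and formats are placeholders for your actual data, and the substr() just limits how far into _raw each strptime() looks):

# transforms.conf
[tomcat_multi_time]
# Try each known timestamp format in turn; the first strptime() that parses wins.
# The trailing _time keeps the originally assigned timestamp if nothing matches.
INGEST_EVAL = _time=coalesce(strptime(substr(_raw, 1, 25), "%d-%b-%Y %H:%M:%S"), strptime(substr(_raw, 1, 25), "%Y-%m-%d %H:%M:%S"), _time)

# props.conf
[xxx:tomcat9:catalina]
TRANSFORMS-set_time = tomcat_multi_time

Bear in mind this runs at index time on the indexer or heavy forwarder, so it only affects newly ingested events.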

 
