time stamping problem using TIME_PREFIX

cpenkert
Path Finder

I have events that get written to a log file with the timestamp being included in this format

<date>7/2/2010 1:13:33 PM</date>

I don't want to use a lookahead, as the timestamp is far into the event and I don't want to take that performance hit for each event. The attempts I have made haven't worked. In the past I have been able to specify a simple regex like TIME_PREFIX = .+date= in the stanza in props.conf. I'm not sure whether this case is causing more problems because of the > character and/or because there is no space before <date> in the event.

Does anyone have ideas on how I can specify a TIME_PREFIX that will work?

We are running Splunk 4.1.0

Thanks


Lowell
Super Champion

I'm not sure what you mean by a performance hit because of the lookahead. No matter what your configuration, if you want Splunk to do timestamp recognition, then Splunk has to search your event for a timestamp. (You can disable it, but then the current time will be applied to your events when they are picked up by Splunk.)

Just so you know, your example TIME_PREFIX started with ".+", which is a greedy regular expression. That almost guarantees poorer performance, since you're instructing the regex engine to look as far as it can into the event, and only once it has exceeded the max lookahead amount (the MAX_TIMESTAMP_LOOKAHEAD setting) will it fall back to the first match (and probably the only match). You could use a lazy expression like ".+?"; however, you don't really need either, since TIME_PREFIX is a "search" mode regular expression (not "match" mode). The bottom line is that you really don't want ".+" or ".+?" at the front of your regex.
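To illustrate the "search" versus "match" distinction, here is a minimal sketch using Python's re module as a stand-in for Splunk's regex engine (an assumption; the two engines are not identical). The event text and the helper name prefix_end are made up for the example. It shows that a bare <date> pattern in search mode lands on the same spot as the ".+" and ".+?" variants, so the leading wildcard buys nothing.

```python
import re

# Hypothetical event, padded so the timestamp sits far from the start.
event = "<record>" + "<field>x</field>" * 50 + "<date>7/2/2010 1:13:33 PM</date></record>"

def prefix_end(pattern, text):
    """Return the offset just past the first place `pattern` is found, or None."""
    m = re.search(pattern, text)  # search mode: the pattern may start anywhere
    return m.end() if m else None

plain = prefix_end(r"<date>", event)      # no leading wildcard
greedy = prefix_end(r".+<date>", event)   # runs to the end, then backtracks
lazy = prefix_end(r".+?<date>", event)    # grows one character at a time

# Match mode anchors at position 0, which is the only case where a
# leading ".+" would be needed to reach <date>.
anchored = re.match(r"<date>", event)

print(plain, greedy, lazy, anchored)
```

All three search-mode patterns stop at the same offset; only the anchored match fails to find the prefix.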

This should work for you:

TIME_PREFIX = <date>
TIME_FORMAT = %d/%m/%Y %I:%M:%S %p
MAX_TIMESTAMP_LOOKAHEAD = 1000
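A format string can be sanity-checked outside Splunk before it goes into props.conf. The sketch below uses Python's datetime.strptime, which accepts similar directives (an assumption; Splunk's strptime implementation may differ in details). In Python, minutes are %M and the AM/PM marker is %p. The sample 7/2/2010 is ambiguous between day-first and month-first, so both readings are shown; pick the one that matches your data.

```python
from datetime import datetime

sample = "7/2/2010 1:13:33 PM"

# Day-first reading: 7 February 2010
day_first = datetime.strptime(sample, "%d/%m/%Y %I:%M:%S %p")

# Month-first reading: 2 July 2010
month_first = datetime.strptime(sample, "%m/%d/%Y %I:%M:%S %p")

print(day_first.isoformat())
print(month_first.isoformat())
```

If the two readings disagree with what the application actually logged, swap %d and %m in TIME_FORMAT accordingly.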

I would do some analysis to see how far into your event the timestamp really occurs. If you're having timestamp problems now, based on what you said, then I suspect that you may actually have to raise MAX_TIMESTAMP_LOOKAHEAD to a higher value (which is why I raised it from the default of 100 to 1000), because Splunk should have been able to find the timestamp you mentioned automatically. Once Splunk is finding your timestamp, you can figure out how far into the event your timestamps are by using the following search. (Then use this value to lower your MAX_TIMESTAMP_LOOKAHEAD to a more reasonable value.)

sourcetype=your_xml_sourcetype | top timeendpos
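If Splunk is not yet finding the timestamp, a rough offline estimate is possible from a few sample events. The sketch below is only an approximation of what the search above reports: it assumes the value of interest is the character offset where the timestamp ends, and the regex TS, the sample events, and the function name are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern for timestamps shaped like 7/2/2010 1:13:33 PM
TS = re.compile(r"<date>(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)</date>")

def timestamp_end_offsets(events):
    """Count how often the timestamp ends at each character offset."""
    counts = Counter()
    for event in events:
        m = TS.search(event)
        if m:
            counts[m.end(1)] += 1  # offset just past the timestamp text
    return counts

# Made-up sample events; substitute lines from your own log file.
samples = [
    "<r><a>1</a><date>7/2/2010 1:13:33 PM</date></r>",
    "<r><a>22</a><date>7/2/2010 1:14:02 PM</date></r>",
    "<r><a>1</a><date>7/2/2010 11:15:40 AM</date></r>",
]

offsets = timestamp_end_offsets(samples)
suggested = max(offsets)  # largest offset seen; add headroom before using it

print(offsets.most_common(), suggested)
```

The largest offset seen, plus some headroom, gives a starting point for MAX_TIMESTAMP_LOOKAHEAD.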


BTW, if your "<date>" occurs at the beginning of a line, then this would be even faster:

TIME_PREFIX = ^\s*<date>

Also, once you get your timestamp working, you may also need to tweak your event breaking logic (I'm guessing you're dealing with multi-line events here).

Be sure to check out the docs:

Lowell
Super Champion

Thanks, that's a great tip. I've run into that scenario before and I didn't know how that worked, specifically with whitespace.

gkanapathy
Splunk Employee

You don't really need the \s* at the end of TIME_PREFIX. Splunk will find the prefix, then start looking after that, regardless of what characters come after the match and before the detected time.

Stephen_Sorkin
Splunk Employee

What is your exact configuration and what events does Splunk have a problem with?
