Getting Data In

Pad thousandths of a second in timestamp with leading zeros

Path Finder

I have two sourcetypes where the thousandths-of-a-second portion of the timestamp is not padded with leading zeros when the value is less than 100 thousandths of a second.

Examples:

A log event is created at eight AM and eight thousandths of a second. The timestamp for this event would be... 08:00:00:8 Splunk interprets that time as... 08:00:00:800 instead of 08:00:00:008

A log event is created at nine AM and ninety thousandths of a second. The timestamp for this event would be... 09:00:00:90 Splunk interprets that time as... 09:00:00:900 instead of 09:00:00:090

This causes log entries in Splunk to be out of order when viewing a sequence of logs.
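For what it's worth, this right-padding of fractional seconds is the standard strptime convention rather than a Splunk quirk; Python's %f directive behaves the same way. A small sketch (using Python only to illustrate the convention):

```python
from datetime import datetime

# %f accepts one to six digits and zero-pads on the RIGHT, so ".8"
# means 0.8 s (800 ms), not 0.008 s -- the same reading Splunk applies
# to the unpadded ":8" in the examples above.
t = datetime.strptime("08:00:00.8", "%H:%M:%S.%f")
print(t.microsecond)  # 800000, i.e. 800 ms
```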

My question is how I should go about fixing the timestamp for future logs (w/o a huge burden on the indexer). And can I fix the timestamps for events that already exist in my index?

I will include some example logs for additional clarification. Note that all items (including date and time) are tab delimited.

SERVER-1    Aug 05, 2010    12:58:58:851    1234    <SYSTEM/AUTOMATION> Running Job ID 1, Name of job 1  
SERVER-1    Aug 05, 2010    12:58:59:2  1235    <SYSTEM/AUTOMATION> Running Job ID 5, Name of job 2  
SERVER-1    Aug 05, 2010    12:58:59:70 1235    <SYSTEM/AUTOMATION> Running Job ID 3, Name of job 3  
SERVER-1    Aug 05, 2010    12:58:59:132    1235    <SYSTEM/AUTOMATION> Running Job ID 2, Name of job 4

Per Lowell's suggestions, I have tried the following permutations for the time format of my sourcetype in the local props.conf file with no improvement...

TIME_FORMAT = %b %d, %Y"\t"%H:%M:%S:%3N  
TIME_FORMAT = %b %d, %Y\t%H:%M:%S:%3N  
TIME_FORMAT = %b %d, %Y %H:%M:%S:%3N  
TIME_FORMAT = %b %d, %Y %H:%M:%S:%3N  
TIME_FORMAT = %b %d, %Y"\t"%H:%M:%S:%Q  
TIME_FORMAT = %b %d, %Y\t%H:%M:%S:%Q  
TIME_FORMAT = %b %d, %Y %H:%M:%S:%Q  
TIME_FORMAT = %b %d, %Y %H:%M:%S:%Q

It's hard to see, but the third and seventh lines have a literal tab between the year and hour, and the fourth and eighth line have a space between the year and hour.

1 Solution

Super Champion

You can't fix the existing entries without deleting and re-indexing your events. (And this is true for any change to your event data; once Splunk indexes an event, it's unchangeable.)

Moving forward, you should be able to get Splunk to accept the correct timestamp by using an explicit TIME_FORMAT entry in your props.conf file. I believe that the "%nN" format directive forces Splunk to assume a fixed number of digits, in this case 3. So you would want something like:

[your_sourcetype]
TIME_FORMAT = %b %d, %Y\t%H:%M:%S:%3N

I'm not 100% sure about the tab; you could try putting in a literal tab, or a single space, like so:

TIME_FORMAT = %b %d, %Y %H:%M:%S:%3N

Note: Using an explicit TIME_FORMAT setting should also make indexing faster than making Splunk guess where your timestamp is and what format it's in. I use an explicit entry like this for nearly all my sourcetypes. (It also helps if you have stray timestamps scattered throughout your logs that you don't want Splunk to use.)
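Splunk's enhanced strptime directives largely mirror the standard C/Python ones (%3N and %Q are Splunk extensions), so you can sanity-check the date/time portion of a candidate TIME_FORMAT outside Splunk before touching props.conf. A quick Python sketch against one of the sample events:

```python
from datetime import datetime

# Check everything up to the sub-seconds; %3N / %Q are Splunk-only,
# so strip the trailing ":NNN" field before parsing.
sample = "Aug 05, 2010\t12:58:58:851"
date_part, subsec = sample.rsplit(":", 1)
t = datetime.strptime(date_part, "%b %d, %Y\t%H:%M:%S")
print(t)  # 2010-08-05 12:58:58
```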




Super Champion

This is not an "answer" but more a list of things I've tried on my system with no workable solution, so I'm calling on the help of more knowledgeable gurus, like Gerald.

I confirmed that both of the following timestamp format strings do not work, just as Jeffa reported:

TIME_FORMAT = %b %d, %Y\t%H:%M:%S:%3N
TIME_FORMAT = %b %d, %Y\t%H:%M:%S:%Q

Knowing that the sub-second value is actually stored in the indexed _subsecond field, I decided to try a more exotic workaround by populating that field explicitly:

So I used a TIME_FORMAT with no sub-seconds, then used a transform to modify the _raw event to add one or two leading 0s (so it is properly padded to 3 digits), and then used a final transform to extract the value of _subsecond. However, doing this seems to result in Splunk dropping the event altogether. I think this may be because Splunk still extracts a sub-second value even when no %3N or %Q appears in the TIME_FORMAT string. So I suspect that my attempt to set _subsecond manually is in fact producing two _subsecond values, which I'm guessing is what Splunk doesn't like, so it drops the event. (This is just a theory.)

Here is the config I tried:

props.conf:

[source::.../timestamp_test.log]
sourcetype = timestamp_test

[timestamp_test]
TIME_FORMAT = %b %d, %Y\t%H:%M:%S
SHOULD_LINEMERGE = False
TRANSFORMS-subseconds = fixup_subsecond, fixup_subsecond, extract_subsecond

transforms.conf:

[fixup_subsecond]
REGEX = ^(\S+\s+\w{3} \d+,\s+\d{4}\s+\d\d:\d\d:\d\d):(\d\d?\s.*)$
FORMAT = $1:0$2
DEST_KEY = _raw

[extract_subsecond]
REGEX = ^\S+\s+\w{3} \d+,\s+\d{4}\s+\d\d:\d\d:\d\d:(\d\d\d)\s
FORMAT = _subsecond::$1
WRITE_META = True

This config actually prevents the events from being indexed at all. But if you change the FORMAT line to FORMAT = the_sub_second::$1 (for example), then the event does get indexed, but obviously it still doesn't have the correct sub-second timestamp.
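If it helps to see the doubled fixup_subsecond transform in isolation: applying the same pad-one-zero regex twice is what brings a 1- or 2-digit sub-second field up to 3 digits while leaving 3-digit fields alone. A Python sketch of that logic, using a sample line from the question:

```python
import re

# Same idea as fixup_subsecond: insert one leading zero whenever the
# sub-second field has only 1 or 2 digits; run it twice to guarantee 3.
pat = re.compile(r"^(\S+\s+\w{3} \d+,\s+\d{4}\s+\d\d:\d\d:\d\d):(\d\d?\s.*)$")

def pad_once(line):
    return pat.sub(r"\1:0\2", line)

line = "SERVER-1\tAug 05, 2010\t12:58:59:2\t1235\t<SYSTEM/AUTOMATION> Running Job ID 5"
print(pad_once(pad_once(line)))  # sub-second field "2" becomes "002"
```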

In a further test, I determined that if you modify the example data to replace the ":" between seconds and sub-seconds with a space (" "), and then adjust all the regexes and formats accordingly, the above extract-your-own-subseconds hack actually does work. Of course, that doesn't really solve anything: if reformatting the input were an option, the obvious thing to do would be to pad the sub-seconds properly in the first place. So that's it, I'm out of ideas.
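Since the conclusion above is that reformatting the input is the real fix, one option (when you control the feed) is to pad the sub-seconds before Splunk ever sees the file, e.g. in whatever script ships or rotates the logs. A minimal sketch; the sample line and pattern are tailored to the tab-delimited format in the question, so adjust them to your exact layout:

```python
import re

def pad_subseconds(line):
    # Pad any 1- or 2-digit run that sits between a ":" and whitespace
    # (the sub-second field in this log format) out to three digits.
    return re.sub(r":(\d{1,2})(\s)",
                  lambda m: ":" + m.group(1).zfill(3) + m.group(2),
                  line)

line = "SERVER-1\tAug 05, 2010\t12:58:59:70\t1235\t<SYSTEM/AUTOMATION> Running Job ID 3"
print(pad_subseconds(line))  # "...12:58:59:070..."
```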

Super Champion

Just add 4 spaces at the start of the line.


Path Finder

That looks better...thanks. Might I ask how you added the literal tabs? I couldn't find a reference to it, and when I hit the Tab key, I go to the next input box.


Super Champion

Jeffa, I've attempted to reformat your examples so you can see the tabs and all. (If you don't like how it looks, you can revert my changes.)



Super Champion

Sorry, I was putting quotes around \t for readability; I wasn't suggesting that you actually need to type the quotes. I'll try running a few of your test events on a test system and see if I can come up with anything.


Path Finder

You are correct, Lowell...correcting the timestamp after Splunk "recognizes" it will not be helpful.


Path Finder

You mention to 'Try using "\t" as your separator'...I have already tried \t; are you recommending that I also include the quotes?


Super Champion

Often you can tweak incoming data using a transform or a SEDCMD, which is pretty useful in general. So you could use either of those features to pad your sub-seconds with leading 0s, but that will not actually solve your issue. This is because Splunk does timestamp recognition before any raw event transformations, so you would end up seeing 3 digits in your events as you want, but the actual timestamp associated with each event (shown on the left) would still be incorrect, just as it is now. ;-(


Super Champion

Try using "\t" as your separator, and try using %Q instead of %3N (they should be the same, but I'm not sure how they are implemented internally.) The docs are here: http://www.splunk.com/base/Documentation/latest/Admin/Configuretimestamprecognition#Enhanced_strptim...


Path Finder

I set up a lab environment and tried all three options (\t, a literal tab, and a space) with no success. I started by importing a log, then changing props.conf, then importing a second log. I also tried modifying props.conf before any logs were imported, and Splunk always appended the zeroes to the end of the thousandths value instead of padding at the beginning. I stopped Splunk and ran "splunk clean eventdata" between each test. Am I missing something? Any other suggestions?


Super Champion

You are correct about the dot vs colon, my mistake. The date pattern does match your example. As far as the tab goes, I think that \t will work for you. I've updated my answer based on your comments and suggested a few alternate ideas if \t doesn't work for you.


Path Finder

A couple of things...

First, you have listed [dot]%3N; however, I believe you mean [colon]%3N to match my time format. Is that right?

Second, my date stamps are (for example) Aug 03, 2010, so my date portion should be %b %d, %Y, correct?

Third, the date and time stamps are separated by a tab (the log is tab delimited)...is there a proper way to denote that in the props.conf file?
