How to write regex to identify and use time field B for event timestamp if time field A is missing?

woodcock
Esteemed Legend

I have a set of data where most events have an "end time" but some do not. I would like to set up Splunk to look for "end time" and, if it is not found, use "start time" instead. The only way I can think of to do this is with datetime.xml, but I cannot get it to work. Let us say that the primary time ("end") is the 12th field in the PSV and the secondary time ("start") is the 11th field. I have tried both of the following, but neither works. I assume the problem is that "CDATA" does not mark the beginning of the entire event's _raw string but rather an already-parsed, naturally-delimited sub-string, but I am not sure. How can I make this work? Also, if I do get it working, is there any way to force Splunk to use MY precedence and not allow it to "learn" that there is always a "start" field, so that it comes to prefer that one (which is the opposite of what I am trying to do)?

<datetime>
<define name="DateTime_serviceDeliveryStartTimeStamp" extract="year, month, day, hour, minute, second">
   <text><![CDATA[(?:[^\|]*\|){10}(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})|]]></text>
   <!-- text><![CDATA[[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})|]]></text -->
</define>
<define name="DateTime_serviceDeliveryEndTimeStamp" extract="year, month, day, hour, minute, second">
   <text><![CDATA[(?:[^\|]*\|){11}(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})|]]></text>
   <!-- text><![CDATA[[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})|]]></text -->
</define>
<timePatterns>
   <use name="DateTime_serviceDeliveryEndTimeStamp"/>
   <use name="DateTime_serviceDeliveryStartTimeStamp"/>
</timePatterns>
<datePatterns>
   <use name="DateTime_serviceDeliveryEndTimeStamp"/>
   <use name="DateTime_serviceDeliveryStartTimeStamp"/>
</datePatterns>
</datetime>
1 Solution

woodcock
Esteemed Legend

OK, I finally got something working after escalating to Splunk support. The main thing that thwarted my solving it on my own was that, as it turns out, Splunk's Data Preview tool is somewhat prone to false negatives, so much so that many Splunkers do not trust it enough to even use it. BEWARE!

I do not fully understand the nuance of why their solution works and mine did not but here is what I was told, verbatim:

The key here is that this is TIME_PREFIX. Splunk only starts looking for timestamps after the matched string. Your regex will always match the 11th field, so Splunk will always start looking at the 12th field. Mine only matches the 11th field if the 12th field exists, so if the 12th field is empty Splunk will start looking from the 11th field.

In any case, here are the props.conf stanza details that do work:

NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
TIME_PREFIX = ^(?:[^\|]*\|){10}(([^\|]*\|)(?!\|))?
MAX_TIMESTAMP_LOOKAHEAD = 12
TIME_FORMAT = %y%m%d%H%M%S
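
To see why the optional group with the negative lookahead produces the fallback, here is a minimal Python sketch of the behavior support describes. It assumes that Splunk hands the text immediately after the TIME_PREFIX match to TIME_FORMAT, limited to MAX_TIMESTAMP_LOOKAHEAD characters; that is a simplification of the real timestamp processor. The sample events and the extract_timestamp helper are made up for illustration.

```python
import re
from datetime import datetime

# Same pattern as the TIME_PREFIX above; Python's re module accepts this syntax.
TIME_PREFIX = re.compile(r"^(?:[^\|]*\|){10}(([^\|]*\|)(?!\|))?")
TIME_FORMAT = "%y%m%d%H%M%S"
LOOKAHEAD = 12  # mirrors MAX_TIMESTAMP_LOOKAHEAD


def extract_timestamp(raw: str) -> datetime:
    """Hypothetical helper: parse the text that follows the prefix match."""
    m = TIME_PREFIX.match(raw)
    if m is None:
        raise ValueError("event has fewer than 10 pipe-delimited fields")
    candidate = raw[m.end():m.end() + LOOKAHEAD]
    return datetime.strptime(candidate, TIME_FORMAT)


# Made-up events: ten filler fields, then start time (11th), then end time (12th).
filler = "|".join(f"f{i}" for i in range(1, 11))
with_end = f"{filler}|210101120000|210101123000|tail"
without_end = f"{filler}|210101120000||tail"

# End time present: the prefix swallows the 11th field, so the 12th is parsed.
print(extract_timestamp(with_end))     # 2021-01-01 12:30:00
# End time empty: the lookahead rejects the optional group, so the 11th is parsed.
print(extract_timestamp(without_end))  # 2021-01-01 12:00:00
```

The lookahead only detects an empty 12th field that is followed by another pipe. An event that simply ends after the 11th field's delimiter is not covered by this sketch.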


martin_mueller
SplunkTrust

Ah, I see now what you're going on about 🙂

That should be fixable by merging the two into one rule and using regex greediness to our advantage. A rough-draft regex looks like this:

(?:[^\|]*\|){10}(?:[^\|]*\|)?(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\|

That should greedily skip the eleventh field, attempt to extract the value from the twelfth, and fall back to the eleventh field if the twelfth field doesn't match.
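
As a quick check of the regex logic on its own, the following Python sketch runs the rough-draft pattern against two made-up events. It only exercises the regular expression in isolation; it does not show how Splunk applies the pattern through datetime.xml or TIME_PREFIX. The sample data and the captured_digits helper are hypothetical.

```python
import re

# The rough-draft pattern from above, unchanged (split across two lines for width).
COMBINED = re.compile(
    r"(?:[^\|]*\|){10}(?:[^\|]*\|)?"
    r"(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\|"
)


def captured_digits(raw):
    """Hypothetical helper: join the six captured two-digit groups."""
    m = COMBINED.search(raw)
    return "".join(m.groups()) if m else None


# Made-up events: ten filler fields, then the 11th and 12th fields.
filler = "|".join(f"f{i}" for i in range(1, 11))
both = f"{filler}|210101120000|210101123000|tail"
eleventh_only = f"{filler}|210101120000||tail"

# Twelfth field present: the greedy optional group skips the eleventh field.
print(captured_digits(both))           # 210101123000
# Twelfth field empty: the engine backtracks and captures the eleventh field.
print(captured_digits(eleventh_only))  # 210101120000
```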

I presume your date is basically twelve digits with no separating characters and a short two-digit year?

woodcock
Esteemed Legend

You, sir, are a GENIUS!!!
This solution would obviate the need for datetime.xml altogether, except that it doesn't work. I have tried this:

#12th field is the preferred DateTime ("serviceDeliveryEndTimestamp")
#11th field is the backup DateTime ("serviceDeliveryStartTimestamp")
TIME_PREFIX = ^([^\|]*\|){10}([^\|]*\|)?
TIME_FORMAT = %y%m%d%H%M%S
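
A small Python sketch suggests why this particular TIME_PREFIX cannot fall back. Because the pattern ends with the optional group, nothing ever forces the regex engine to give that group up, so the prefix always ends after the 11th field. The sketch assumes, as the accepted solution's quote from support states, that Splunk starts looking for the timestamp right after the TIME_PREFIX match. The events and the text_after_prefix helper are made up for illustration.

```python
import re

# The TIME_PREFIX attempt from above, unchanged.
ATTEMPT = re.compile(r"^([^\|]*\|){10}([^\|]*\|)?")


def text_after_prefix(raw):
    """Hypothetical helper: return whatever follows the prefix match."""
    m = ATTEMPT.match(raw)
    return raw[m.end():] if m else None


# Made-up events: ten filler fields, then the 11th and 12th fields.
filler = "|".join(f"f{i}" for i in range(1, 11))
both = f"{filler}|210101120000|210101123000|tail"
eleventh_only = f"{filler}|210101120000||tail"

# Twelfth field present: the remainder starts at the twelfth field.
print(text_after_prefix(both))           # 210101123000|tail
# Twelfth field empty: the eleventh field is still consumed, leaving no digits.
print(text_after_prefix(eleventh_only))  # |tail
```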

And this:

<datetime>
<define name="DateTime_12thThen11th_2DigitsOnlyForSeconds" extract="year, month, day, hour, minute, second">
   <text><![CDATA[[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|(?:[^\|]*\|)?(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})]]></text>
</define>

<timePatterns>
   <use name="DateTime_12thThen11th_2DigitsOnlyForSeconds"/>
</timePatterns>
<datePatterns>
   <use name="DateTime_12thThen11th_2DigitsOnlyForSeconds"/>
</datePatterns>
</datetime>

And this:

<datetime>
<define name="DateTime_12thThen11th_2DigitsOnlyForSeconds" extract="year, month, day, hour, minute, second">
   <text><![CDATA[(?:[^\|]*\|){10}(?:[^\|]*\|)?(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})]]></text>
</define>

<timePatterns>
   <use name="DateTime_12thThen11th_2DigitsOnlyForSeconds"/>
</timePatterns>
<datePatterns>
   <use name="DateTime_12thThen11th_2DigitsOnlyForSeconds"/>
</datePatterns>
</datetime>

martin_mueller
SplunkTrust

Any learning can only happen within the datetime.xml file referenced by that sourcetype in props.conf. Any other datetime.xml files (there could be dozens in your environment) are ignored.

woodcock
Esteemed Legend

Yes, and that kind of learning IS a problem. Within my datetime.xml, I need Splunk to prefer the "worst" rule ("end" which is NOT always present) over the "best" rule ("start" which IS always present).

martin_mueller
SplunkTrust

As a first step, create a new datetime_foo.xml and reference that in the props.conf stanza for that sourcetype. Remove everything from that file not needed by your custom rules. This may take care of your rules not being used, and will take care of your question at the bottom about making sure Splunk uses your custom rules rather than trying to be all smartypants about it and learn what might turn out to be wrong in your environment.

woodcock
Esteemed Legend

As you can see, this has been done (all rules but mine are stripped out), but if I understand datetime.xml correctly, Splunk "learns" which rules "work best" for each sourcetype and prefers those. I need to disable this capability and force Splunk to go top-down through my rules. But first I need to make rules that actually work!

martin_mueller
SplunkTrust

You mentioned some stuff you already tried, but I can't see that.

Do post some sample data with and without the primary timestamp.

woodcock
Esteemed Legend

Splunk was not honoring my markup but 4-spaces worked. I have updated the original question.
