How to write a regex for field extraction that matches two log entry formats?

Explorer

Folks,
I have the following REGEX:

(?:[^:\n]*:){4}\d+\.\d+\w+,(?P<ComponentName>[^,]+),(?P<EventCode>[^,]+),(?P<MessageType>[^,]+),(?P<NAME1>[^,]+)?,?(?P<NAME2>[^,]+)?,?(?P<NAME3>[^,]+)?,?ID:(?P<messageId>[^,]+),ID:(?P<CorrelationId>[^,]+),(?P<UserId>[^,]*),(?P<otherInfo>[^,]*)

I need to match two different log formats: one from systems that have been upgraded (which have the NAME1, NAME2, and NAME3 entries in the log files), and the older style, which only has the ....MessageType,messageId,messageId2 fields.

Here are example log entries I need to match:

OLD style (NO NAME1, NAME2, or NAME3 entries):

2014-09-10T06:03:22.270Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,ID:a1b9817d-a017-3924-a2e4-6e1ac30cd571,ID:c6ca4fa4-71ee-4453-be8e-66f41db75323,anonymous,,

NEW style 1 (addition of the NAME1, NAME2, NAME3 parts):

2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,NAME3,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,

NEW style 2 (blank NAME3):

2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,

With the regex above, I match the OLD style correctly (nothing in the NAME1, NAME2, or NAME3 groups) and NEW style 1 (NAME1 and NAME2 are correct, but nothing in NAME3). However, when I test NEW style 2 on the regex101 site, I get a "catastrophic backtracking" / execution-time error.
I just can't seem to find the magic incantation to make sure it works for all three versions in the logs.
Any help is GREATLY appreciated!
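The "catastrophic backtracking" most likely comes from the optional commas: with `,?` between the optional `[^,]+` groups, one field such as NAME1 can be split across NAME1, NAME2, and NAME3 in many different ways, and the engine tries every split whenever the rest of the pattern fails. A minimal Python sketch of one way around this (a swapped-in technique, not the regex from this thread): make the whole NAME block optional as a unit, with mandatory commas inside it, so each group can only capture one complete field. Assumptions: each line starts with the ISO timestamp exactly as in the samples, so the sketch uses a `^[^,]+,` prefix instead of `(?:[^:\n]*:){4}`; and because the new-style samples show the second identifier without an `ID:` prefix, that prefix is optional here. `PATTERN`, `SAMPLES`, and `extract` are hypothetical names.

```python
import re

# Hypothetical names; group names follow the original regex.
# The NAME1/NAME2/NAME3 block is optional as a whole and its commas are
# mandatory, so each group can only capture one complete comma-separated field.
PATTERN = re.compile(
    r"^[^,]+,"                                                     # skip the timestamp
    r"(?P<ComponentName>[^,]+),"
    r"(?P<EventCode>[^,]+),"
    r"(?P<MessageType>[^,]+)"
    r"(?:,(?P<NAME1>[^,]*),(?P<NAME2>[^,]*),(?P<NAME3>[^,]*))?"    # new style only
    r",ID:(?P<messageId>[^,]+),"
    r"(?:ID:)?(?P<CorrelationId>[^,]+),"                           # 'ID:' prefix optional (assumption)
    r"(?P<UserId>[^,]*),"
    r"(?P<otherInfo>[^,]*)"
)

SAMPLES = {
    "old": "2014-09-10T06:03:22.270Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,ID:a1b9817d-a017-3924-a2e4-6e1ac30cd571,ID:c6ca4fa4-71ee-4453-be8e-66f41db75323,anonymous,,",
    "new1": "2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,NAME3,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,",
    "new2": "2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,",
}


def extract(line):
    """Return the named groups as a dict (None for unused groups), or None if no match."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None


for style, line in SAMPLES.items():
    print(style, extract(line))
```

For the old style the optional block cannot match (its third "field" would have to be followed by `,ID:`), so NAME1 to NAME3 come back as None; for NEW style 2, NAME3 is an empty string. The pattern body uses the same `(?P<name>...)` syntax as the regexes in this thread, so it should carry over to a search-time extraction, though that has not been checked in Splunk.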

1 Solution

Explorer

Well, I actually got it to work with this regex:

(?:[^:\n]*:){4}\d+\.\d+\w+,(?P<ComponentName>[^,]+),(?P<EventCode>[^,]+),(?P<MessageType>[^,]+),?(?P<ServiceName>[^,]+)?,?(?P<ServiceMethod>[^,]+)?,?(?P<ServiceInstance>[^,]+)?,ID:(?P<messageId>[^,]+),ID:(?P<CorrelationId>[^,]+),(?P<UserId>[^,]*)?,(?P<otherInfo>[^,]*)

This gets all of the parts for each type correctly. Thanks for the other suggestions; I am going to look into them, since it would be nice to have the fields extracted automatically so that I don't have to use this regex in every search/report.

Champion

@gartnerj, my idea would be to give your new-style logs their own sourcetype, called yoursourcetype_v2, and apply a new transform with the correct delimiter to it. You would keep your old data under its original sourcetype, with a separate transform that applies the appropriate delimiter. Hope that makes sense.

Explorer

These logs are all in the same sourcetype. I'm still a bit unclear about some of the comments above. Regarding keeping foo/foov2, I'm not sure what you are referring to: are you talking about two different field transformations? Any hints on how to construct the different field transforms? I still don't see how I get away from the regex required to pull out the three different versions.

Ultra Champion

Actually, I agree with you there. Normally, a different log format means a different sourcetype. But perhaps these sourcetypes are already in place.

Champion

That is true, but both new styles have the same field count. If gartnerj keeps the original as foo and gives the new format the sourcetype foov2, he shouldn't have any problems, and the fields can be made global.
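The field-count point can be checked on the sample lines from the question: split on commas, the old style gives 9 fields and both new styles give 13 (the trailing commas add empty fields). The Python sketch below mimics "split on a delimiter, then name the fields" and picks the name list by field count. This is only an illustration of the idea, not Splunk's DELIMS/FIELDS implementation; choosing by count stands in for the separate foo/foov2 sourcetypes, and the names `OLD_NAMES`, `NEW_NAMES`, `NAMES_BY_COUNT`, and `split_fields` are hypothetical.

```python
# Illustration only: "split on a delimiter, then name the fields",
# with the name list chosen by field count (an assumption drawn from the samples).
OLD_NAMES = ["timestamp", "ComponentName", "EventCode", "MessageType",
             "messageId", "CorrelationId", "UserId", "otherInfo"]
NEW_NAMES = ["timestamp", "ComponentName", "EventCode", "MessageType",
             "NAME1", "NAME2", "NAME3",
             "messageId", "CorrelationId", "UserId", "otherInfo"]

# Counts after splitting the sample lines on ",": trailing commas add
# empty fields, so the old style gives 9 and both new styles give 13.
NAMES_BY_COUNT = {9: OLD_NAMES, 13: NEW_NAMES}


def split_fields(line):
    """Split on commas and name the fields; return None for an unknown layout."""
    values = line.split(",")
    names = NAMES_BY_COUNT.get(len(values))
    if names is None:
        return None
    # zip() stops at the shorter list, so the extra trailing empty fields are dropped.
    return dict(zip(names, values))


old = "2014-09-10T06:03:22.270Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,ID:a1b9817d-a017-3924-a2e4-6e1ac30cd571,ID:c6ca4fa4-71ee-4453-be8e-66f41db75323,anonymous,,"
new2 = "2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,"
print(split_fields(old))
print(split_fields(new2))
```

Unlike the regex approach, a plain split keeps the `ID:` prefix inside the messageId and CorrelationId values.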

Ultra Champion

There is a different count of fields, so a single REPORT with DELIMS/FIELDS will not necessarily do it for you. I haven't tried working with overlapping REPORTs, but that may work. What you could do is use two EXTRACTs in props.conf.

By the way, (?:[^:\n]*:){4} looks rather odd at the beginning. Perhaps you could write ^[^,]+, instead to skip over the timestamp.
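The two-EXTRACT idea, combined with the `^[^,]+,` timestamp prefix, can be sketched in Python as two strict, anchored patterns that are each applied to every event, keeping the groups from whichever one matches. This only roughly mirrors how separate extractions each contribute fields when their regex matches; the actual props.conf entries are not shown, and `OLD_STYLE`, `NEW_STYLE`, and `extract_all` are hypothetical names. As an assumption, the second identifier's `ID:` prefix is optional in the new-style pattern, because the new-style samples in the question show it without one.

```python
import re

# Two hypothetical strict patterns. Every group is bounded by mandatory commas,
# so each one can only capture a single whole field and there are no
# optional separators for the engine to try in different combinations.
OLD_STYLE = re.compile(
    r"^[^,]+,(?P<ComponentName>[^,]+),(?P<EventCode>[^,]+),(?P<MessageType>[^,]+),"
    r"ID:(?P<messageId>[^,]+),ID:(?P<CorrelationId>[^,]+),(?P<UserId>[^,]*),(?P<otherInfo>[^,]*)"
)
NEW_STYLE = re.compile(
    r"^[^,]+,(?P<ComponentName>[^,]+),(?P<EventCode>[^,]+),(?P<MessageType>[^,]+),"
    r"(?P<NAME1>[^,]*),(?P<NAME2>[^,]*),(?P<NAME3>[^,]*),"
    r"ID:(?P<messageId>[^,]+),(?:ID:)?(?P<CorrelationId>[^,]+),(?P<UserId>[^,]*),(?P<otherInfo>[^,]*)"
)


def extract_all(line):
    """Apply each pattern independently and merge the groups of those that match."""
    fields = {}
    for pattern in (OLD_STYLE, NEW_STYLE):
        m = pattern.match(line)
        if m:
            fields.update(m.groupdict())
    return fields


old = "2014-09-10T06:03:22.270Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,ID:a1b9817d-a017-3924-a2e4-6e1ac30cd571,ID:c6ca4fa4-71ee-4453-be8e-66f41db75323,anonymous,,"
new2 = "2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,"
print(extract_all(old))
print(extract_all(new2))
```

On the sample lines exactly one pattern matches each event: the old-style pattern needs `ID:` in the fifth field, and the new-style pattern needs `ID:` in the eighth.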

Explorer

Don't I still need the regex to do this? I haven't used transforms before (I will look into them more); I was doing this inside an individual search. Also, does this change the fields for ALL users of Splunk, or just the app that I am in?

Champion

Why are you using a regex? It seems like you should be using a transform with delim = , and then specifying the fields.
