Splunk Search

How to write regex for field extraction to match two log entries?

gartnerj
Explorer

Folks,
I have the following REGEX:

(?:[^:\n]*:){4}\d+\.\d+\w+,(?P<ComponentName>[^,]+),(?P<EventCode>[^,]+),(?P<MessageType>[^,]+),(?P<NAME1>[^,]+)?,?(?P<NAME2>[^,]+)?,?(?P<NAME3>[^,]+)?,?ID:(?P<messageId>[^,]+),ID:(?P<CorrelationId>[^,]+),(?P<UserId>[^,]*),(?P<otherInfo>[^,]*)

I need to match two different log formats: one from systems that have been upgraded (these have the NAME1, NAME2 and NAME3 entries in the log files), and the older style, which only has the ....MessageType,messageId,messageId2 stuff.

Here are example log entries I need to match:

OLD style (NO NAME1, NAME2, or NAME3 entries):

2014-09-10T06:03:22.270Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,ID:a1b9817d-a017-3924-a2e4-6e1ac30cd571,ID:c6ca4fa4-71ee-4453-be8e-66f41db75323,anonymous,,

NEW style 1 (addition of the NAME1, NAME2, NAME3 parts):

2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,NAME3,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,

NEW style 2 (blank NAME3):

2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,

With the regex above, I match the OLD style correctly (nothing in the NAME1, NAME2 or NAME3 groups) and NEW style 1 (NAME1 and NAME2 are correct, but nothing ends up in NAME3). But when I test NEW style 2 on the regex101 site, I get a "catastrophic backtracking" / execution time error.
I just can't seem to find the magic incantation to make sure it works for all three versions in the logs.
Any help is GREATLY appreciated!

1 Solution

gartnerj
Explorer

Well, I actually got it to work with this regex:

(?:[^:\n]*:){4}\d+\.\d+\w+,(?P<ComponentName>[^,]+),(?P<EventCode>[^,]+),(?P<MessageType>[^,]+),?(?P<ServiceName>[^,]+)?,?(?P<ServiceMethod>[^,]+)?,?(?P<ServiceInstance>[^,]+)?,ID:(?P<messageId>[^,]+),ID:(?P<CorrelationId>[^,]+),(?P<UserId>[^,]*)?,(?P<otherInfo>[^,]*)

This gets all of the parts for each type correctly. Thanks for the other suggestions; I am going to look into those, as it would be nice to have the fields extracted automatically so that I don't have to use this in each search/report.
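For comparison, here is a minimal Python sketch of a single pattern that accepts all three posted layouts. It is not the regex from the accepted answer, and it rests on a few assumptions: it skips the timestamp with `^[^,]+,` (as kristian_kolb suggests further down) because the sample lines as posted contain too few colons for the `(?:[^:\n]*:){4}` prefix; it treats `ID:` before the correlation id as optional because the new-style samples omit it; and it reuses the ServiceName/ServiceMethod/ServiceInstance names from the solution. The key change is that the three NAME fields form one optional block in which every slot owns its comma. In the original `(?P<NAME1>[^,]+)?,?(?P<NAME2>[^,]+)?,?...`, each field and each comma is optional on its own, so a single field can be divided among several groups in many ways, and the engine retries all of those splits whenever a later part of the pattern fails.

```python
import re

# Sketch of one pattern for all three layouts (not the accepted answer's regex).
# Assumptions: events look exactly like the posted samples, so the timestamp is
# skipped with ^[^,]+, and the "ID:" before the correlation id is optional.
PATTERN = re.compile(
    r"^[^,]+,"  # timestamp, skipped
    r"(?P<ComponentName>[^,]+),"
    r"(?P<EventCode>[^,]+),"
    r"(?P<MessageType>[^,]+),"
    # The NAME fields travel together: either all three slots (each possibly
    # empty) are present, or the whole block is absent. Because each slot is
    # terminated by its own comma, a field can only be matched one way.
    r"(?:(?P<ServiceName>[^,]*),(?P<ServiceMethod>[^,]*),(?P<ServiceInstance>[^,]*),)?"
    r"ID:(?P<messageId>[^,]+),"
    r"(?:ID:)?(?P<CorrelationId>[^,]+),"
    r"(?P<UserId>[^,]*),"
    r"(?P<otherInfo>[^,]*)"
)

# Sample lines copied from the question.
OLD_STYLE = "2014-09-10T06:03:22.270Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,ID:a1b9817d-a017-3924-a2e4-6e1ac30cd571,ID:c6ca4fa4-71ee-4453-be8e-66f41db75323,anonymous,,"
NEW_STYLE_1 = "2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,NAME3,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,"
NEW_STYLE_2 = "2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,"


def extract(line):
    """Return the named fields of one event as a dict, or None if it does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for line in (OLD_STYLE, NEW_STYLE_1, NEW_STYLE_2):
        fields = extract(line)
        print(fields["ServiceName"], fields["ServiceInstance"], fields["CorrelationId"])
```

With this layout an old-style event leaves the three service groups unset (None), while NEW style 2 yields an empty string for ServiceInstance. The pattern only uses named-group syntax that PCRE also accepts, so it should carry over to a Splunk extraction, but it is untested in Splunk itself.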

bmacias84
Champion

@gartnerj, my idea would be to give your new-style logs their own sourcetype, e.g. yoursourcetype_v2, and apply a new transform with the correct delimiter. You would keep your old data under a separate transform with the appropriate delimiter applied. Hope that makes sense.

gartnerj
Explorer

These logs are all in the same sourcetype. I'm still a bit unclear about some of the comments above. Keeping foo/foov2: I'm not sure what you are referring to there. Are you talking about two different field transformations? Any hints on how to construct the different field transforms? I still don't see how I get away from needing a regex to pull out the three different versions.

kristian_kolb
Ultra Champion

Actually, I agree with you there. Normally, a different log format means a different sourcetype. But perhaps these sourcetypes are already in place.

bmacias84
Champion

That is true, but both new styles have the same field count. If gartnerj keeps the original as foo and gives the new format the sourcetype foov2, he shouldn't have any problems, and the fields can be made global.
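To make the field-count point concrete, here is a rough Python analogue of a comma-delimited extraction with one fixed field list per layout, loosely mirroring a DELIMS/FIELDS transform per sourcetype. It is only an illustration: the field names come from the thread except `timestamp`, which is a hypothetical name, and the rule for telling the layouts apart (whether the fifth value starts with `ID:`) is an assumption drawn from the three sample lines, not something Splunk does for you. Splitting the samples on commas gives 9 values for the old style and 13 for both new styles, because of the extra name columns and the trailing commas.

```python
# Field names per layout. "timestamp" is hypothetical; the rest come from the thread.
OLD_FIELDS = ["timestamp", "ComponentName", "EventCode", "MessageType",
              "messageId", "CorrelationId", "UserId", "otherInfo"]
NEW_FIELDS = ["timestamp", "ComponentName", "EventCode", "MessageType",
              "ServiceName", "ServiceMethod", "ServiceInstance",
              "messageId", "CorrelationId", "UserId", "otherInfo"]

# Sample lines copied from the question.
SAMPLES = {
    "old": "2014-09-10T06:03:22.270Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,ID:a1b9817d-a017-3924-a2e4-6e1ac30cd571,ID:c6ca4fa4-71ee-4453-be8e-66f41db75323,anonymous,,",
    "new1": "2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,NAME3,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,",
    "new2": "2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,",
}


def split_fields(line):
    """Split one event on commas and name the values according to its layout."""
    values = line.split(",")
    # Assumed rule: in the old layout the fifth value is already the "ID:..." message id.
    names = OLD_FIELDS if values[4].startswith("ID:") else NEW_FIELDS
    # zip() stops at the shorter list, so trailing empty values beyond the names are dropped.
    return dict(zip(names, values))


if __name__ == "__main__":
    for label, line in SAMPLES.items():
        print(label, len(line.split(",")), split_fields(line))
```

Unlike the regex approach, a plain comma split keeps the `ID:` prefix in the values (messageId in every layout, and CorrelationId in the old one).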

kristian_kolb
Ultra Champion

There is a different number of fields, so a single REPORT with DELIMS/FIELDS will not necessarily do it for you. I haven't tried working with overlapping REPORTs, but that may work. What you could do is define two EXTRACTs in props.conf.

Btw, (?:[^:\n]*:){4} looks rather odd at the beginning. Perhaps you could write ^[^,]+, instead to skip over the timestamp.
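A quick way to check that suggestion against the sample lines as posted is the Python sketch below. It assumes the samples are complete events; the real events may carry a prefix that the `{4}` colon skip was written for. The posted lines contain at most four colons (two in the time, plus one per `ID:` prefix), so the colon skip either runs out of colons or stops at the start of an ID value, where `\d+\.\d+\w+,` cannot match. `^[^,]+,` consumes exactly the timestamp and its comma.

```python
import re

# Sample lines copied from the question (assumed to be complete events).
SAMPLES = [
    "2014-09-10T06:03:22.270Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,ID:a1b9817d-a017-3924-a2e4-6e1ac30cd571,ID:c6ca4fa4-71ee-4453-be8e-66f41db75323,anonymous,,",
    "2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,NAME3,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,",
    "2014-09-10T15:02:02.060Z,CLIENT,MESSAGE_SENT,SERVICE_REQUEST,NAME1,NAME2,,ID:3b84ef25-aa86-3020-951f-748bf47644f6,1161e6ca-9dc9-4205-a2bb-39fe8a220266,anonymous,,,",
]

# Original prefix: skip four colon-terminated chunks, then expect "ss.fffZ,".
COLON_SKIP = re.compile(r"(?:[^:\n]*:){4}\d+\.\d+\w+,")
# Suggested prefix: everything up to the first comma, i.e. the timestamp.
COMMA_SKIP = re.compile(r"^[^,]+,")

if __name__ == "__main__":
    for line in SAMPLES:
        timestamp = line.split(",", 1)[0]
        print(
            line.count(":"),                                # colons available
            COLON_SKIP.search(line) is not None,            # does the {4} skip match?
            COMMA_SKIP.match(line).group() == timestamp + ",",  # does ^[^,]+, stop after the timestamp?
        )
```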

EDIT: TYPO

gartnerj
Explorer

Don't I still need the regex to do this? I haven't used transforms before (I will look into them more); I was doing this within a single search. Also, does this change the fields for ALL users of Splunk, or just the app that I am in?

bmacias84
Champion

Why are you using a regex? It seems like you should be using a transform with a comma delimiter (DELIMS = ",") and then specifying the fields.
