Why am I not seeing any fields extracted with my REGEX in transforms.conf?

reedmohn
Communicator

At the risk of once again displaying my ignorance...
I added this transform regex to transforms.conf:

[myformat]
REGEX = ^.*\[(?.*?)\]\s(?[A-Z]+)\s+(?\S+\s\S+)\s\-\s(?.+)$

I also tried this:

REGEX = \[(?.*?)\]\s(?[A-Z]+)\s+(?\S+\s\S+)\s\-\s(?.+)

Props.conf has:

[mylog]
....
...
TRANFORMS-mylog_format = myformat

They're supposed to match log lines like this, but I'm not seeing any fields extracted:

2013-07-31 23:57:51,858 [26] INFO  MyApp.Service.Logger.Filter - Number not in range

The format is: timestamp [THREAD] LEVEL LOGGER - Message

The regex itself works with rex in search, but not here, and now I'm staring myself blind on something obvious, I'm sure....

Any advice?

1 Solution

esix_splunk
Splunk Employee

Your regexes are wrong. Remember to include the timestamp pattern, since the event includes it. A greedy match doesn't work properly with your regex.

Try this:

    \d{4}\-\d{2}\-\d{2}\s\d{2}\:\d{2}\:\d{2}\,\d{3} \[(?<first>\d+)\] (?<second>\w+)\s+(?<third>[^\s]+)\s\-\s+(?<fourth>.*)

or

    ^.*\[(?<first>\d+)\] (?<second>\w+)\s+(?<third>[^\s]+)\s\-\s+(?<fourth>.*)

props.conf:

[mysource]
REPORT-mysource = mysource-extract

transforms.conf:

[mysource-extract]
REGEX = \d{4}\-\d{2}\-\d{2}\s\d{2}\:\d{2}\:\d{2}\,\d{3} \[(?<first>\d+)\] (?<second>\w+)\s+(?<third>[^\s]+)\s\-\s+(?<fourth>.*)
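
A quick way to sanity-check a pattern like the one above outside Splunk is to run it against the sample event from the question. The sketch below uses Python's `re` module as a stand-in for Splunk's regex engine, on the assumption that the two agree on the constructs used here. Python spells named groups `(?P<name>...)`, so the helper `to_python_named_groups` (a hypothetical name, not a Splunk or Python API) rewrites them first.

```python
import re

# Sample event quoted in the question.
SAMPLE = "2013-07-31 23:57:51,858 [26] INFO  MyApp.Service.Logger.Filter - Number not in range"

# Pattern from the transforms.conf stanza above, copied verbatim.
SPLUNK_REGEX = (
    r"\d{4}\-\d{2}\-\d{2}\s\d{2}\:\d{2}\:\d{2}\,\d{3} "
    r"\[(?<first>\d+)\] (?<second>\w+)\s+(?<third>[^\s]+)\s\-\s+(?<fourth>.*)"
)


def to_python_named_groups(pattern: str) -> str:
    """Rewrite (?<name>...) as (?P<name>...), the spelling Python's re expects."""
    return re.sub(r"\(\?<([A-Za-z_]\w*)>", r"(?P<\1>", pattern)


def extract_fields(pattern: str, event: str) -> dict:
    """Return the named groups found in the event, or an empty dict if no match."""
    match = re.search(to_python_named_groups(pattern), event)
    return match.groupdict() if match else {}


fields = extract_fields(SPLUNK_REGEX, SAMPLE)
print(fields)
```

If this prints all four fields, the pattern itself is probably fine and the problem lies in how it is wired up. Separately from the regex, note that the props.conf stanza above uses a `REPORT-` class, which Splunk applies at search time, while the question's stanza used a `TRANSFORMS-` class, which Splunk applies at index time; that difference may matter as much as the pattern.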

reedmohn
Communicator

EDIT: Got it working!

I tried both.. got nothing at first. But it seems we have a winner 🙂 Thanks!

But your suggestion didn't work properly for most of the logs, since the third variable often contains whitespace. That's why I thought this didn't make a difference.
Once I corrected that, this worked:

[log4net_format]
REGEX = ^.*\[(?<thread>\d+)\] (?<level>\w+)\s+(?<logger>.+)\s-\s+(?<messages>.*)

Out of interest: Where is it you mean the greedy match won't work? There are a couple in the regexp.
Having said that, I don't fully understand why this expression works better than the one I had originally.
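
One way to explore that question outside Splunk is to compare the patterns side by side. This is a minimal sketch in Python's `re`, assuming it behaves like Splunk's engine for these constructs; named groups are respelled as `(?P<name>...)` for Python. The `DASHED` event and the `LAZY` variant are made up for the comparison and do not come from the logs discussed here.

```python
import re

# Sample event quoted in the question.
SAMPLE = "2013-07-31 23:57:51,858 [26] INFO  MyApp.Service.Logger.Filter - Number not in range"
# Hypothetical event (not from the thread) whose message itself contains " - ".
DASHED = "2013-07-31 23:57:51,858 [26] INFO  MyApp.Service.Logger.Filter - Value 5 - out of range"

# Patterns from the thread, with (?<name>...) respelled as (?P<name>...).
ORIGINAL = r"\[(?P<thread>.*?)\]\s(?P<level>[A-Z]+)\s+(?P<logger>\S+\s\S+)\s\-\s(?P<message>.+)"
WORKING = r"^.*\[(?P<thread>\d+)\] (?P<level>\w+)\s+(?P<logger>.+)\s-\s+(?P<messages>.*)"
# Assumption: a lazy variant of WORKING, added only for comparison.
LAZY = WORKING.replace("(?P<logger>.+)", "(?P<logger>.+?)")


def logger_field(pattern, event):
    """Return the captured logger, or None when the pattern does not match."""
    match = re.search(pattern, event)
    return match.group("logger") if match else None


for name, pattern in [("original", ORIGINAL), ("working", WORKING), ("lazy", LAZY)]:
    print(name, "|", logger_field(pattern, SAMPLE), "|", logger_field(pattern, DASHED))
```

In Python at least, the original pattern does not match the sample line at all: `\S+\s\S+` requires a whitespace character inside the logger name, and `MyApp.Service.Logger.Filter` has none. The working pattern matches, but its greedy `.+` for the logger runs up to the last ` - ` in the line, so on the hypothetical dashed event part of the message ends up in the logger field. The lazy `.+?` stops at the first ` - ` instead.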

esix_splunk
Splunk Employee

You need to use inline captures, as mentioned.

In this example, I am not using the FORMAT setting; instead, I do an inline capture in the regex and define the fields there.

[mysource]
REGEX  = ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[(?<capture1>\d+)\]\s+(?<sysloglevel>\w+)\s+(?<ApplicationName>[^\s]+)\s+\-\s+(?<message_body>.*)

reedmohn
Communicator

That's pretty much what I tried, too (see below). The capture labels got lost in the editor here...

reedmohn
Communicator

Ahh... sorry, the editor here screwed up my string before I edited in the code section. The field labels were edited out as HTML, I guess. These are the actual expressions I used:

REGEX = ^.*\[(?<thread>.*?)\]\s(?<level>[A-Z]+)\s+(?<logger>\S+\s\S+)\s\-\s(?<message>.+)$

REGEX = \[(?<thread>.*?)\]\s(?<level>[A-Z]+)\s+(?<logger>\S+\s\S+)\s\-\s(?<message>.+)

sk314
Builder

You need capturing groups in your regex; each capturing group corresponds to a field. You can map the groups to field names in your transforms.conf like so:

[myformat]
REGEX = ^.*\[(.*?)\]\s([A-Z]+)\s+(\S+\s\S+)\s\-\s(.+)$
FORMAT = field_1::$1 field_2::$2 field_3::$3 field_4::$4

Edit: I assumed your regex works for you (didn't check).
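
The idea behind the FORMAT line is that `$1` to `$4` refer to the regex's capturing groups by position. Below is a rough emulation of that mapping in Python, assuming plain positional groups; it only illustrates the `name::$N` convention and is not how Splunk implements it. The `apply_format` helper and the sample event (whose logger contains a space, as this regex requires) are hypothetical.

```python
import re

# Hypothetical event: the logger has exactly one space, which \S+\s\S+ needs.
EVENT = "2013-07-31 23:57:51,858 [26] INFO  MyApp.Service Logger.Filter - Number not in range"

# The regex from the stanza above, written with positional (unnamed) groups.
REGEX = r"^.*\[(.*?)\]\s([A-Z]+)\s+(\S+\s\S+)\s\-\s(.+)$"
FORMAT = "field_1::$1 field_2::$2 field_3::$3 field_4::$4"


def apply_format(fmt: str, match: re.Match) -> dict:
    """Map each 'name::$N' pair in fmt to the text of capturing group N."""
    fields = {}
    for pair in fmt.split():
        name, _, ref = pair.partition("::")
        fields[name] = match.group(int(ref.lstrip("$")))
    return fields


match = re.search(REGEX, EVENT)
fields = apply_format(FORMAT, match) if match else {}
print(fields)
```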

reedmohn
Communicator

Thanks, but I see I got the formatting wrong in the OP. See answer below.

sk314
Builder

Could you try removing the named capture groups and using the FORMAT line?

reedmohn
Communicator

Yup.. working on that right now.. 🙂

reedmohn
Communicator

And that's a no.. didn't happen.

Starting to think it's not picking up the transform config.

Though I have other transforms configured in the same files that work just fine, so I can't really see what the problem should be there. But I'll go through it all word for word, check I didn't miss a spelling error or something...

The regex itself works fine if I use it at search time, so that should not be the problem.

sk314
Builder

Just checking: do you have the corresponding props.conf entry?

reedmohn
Communicator

Here's the full props entry:

[log4net]
pulldown_type = true
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE_DATE = True
CHECK_FOR_HEADER = False
TRANSFORMS-log4net_events = log4net_format

And the current transform:

[log4net_format]
REGEX = ^.[(?\d+)] (?\w+)\s+(?.+)\s-\s+(?.)

reedmohn
Communicator

Ah.. that's :

[log4net_format]
REGEX = ^.*\[(?<thread>\d+)\] (?<level>\w+)\s+(?<logger>.+)\s-\s+(?<messages>.*)

reedmohn
Communicator

..and before you ask: no, I am not in Fast Mode 🙂
