Why am I not seeing any fields extracted with my REGEX in transforms.conf?

reedmohn
Communicator

At the risk of once again displaying my ignorance...
I added this transform regex to transforms.conf:

[myformat]
REGEX = ^.*\[(?.*?)\]\s(?[A-Z]+)\s+(?\S+\s\S+)\s\-\s(?.+)$

I also tried this:

REGEX = \[(?.*?)\]\s(?[A-Z]+)\s+(?\S+\s\S+)\s\-\s(?.+)

Props.conf has:

[mylog]
....
...
TRANFORMS-mylog_format = myformat

They're supposed to match log lines like this, but I'm not seeing any fields extracted:

2013-07-31 23:57:51,858 [26] INFO  MyApp.Service.Logger.Filter - Number not in range

The format is: timestamp [THREAD] LEVEL LOGGER - Message

The regex itself works with rex in search, but not here, and now I'm staring myself blind on something obvious, I'm sure....

Any advice?

1 Solution

esix_splunk
Splunk Employee

Your regexes are wrong. Remember to include the timestamp pattern, since the event includes it. A greedy match with your regex doesn't work properly.

Try this

    \d{4}\-\d{2}\-\d{2}\s\d{2}\:\d{2}\:\d{2}\,\d{3} \[(?<first>\d+)\] (?<second>\w+)\s+(?<third>[^\s]+)\s\-\s+(?<fourth>.*)
OR

^.*\[(?<first>\d+)\] (?<second>\w+)\s+(?<third>[^\s]+)\s\-\s+(?<fourth>.*)

props.conf:

[mysource]
REPORT-mysource = mysource-extract

transforms.conf:

[mysource-extract]
REGEX = \d{4}\-\d{2}\-\d{2}\s\d{2}\:\d{2}\:\d{2}\,\d{3} \[(?<first>\d+)\] (?<second>\w+)\s+(?<third>[^\s]+)\s\-\s+(?<fourth>.*)

reedmohn
Communicator

EDIT: Got it working!

I tried both and got nothing at first, but it seems we have a winner 🙂 Thanks!

But your suggestion didn't work properly for most of the logs, since the third variable often contains whitespace. That's why I thought this didn't make a difference.
Once I corrected that, this worked:

[log4net_format]
REGEX = ^.*\[(?<thread>\d+)\] (?<level>\w+)\s+(?<logger>.+)\s-\s+(?<messages>.*)

Out of interest: Where is it you mean the greedy match won't work? There are a couple in the regexp.
Having said that, I don't fully understand why this expression works better than the one I had originally.
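
On the greedy-match question, a quick check outside Splunk may help. The sketch below is Python rather than Splunk configuration, so the named groups use Python's `(?P<name>...)` spelling instead of `(?<name>...)`. The second log line, with " - " inside the message, is a made-up example, and the lazy variant (`.+?`) is only one possible adjustment, not something proposed in this thread.

```python
import re

# The working expression from this post, with named groups in Python's (?P<name>...) form.
GREEDY = re.compile(
    r"^.*\[(?P<thread>\d+)\] (?P<level>\w+)\s+(?P<logger>.+)\s-\s+(?P<messages>.*)"
)
# Hypothetical variant: identical except that the logger group is lazy (.+?).
LAZY = re.compile(
    r"^.*\[(?P<thread>\d+)\] (?P<level>\w+)\s+(?P<logger>.+?)\s-\s+(?P<messages>.*)"
)

# Sample line from the question.
sample = "2013-07-31 23:57:51,858 [26] INFO  MyApp.Service.Logger.Filter - Number not in range"
# Made-up line whose message itself contains " - ".
tricky = "2013-07-31 23:57:51,858 [26] INFO  MyApp.Service.Logger.Filter - Value 5 - out of range"


def extract(pattern, line):
    """Return the named groups as a dict, or None when the line does not match."""
    match = pattern.search(line)
    return match.groupdict() if match else None


sample_fields = extract(GREEDY, sample)
greedy_fields = extract(GREEDY, tricky)
lazy_fields = extract(LAZY, tricky)

print(sample_fields["logger"])  # MyApp.Service.Logger.Filter
print(greedy_fields["logger"])  # MyApp.Service.Logger.Filter - Value 5
print(lazy_fields["logger"])    # MyApp.Service.Logger.Filter
```

The greedy `.+` runs to the last " - " on the line, so part of the message ends up in the logger field whenever the message itself contains that separator. The lazy form stops at the first " - " and still allows whitespace inside the logger name.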

esix_splunk
Splunk Employee

You need to use inline captures, as mentioned.

In this example I am not using the FORMAT setting; instead I do an inline capture in the regex and name the fields there.

[mysource]
REGEX  = ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[(?<capture1>\d+)\]\s+(?<sysloglevel>\w+)\s+(?<ApplicationName>[^\s]+)\s+\-\s+(?<message_body>.*)

reedmohn
Communicator

That's pretty much what I tried, too (see below). The capture labels got lost in the editor here...

reedmohn
Communicator

Ahh... sorry, the editor here screwed up my string before I edited in the code section. The field labels were edited out as HTML, I guess. These are the actual expressions I used:

REGEX = ^.*\[(?<thread>.*?)\]\s(?<level>[A-Z]+)\s+(?<logger>\S+\s\S+)\s\-\s(?<message>.+)$

REGEX = \[(?<thread>.*?)\]\s(?<level>[A-Z]+)\s+(?<logger>\S+\s\S+)\s\-\s(?<message>.+)
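
One way to see how these two expressions behave on the sample line from the question is to run them outside Splunk. The sketch below uses Python, so `(?<name>...)` is rewritten as `(?P<name>...)`. The second test line, with a two-word logger name, is made up for illustration. Under those assumptions, neither expression matches the sample line as posted, because `\S+\s\S+` requires whitespace inside the logger name.

```python
import re

# The two expressions above, with named groups in Python's (?P<name>...) form.
ANCHORED = re.compile(
    r"^.*\[(?P<thread>.*?)\]\s(?P<level>[A-Z]+)\s+(?P<logger>\S+\s\S+)\s\-\s(?P<message>.+)$"
)
UNANCHORED = re.compile(
    r"\[(?P<thread>.*?)\]\s(?P<level>[A-Z]+)\s+(?P<logger>\S+\s\S+)\s\-\s(?P<message>.+)"
)

# Sample line from the question: the logger name contains no whitespace.
sample = "2013-07-31 23:57:51,858 [26] INFO  MyApp.Service.Logger.Filter - Number not in range"
# Made-up line: the logger name consists of two whitespace-separated tokens.
two_word = "2013-07-31 23:57:51,858 [26] INFO  MyApp Filter - Number not in range"

sample_anchored = ANCHORED.search(sample)
sample_unanchored = UNANCHORED.search(sample)
two_word_match = ANCHORED.search(two_word)

print(sample_anchored)                  # None
print(sample_unanchored)                # None
print(two_word_match.group("logger"))   # MyApp Filter
```

So a test with rex against events whose logger names contain a space can succeed while single-token logger names, like the one in the sample, are skipped entirely.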

sk314
Builder

You have to have capturing groups within your regex. Each capturing group corresponds to a field. You can map the capturing groups to field names in your transforms.conf like so:

[myformat]
REGEX = ^.*\[(.*?)\]\s([A-Z]+)\s+(\S+\s\S+)\s\-\s(.+)$
FORMAT = field_1::$1 field_2::$2 field_3::$3 field_4::$4

Edit: I assumed your regex works for you (I didn't check).
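
To illustrate the positional mapping that a FORMAT line like this expresses, here is a rough Python analogue using plain (unnamed) capture groups. It is not how Splunk implements FORMAT; `apply_format` is a hypothetical helper, and the log line is made up (with a two-word logger name so that `\S+\s\S+` matches).

```python
import re

# Plain capture groups: fields are assigned by position, like $1..$4 in FORMAT.
REGEX = re.compile(r"^.*\[(.*?)\]\s([A-Z]+)\s+(\S+\s\S+)\s\-\s(.+)$")
FORMAT = "field_1::$1 field_2::$2 field_3::$3 field_4::$4"


def apply_format(match, fmt):
    """Illustrative only: map each name::$N pair to capture group N."""
    fields = {}
    for pair in fmt.split():
        name, ref = pair.split("::")
        fields[name] = match.group(int(ref.lstrip("$")))
    return fields


# Made-up log line with a two-word logger name.
line = "2013-07-31 23:57:51,858 [26] INFO  MyApp Filter - Number not in range"
match = REGEX.search(line)
fields = apply_format(match, FORMAT) if match else {}

print(fields)
```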

reedmohn
Communicator

Thanks, but I see I got the formatting wrong in the OP. See answer below.

sk314
Builder

Could you try removing the named capture groups and using the FORMAT line?

reedmohn
Communicator

Yup.. working on that right now.. 🙂

reedmohn
Communicator

And that's a no.. didn't happen.

Starting to think it's not picking up the transform config.

Though I have other transforms configured in the same files that work just fine, so I can't really see what the problem should be there. But I'll go through it all word for word, check I didn't miss a spelling error or something...

The regex itself works fine if I use it at search time, so that should not be the problem.

sk314
Builder

Just checking: do you have the corresponding props.conf entry?

reedmohn
Communicator

Here's the full props entry:

[log4net]
pulldown_type = true
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE_DATE = True
CHECK_FOR_HEADER = False
TRANSFORMS-log4net_events = log4net_format

And the current transform:

[log4net_format]
REGEX = ^.[(?\d+)] (?\w+)\s+(?.+)\s-\s+(?.)

reedmohn
Communicator

Ah, the editor mangled it again. That should be:

[log4net_format]
REGEX = ^.*\[(?<thread>\d+)\] (?<level>\w+)\s+(?<logger>.+)\s-\s+(?<messages>.*)

reedmohn
Communicator

..and before you ask: no, I am not in Fast Mode 🙂
