Splunk Search

Regular expression works inline via 'rex', but not in EXTRACT.

bsayatovic
Path Finder

I have an enterprise application made of components that log to several different files. Some filenames are occasionally prefixed with a GUID to side-step multi-thread lock contention of the log files (a MS EntLib Logging feature). So, for example, my application might output these files:

  • MyApp.Facade.log
  • afe518e8-9394-4d29-9085-76272e6f8180MyApp.Facade.log
  • MyApp-service-trace-log.xml
  • Some.Component-log4net.log

I've been able to extract just the filename, excluding the optional GUID, using rex inline, e.g.:

index=prod | rex field=source "^.*\\\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilename>[^\\\\]*)$"

However, when I apply the same expression to an EXTRACT in props.conf...

#props.conf
[mysourcetype]
EXTRACT-SourceFilename = ^.*\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilename>[^\\]*)$ in source

...not all of my sources get a SourceFilename extracted. In particular, Some.Component-log4net.log does not get a SourceFilename. Yet the same expression via rex does give it a SourceFilenameTmp, e.g.

index=prod | rex field=source "^.*\\\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\\\]*)$" | table source SourceFilename SourceFilenameTmp

source                             SourceFilename     SourceFilenameTmp
c:\logs\MyApp.Facade.log           MyApp.Facade.log   MyApp.Facade.log
C:\logs\Some.Component-log4net.log                    Some.Component-log4net.log

I can't tell what it is about that regular expression that allows it to work via rex but not via EXTRACT.

Can anyone point out my error, or suggest debugging tips?

1 Solution

jeff
Contributor

I don't see anything inherently wrong with the regex... so I'll need to ask the obvious question - are you sure the Some.Component-log4net.log is being correctly typed as "mysourcetype"? Check by adding sourcetype to your table above:

index=prod | rex field=source "^.*\\\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\\\]*)$" | table sourcetype source SourceFilename SourceFilenameTmp
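
If you want to rule the regex out before looking at sourcetypes, a quick check outside Splunk can help. The following is only a rough sketch in Python, not Splunk's own matching engine: Python's re module spells named groups (?P<name>...) instead of PCRE's (?<name>...), and the helper name source_filename is made up for illustration. The c:\logs\ directory for the GUID-prefixed file and the XML file is an assumption, since only two full paths appear in the table above.

```python
import re
from typing import Optional

# Same expression as the EXTRACT in props.conf, except that Python's re
# module requires (?P<name>...) where PCRE accepts (?<name>...).
PATTERN = re.compile(
    r"^.*\\"                                                             # everything up to the last backslash
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?"   # optional GUID prefix
    r"(?P<SourceFilename>[^\\]*)$"                                       # the remaining filename
)


def source_filename(source: str) -> Optional[str]:
    """Hypothetical helper: return the filename without the GUID, or None if no match."""
    match = PATTERN.match(source)
    return match.group("SourceFilename") if match else None


# Two paths come from the table in the question; the other two assume
# the same c:\logs\ directory.
SOURCES = [
    r"c:\logs\MyApp.Facade.log",
    r"c:\logs\afe518e8-9394-4d29-9085-76272e6f8180MyApp.Facade.log",
    r"c:\logs\MyApp-service-trace-log.xml",
    r"C:\logs\Some.Component-log4net.log",
]

for source in SOURCES:
    print(f"{source} -> {source_filename(source)}")
```

If every path yields a filename here, the expression itself is probably not the problem, which points back at which props.conf stanza is actually being applied to each source.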

bsayatovic
Path Finder

Doh! Forest... trees. That's exactly it.

Your answer then led me to consider using a wildcard sourcetype, e.g. "(?::){0}*", but I was too chicken to try this on all sourcetypes, so for now I've just added it to the sourcetypes I know I need it on.

Thanks!
