Splunk Search

Regular expression works inline via 'rex', but not in EXTRACT.

bsayatovic
Path Finder

I have an enterprise application made of components that log to several different files. Some filenames are occasionally prefixed with a GUID to side-step multi-thread lock contention of the log files (a MS EntLib Logging feature). So, for example, my application might output these files:

  • MyApp.Facade.log
  • afe518e8-9394-4d29-9085-76272e6f8180MyApp.Facade.log
  • MyApp-service-trace-log.xml
  • Some.Component-log4net.log

I've been able to extract just the filename, excluding the optional GUID, using rex inline, e.g.:

index=prod | rex field=source "^.*\\\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\\\]*)$"

However, when I apply the same expression to an EXTRACT in props.conf...

#props.conf
[mysourcetype]
EXTRACT-SourceFilename = ^.*\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilename>[^\\]*)$ in source

...not all of my sources get a SourceFilename extracted. In particular, Some.Component-log4net.log does not get a SourceFilename. Yet the same expression via rex gives it a SourceFilenameTmp, e.g.:

index=prod | rex field=source "^.*\\\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\\\]*)$" | table source SourceFilename SourceFilenameTmp

source                             SourceFilename     SourceFilenameTmp
c:\logs\MyApp.Facade.log           MyApp.Facade.log   MyApp.Facade.log
C:\logs\Some.Component-log4net.log                    Some.Component-log4net.log

I can't tell what it is about that regular expression that allows it to work via rex but not via EXTRACT.

Can anyone point out my error, or suggest debugging tips?

jeff
Contributor

I don't see anything inherently wrong with the regex... so I'll need to ask the obvious question - are you sure the Some.Component-log4net.log is being correctly typed as "mysourcetype"? Check by adding sourcetype to your table above:

index=prod | rex field=source "^.*\\\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\\\]*)$" | table sourcetype source SourceFilename SourceFilenameTmp
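
One way to back up the "nothing wrong with the regex" hunch is to run the same pattern outside Splunk. Here is a minimal Python sketch, not anything Splunk-specific. It assumes the c:\logs directory from the table applies to all four example filenames, and it rewrites the named group as (?P<SourceFilename>...) because Python's re module does not accept the (?<name>...) spelling used in props.conf. The helper name source_filename is made up for illustration.

```python
import re

# Same pattern as the EXTRACT in props.conf, except the named group uses
# Python's (?P<name>...) spelling instead of PCRE's (?<name>...).
PATTERN = re.compile(
    r"^.*\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?"
    r"(?P<SourceFilename>[^\\]*)$"
)


def source_filename(source):
    """Return the filename without the optional GUID prefix, or None if no match."""
    match = PATTERN.match(source)
    return match.group("SourceFilename") if match else None


# Hypothetical full paths: the directory is taken from the table in the
# question; pairing it with every example filename is an assumption.
samples = [
    r"c:\logs\MyApp.Facade.log",
    r"c:\logs\afe518e8-9394-4d29-9085-76272e6f8180MyApp.Facade.log",
    r"c:\logs\MyApp-service-trace-log.xml",
    r"C:\logs\Some.Component-log4net.log",
]

for path in samples:
    print(path, "->", source_filename(path))
```

All four paths come back with a bare filename, including Some.Component-log4net.log. That suggests the pattern is not what differs between rex and EXTRACT, and that the stanza the EXTRACT lives under is the next thing to check.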

bsayatovic
Path Finder

Doh! Forest... trees. That's exactly it.

Your answer then led me to consider using a wildcard sourcetype, e.g. "(?::){0}*", but I was too chicken to try this on all sourcetypes, so for now I've just added it to the sourcetypes I know I need it on.

Thanks!
