Splunk Search

Regular expression works inline via 'rex', but not in EXTRACT.

bsayatovic
Path Finder

I have an enterprise application made of components that log to several different files. Some filenames are occasionally prefixed with a GUID to side-step multi-thread lock contention of the log files (a MS EntLib Logging feature). So, for example, my application might output these files:

  • MyApp.Facade.log
  • afe518e8-9394-4d29-9085-76272e6f8180MyApp.Facade.log
  • MyApp-service-trace-log.xml
  • Some.Component-log4net.log

I've been able to extract just the filename, excluding the optional GUID, using rex inline, e.g.:

index=prod rex field=source "^.\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?[^\\])$"

However, when I apply the same expression to an EXTRACT in props.conf...

#props.conf
[mysourcetype]
EXTRACT-SourceFilename = ^.*\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilename>[^\\]*)$ in source

...not all of my sources get a SourceFilename extracted. In particular, Some.Component-log4net.log does not have a SourceFilename. Yet the same expression via rex gives it a SourceFilenameTemp, e.g.

index=prod | rex field=source "^.\\\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\\\])$" | table source SourceFilename SourceFilenameTmp

source                             SourceFilename     SourceFilenameTmp
c:\logs\MyApp.Facade.log           MyApp.Facade.log   MyApp.Facade.log
C:\logs\Some.Component-log4net.log                    Some.Component-log4net.log

I can't tell what it is about that regular expression that allows it to work via rex but not via EXTRACT.

Can anyone point out my error, or suggest debugging tips?

Tags (3)
0 Karma
1 Solution

jeff
Contributor

I don't see anything inherently wrong with the regex... so I'll need to ask the obvious question - are you sure the Some.Component-log4net.log is being correctly typed as "mysourcetype"? Check by adding sourcetype to your table above:

index=prod | rex field=source "^.\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\])$" | table sourcetype source SourceFilename SourceFilenameTmp

View solution in original post

jeff
Contributor

I don't see anything inherently wrong with the regex... so I'll need to ask the obvious question - are you sure the Some.Component-log4net.log is being correctly typed as "mysourcetype"? Check by adding sourcetype to your table above:

index=prod | rex field=source "^.\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\])$" | table sourcetype source SourceFilename SourceFilenameTmp

bsayatovic
Path Finder

Doh! Forest... trees. That's exactly it.

Your answer then lead me to considering using a wildcard sourcetype, e.g. "(?::){0}", but I was to chicken to try this on *all sourcetypes, so for now, I've just added it to the sourcetypes I know I need it on.

Thanks!

0 Karma
Get Updates on the Splunk Community!

How to Monitor Google Kubernetes Engine (GKE)

We’ve looked at how to integrate Kubernetes environments with Splunk Observability Cloud, but what about ...

Index This | How can you make 45 using only 4?

October 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this ...

Splunk Education Goes to Washington | Splunk GovSummit 2024

If you’re in the Washington, D.C. area, this is your opportunity to take your career and Splunk skills to the ...