
Regular expression works inline via 'rex', but not in EXTRACT.

bsayatovic
Path Finder

I have an enterprise application made of components that log to several different files. Some filenames are occasionally prefixed with a GUID to side-step lock contention when multiple threads write to the same log file (a MS EntLib Logging feature). So, for example, my application might output these files:

  • MyApp.Facade.log
  • afe518e8-9394-4d29-9085-76272e6f8180MyApp.Facade.log
  • MyApp-service-trace-log.xml
  • Some.Component-log4net.log

I've been able to extract just the filename, excluding the optional GUID, using rex inline, e.g.:

index=prod | rex field=source "^.*\\\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilename>[^\\\\]*)$"

However, when I apply the same expression to an EXTRACT in props.conf...

#props.conf
[mysourcetype]
EXTRACT-SourceFilename = ^.*\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilename>[^\\]*)$ in source

...not all of my sources get a SourceFilename extracted. In particular, Some.Component-log4net.log does not get a SourceFilename, yet the same expression via rex does give it a SourceFilenameTmp:

index=prod | rex field=source "^.*\\\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\\\]*)$" | table source SourceFilename SourceFilenameTmp

source                             SourceFilename     SourceFilenameTmp
c:\logs\MyApp.Facade.log           MyApp.Facade.log   MyApp.Facade.log
C:\logs\Some.Component-log4net.log                    Some.Component-log4net.log

I can't tell what it is about that regular expression that allows it to work via rex but not via EXTRACT.

Can anyone point out my error, or suggest debugging tips?
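One quick debugging step is to test the regex outside Splunk against sample paths. The sketch below assumes Python's re module is a fair stand-in for Splunk's PCRE engine for this particular pattern; the only syntax change is (?P<SourceFilename>...) in place of PCRE's (?<SourceFilename>...). The pattern is the props.conf version (no SPL string escaping), written as a Python raw string. Two paths come from the table above; the other two are hypothetical placements of filenames listed in the question.

```python
import re
from typing import Optional

# The EXTRACT regex from props.conf as a raw string. Assumption: Python's re
# behaves like PCRE for this pattern; (?P<name>...) replaces PCRE's (?<name>...).
GUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
SOURCE_FILENAME = re.compile(r"^.*\\(" + GUID + r")?(?P<SourceFilename>[^\\]*)$")

# Sample source values (the GUID-prefixed and .xml paths are hypothetical).
SOURCES = [
    r"c:\logs\MyApp.Facade.log",
    r"c:\logs\afe518e8-9394-4d29-9085-76272e6f8180MyApp.Facade.log",
    r"c:\logs\MyApp-service-trace-log.xml",
    r"C:\logs\Some.Component-log4net.log",
]


def source_filename(source: str) -> Optional[str]:
    """Return the filename after the last backslash, minus any GUID prefix.

    Returns None when the path contains no backslash (the regex requires one).
    """
    m = SOURCE_FILENAME.match(source)
    return m.group("SourceFilename") if m else None


if __name__ == "__main__":
    for s in SOURCES:
        print(f"{s:<62} {source_filename(s)}")
```

If every path yields the expected filename, the regex itself is probably fine and the difference more likely lies in which events the EXTRACT is applied to.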


jeff
Contributor

I don't see anything inherently wrong with the regex, so I'll ask the obvious question: are you sure Some.Component-log4net.log is actually being assigned the sourcetype "mysourcetype"? The EXTRACT lives in the [mysourcetype] stanza, so it only applies to events of that sourcetype. Check by adding sourcetype to your table above:

index=prod | rex field=source "^.*\\\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\\\]*)$" | table sourcetype source SourceFilename SourceFilenameTmp
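Why the sourcetype matters can be illustrated with a deliberately simplified model: an EXTRACT defined under [mysourcetype] only runs for events whose sourcetype matches that stanza. This is a toy sketch, not Splunk's implementation; the sourcetype name log4net_sourcetype and the helpers build_props and extract_fields are hypothetical.

```python
import re
from typing import Dict, List

# Same filename regex as the EXTRACT, in Python syntax ((?P<name>...)).
FILENAME_RE = re.compile(
    r"^.*\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?"
    r"(?P<SourceFilename>[^\\]*)$"
)


def build_props(sourcetypes: List[str]) -> Dict[str, list]:
    """Toy props.conf: give each listed sourcetype stanza the same EXTRACT."""
    return {st: [FILENAME_RE] for st in sourcetypes}


def extract_fields(event: Dict[str, str], props: Dict[str, list]) -> Dict[str, str]:
    """Apply only the extractions whose stanza matches the event's sourcetype."""
    fields: Dict[str, str] = {}
    for pattern in props.get(event["sourcetype"], []):
        m = pattern.match(event["source"])
        if m:
            fields.update(m.groupdict())  # named groups only
    return fields


facade = {"sourcetype": "mysourcetype", "source": r"c:\logs\MyApp.Facade.log"}
log4net = {"sourcetype": "log4net_sourcetype",  # hypothetical sourcetype name
           "source": r"C:\logs\Some.Component-log4net.log"}

only_mine = build_props(["mysourcetype"])
both = build_props(["mysourcetype", "log4net_sourcetype"])

if __name__ == "__main__":
    print(extract_fields(facade, only_mine))   # {'SourceFilename': 'MyApp.Facade.log'}
    print(extract_fields(log4net, only_mine))  # {}
    print(extract_fields(log4net, both))       # {'SourceFilename': 'Some.Component-log4net.log'}
```

In this model the regex never changes; the log4net event only gains SourceFilename once its own sourcetype has a stanza carrying the extraction.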


bsayatovic
Path Finder

Doh! Forest... trees. That's exactly it.

Your answer then led me to consider using a wildcard sourcetype stanza, e.g. "(?::){0}*", but I was too chicken to try this on all sourcetypes, so for now I've just added the EXTRACT to the sourcetypes I know I need it on.

Thanks!
