Splunk Search

Regular expression works inline via 'rex', but not in EXTRACT.

bsayatovic
Path Finder

I have an enterprise application made of components that log to several different files. Some filenames are occasionally prefixed with a GUID to side-step multi-thread lock contention of the log files (a MS EntLib Logging feature). So, for example, my application might output these files:

  • MyApp.Facade.log
  • afe518e8-9394-4d29-9085-76272e6f8180MyApp.Facade.log
  • MyApp-service-trace-log.xml
  • Some.Component-log4net.log

I've been able to extract just the filename, excluding the optional GUID, using rex inline, e.g.:

index=prod rex field=source "^.\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?[^\\])$"

However, when I apply the same expression to an EXTRACT in props.conf...

#props.conf
[mysourcetype]
EXTRACT-SourceFilename = ^.*\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilename>[^\\]*)$ in source

...not all of my sources get a SourceFilename extracted. In particular, Some.Component-log4net.log does not have a SourceFilename. Yet the same expression via rex gives it a SourceFilenameTemp, e.g.

index=prod | rex field=source "^.\\\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\\\])$" | table source SourceFilename SourceFilenameTmp

source                             SourceFilename     SourceFilenameTmp
c:\logs\MyApp.Facade.log           MyApp.Facade.log   MyApp.Facade.log
C:\logs\Some.Component-log4net.log                    Some.Component-log4net.log

I can't tell what it is about that regular expression that allows it to work via rex but not via EXTRACT.

Can anyone point out my error, or suggest debugging tips?

Tags (3)
0 Karma
1 Solution

jeff
Contributor

I don't see anything inherently wrong with the regex... so I'll need to ask the obvious question - are you sure the Some.Component-log4net.log is being correctly typed as "mysourcetype"? Check by adding sourcetype to your table above:

index=prod | rex field=source "^.\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\])$" | table sourcetype source SourceFilename SourceFilenameTmp

View solution in original post

jeff
Contributor

I don't see anything inherently wrong with the regex... so I'll need to ask the obvious question - are you sure the Some.Component-log4net.log is being correctly typed as "mysourcetype"? Check by adding sourcetype to your table above:

index=prod | rex field=source "^.\\([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})?(?<SourceFilenameTmp>[^\\])$" | table sourcetype source SourceFilename SourceFilenameTmp

bsayatovic
Path Finder

Doh! Forest... trees. That's exactly it.

Your answer then lead me to considering using a wildcard sourcetype, e.g. "(?::){0}", but I was to chicken to try this on *all sourcetypes, so for now, I've just added it to the sourcetypes I know I need it on.

Thanks!

0 Karma
Get Updates on the Splunk Community!

Splunk Decoded: Service Maps vs Service Analyzer Tree View vs Flow Maps

It’s Monday morning, and your phone is buzzing with alert escalations – your customer-facing portal is running ...

What’s New in Splunk Observability – September 2025

What's NewWe are excited to announce the latest enhancements to Splunk Observability, designed to help ITOps ...

Fun with Regular Expression - multiples of nine

Fun with Regular Expression - multiples of nineThis challenge was first posted on Slack #regex channel ...