Using regex in source stanzas (props.conf)

emiller42
Motivator

Hello!

I have some log files with dynamic naming that I'm having trouble matching with props.conf stanzas. Here are examples of the filenames:

c:\Foo\pa.log
c:\Foo\PaApps\log\11-2012.xml 
    (changes for each month)
c:\Foo\PaApps\log\11-2012p.xml 
    (changes for each month)
c:\Foo\PaApps\log\1e85358e-11a3-44c2-8f62-ffe88c035478.log  
    (This is an error log.  New file generated with each error, and has a unique GUID filename)  
c:\Foo\PaApps\log\ProofServerManager_11-2012.log 
    (changes for each month)
c:\Foo\PaApps\log\ProofServerMonitor_11-2012.log 
    (changes for each month)
c:\Foo\PaApps\stp1pfsprf001_11-2012.xml  
    (That host_month-year.  All files are collected on one host, but come from several.  Hence the naming)

Now, given the variability of the filenames, I don't think I have any hope of specifically targeting them in my monitor stanzas, so I'm not assigning sourcetypes on the forwarders. The monitor stanza names are as follows:

[monitor://c:\Foo\PaApps\log\*.log]
[monitor://c:\Foo\PaApps\log\*.xml]
[monitor://c:\Foo\pa.log]

Now so far, so good. All appropriate logs are being monitored and sent to the indexer as expected. So the next step is to use [source::] stanzas in the props.conf file on the indexer to sort them out, apply sourcetypes, etc. This is where things go wrong.

Here are my source:: stanzas, and what they should match:

[source::...[/\\]\d+-\d+p.xml]
    (Should match c:\Foo\PaApps\log\11-2012p.xml)

[source::...[/\\]\d+-\d+.xml]
    (Should match c:\Foo\PaApps\log\11-2012.xml)

[source::...[/\\]ProofServerManager_\d+-\d+.log]
    (Should match c:\Foo\PaApps\log\ProofServerManager_11-2012.log)

[source::...[/\\]ProofServerMonitor_\d+-\d+.log]
    (Should match c:\Foo\PaApps\log\ProofServerMonitor_11-2012.log)

[source::...[/\\]stp1pfsprf\d+_\d+-\d+.xml]
    (Should match c:\Foo\PaApps\stp1pfsprf001_11-2012.xml)

[source::...[/\\]\w+-\w+-\w+-\w+-\w+.log]
    (Should match c:\Foo\PaApps\log\1e85358e-11a3-44c2-8f62-ffe88c035478.log)

[source::...[/\\]pa.log]
    (Should match c:\Foo\pa.log)

See this answer for details on where I got the syntax for these stanzas.

Now, when I first set this up, I did so on a local instance of Splunk running on my desktop. I recreated the log file path and populated it with sample files, so it was functionally identical to the servers I would be monitoring. Then, using the above inputs.conf and props.conf, I was able to properly index the files, and all props.conf stanzas applied appropriately.

However, when I push the configuration out to my indexer and forwarders, the props.conf stanzas are not applied, and the indexer essentially 'learns' the sources, resulting in sourcetypes of 'xml-1', 'xml-2', etc.

Any ideas why this worked locally, but not on the server?

Thank you!

emiller42
Motivator

After working with Splunk support and trying a whole lot of things, I have found a solution: whitelists in inputs. This allowed me to have monitor stanzas that technically overlapped, while the whitelisting narrowed the targeting as needed. This meant I didn't have to muck with [source::...] stanzas at all.

Example:

[monitor://c:\foo\paapps\log\stp1pfsprf*.xml]
sourcetype = sourcetype_a
whitelist = .+?[/\\]stp1pfsprf\d+_\d+-\d+.xml

[monitor://c:\foo\paapps\log\*p.xml]
sourcetype = sourcetype_b
whitelist = .+?[/\\]\d+-\d+p.xml

[monitor://c:\foo\paapps\log\*.xml]
sourcetype = sourcetype_c
whitelist = .+?[/\\]\d+-\d+.xml

So even though the third monitor technically would capture the same files as the previous two, the whitelists clear up the collisions so that each monitor stanza only applies to the files it's intended for.
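
A quick way to sanity-check that the three whitelists really are mutually exclusive is to run them against sample paths outside of Splunk. The sketch below is only an approximation: it uses Python's `re` module rather than Splunk's PCRE engine, it assumes the whitelist is applied as an unanchored search against the full path, and the sample paths are hypothetical ones modelled on the filenames in the question.

```python
import re

# Whitelist patterns copied verbatim from the three monitor stanzas above.
WHITELISTS = {
    "sourcetype_a": r".+?[/\\]stp1pfsprf\d+_\d+-\d+.xml",
    "sourcetype_b": r".+?[/\\]\d+-\d+p.xml",
    "sourcetype_c": r".+?[/\\]\d+-\d+.xml",
}

# Hypothetical sample paths, modelled on the filenames in the question.
SAMPLES = [
    r"c:\foo\paapps\log\stp1pfsprf001_11-2012.xml",
    r"c:\foo\paapps\log\11-2012p.xml",
    r"c:\foo\paapps\log\11-2012.xml",
]


def matching_sourcetypes(path):
    """Return every sourcetype whose whitelist matches the given path."""
    # Assumption: an unanchored search, since the patterns carry no ^ or $.
    return [st for st, pattern in WHITELISTS.items() if re.search(pattern, path)]


RESULTS = {path: matching_sourcetypes(path) for path in SAMPLES}

for path, sourcetypes in RESULTS.items():
    print(path, "->", sourcetypes)
```

If each path comes back with exactly one sourcetype, the whitelists do not collide for those samples. Note that the unescaped `.` before `xml` matches any character, so writing `\.xml` would be slightly stricter.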

It also appears that the information in this answer is not correct. [source::...] stanzas cannot use full regex, and are limited to the following:

When setting a [spec] stanza, you can use the following regex-type syntax:
... recurses through directories until the match is met.
*   matches anything but / 0 or more times.
|   is equivalent to 'or'
( ) are used to limit scope of |.
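
To make the four rules above concrete, here is a rough sketch that translates such a spec into an ordinary regular expression. This is not Splunk's implementation: `spec_to_regex` is a hypothetical helper, treating every other character as a literal and anchoring the result at both ends are simplifying assumptions of this sketch, and the sample spec and forward-slash paths are made up for illustration.

```python
import re


def spec_to_regex(spec):
    """Translate the limited wildcard syntax into a Python regex (illustrative only)."""
    out = []
    i = 0
    while i < len(spec):
        if spec.startswith("...", i):
            out.append(".*")                # '...' recurses through directories
            i += 3
        elif spec[i] == "*":
            out.append("[^/]*")             # '*' matches anything but '/'
            i += 1
        elif spec[i] in "|()":
            out.append(spec[i])             # 'or' and its grouping pass through
            i += 1
        else:
            out.append(re.escape(spec[i]))  # assumption: everything else is literal
            i += 1
    # Assumption: the whole source path must match, hence the anchors.
    return "^" + "".join(out) + "$"


# Hypothetical spec built only from the four documented constructs.
SPEC = ".../(ProofServerManager|ProofServerMonitor)_*.log"
PATTERN = re.compile(spec_to_regex(SPEC))

print(bool(PATTERN.match("/Foo/PaApps/log/ProofServerManager_11-2012.log")))
print(bool(PATTERN.match("/Foo/PaApps/log/11-2012.xml")))
```

The point of the sketch is that a spec limited to `...`, `*`, `|` and `( )` can still separate the two ProofServer logs from the other files, without relying on `\d+` or character classes.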

jcoates_splunk
Splunk Employee

I just dealt with something similar, got around it by forcing the sourcetype in inputs.conf -- not sure why what you're doing isn't working, but that worked for me.

emiller42
Motivator

The problem is I can't assign sourcetypes in inputs.conf due to the filenames. The wildcard filtering on inputs is much more restrictive, and prevents me from doing it there.

emiller42
Motivator

Whoops, transcription typo there. Those are actually .xml files; I just mistyped the filenames when providing examples. The conf stanzas are correct. Good eye! I fixed the original question.

kristian_kolb
Ultra Champion

Another observation:

There is a slight typo in the first two stanza regexes, 'xml' instead of 'log'. Not sure that it matters for your particular problem, though. Perhaps you just typed that here, and not in your conf files.

emiller42
Motivator

The indexer is RHEL; the forwarders are Windows. I did check for ^M characters in props.conf and found none. Running btool shows all the appropriate configs as well.

Good idea though!

lguinn2
Legend

Are both platforms Windows? Or is your server Linux and your laptop Windows? I ask because there could be line-ending problems. I always forget which is which, but Windows, Linux and the Mac do not use the same line endings.

Check it out on the server and make sure that the lines don't look funny.

Sorry if this is a lame suggestion, but I've been bitten by this!
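
For anyone wanting to run the same line-ending check, a minimal sketch follows. Windows line endings are CRLF (`\r\n`) while Linux uses LF (`\n`), so a stray `\r` before a newline is what shows up as ^M. The function name `count_crlf_lines` and the sample bytes are made up for illustration; in practice you would read the real file in binary mode.

```python
def count_crlf_lines(data: bytes) -> int:
    """Count lines that end in a carriage return before the newline (CRLF)."""
    return sum(1 for line in data.split(b"\n") if line.endswith(b"\r"))


# Hypothetical file contents: the same stanza saved with each kind of ending.
WINDOWS_STYLE = b"[source::example]\r\nsourcetype = sourcetype_a\r\n"
UNIX_STYLE = b"[source::example]\nsourcetype = sourcetype_a\n"

print(count_crlf_lines(WINDOWS_STYLE))
print(count_crlf_lines(UNIX_STYLE))

# Against a real file, something like this would work:
# with open("props.conf", "rb") as handle:
#     print(count_crlf_lines(handle.read()))
```

A non-zero count on the Linux indexer would mean the file was saved with Windows line endings.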
