Wildcards with inputs.conf

Builder

I've read over the inputs.conf documentation and was wondering whether I have the correct solution to this issue.

On many of our forwarders we run applications whose logs are all located in the same directory, EXCEPT that one segment of the path differs. Example:

/opt/log/dotinfo/epp_server/epp_server.log
/opt/log/dotasia/epp_server/epp_server.log
/opt/log/dotorg/epp_server/epp_server.log

I want to insert a wildcard in the third segment of the directory path to mean "any". All of the logs in this path will be sourcetype=EPP.

So I'm wondering, would I be able to put the following in inputs.conf on all the forwarders:

[monitor:///opt/log/.../epp_server/epp_server.log]
disabled = false
sourcetype = EPP

I believe this would work. Thoughts?

Motivator

Generally that should work, but I'd use:

[monitor:///opt/log/*/epp_server/epp_server.log]
disabled = false
sourcetype = EPP
host_segment = 3

The ... wildcard can match multiple directory levels. * will match only within one.
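To make the difference concrete, here is a rough Python approximation that translates a monitor path into a regular expression, treating ... as "any characters, including /" and * as "any characters except /". This is an assumption about the semantics, not Splunk's actual matcher, and the helper name wildcard_to_regex is hypothetical.

```python
import re


def wildcard_to_regex(pattern: str):
    """Hypothetical approximation of Splunk path wildcards (not Splunk's code).

    '...' -> '.*'     may span '/', i.e. any number of directory levels
    '*'   -> '[^/]*'  stays within a single path segment
    Everything else is matched literally.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            out.append(".*")
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")


dots = wildcard_to_regex("/opt/log/.../epp_server/epp_server.log")
star = wildcard_to_regex("/opt/log/*/epp_server/epp_server.log")

one_level = "/opt/log/dotinfo/epp_server/epp_server.log"
two_levels = "/opt/log/cctld/flexreg/epp_server/epp_server.log"

print(bool(dots.match(one_level)), bool(star.match(one_level)))    # True True
print(bool(dots.match(two_levels)), bool(star.match(two_levels)))  # True False
```

Under this model both patterns pick up the three example paths, but only ... would also accept a deeper layout, which is why * is the tighter choice when exactly one directory varies.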

Adding host_segment will pull the hostname out of the file path, assuming that's what you actually want to do.
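As a sketch of what host_segment = 3 would yield for the example paths, assuming segments are counted from 1 starting after the leading /, the hypothetical helper below splits the path and returns the Nth segment.

```python
def host_from_segment(path: str, segment: int) -> str:
    """Hypothetical helper mimicking host_segment: return the Nth '/'-separated
    segment of the path, counting from 1 and skipping the empty leading piece."""
    parts = [p for p in path.split("/") if p]
    return parts[segment - 1]


for p in (
    "/opt/log/dotinfo/epp_server/epp_server.log",
    "/opt/log/dotasia/epp_server/epp_server.log",
    "/opt/log/dotorg/epp_server/epp_server.log",
):
    print(host_from_segment(p, 3))  # prints dotinfo, then dotasia, then dotorg
```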

Super Champion

Are you going to be monitoring multiple files under /opt/log? If so, then I would suggest a slightly different approach.

Instead of hardcoding the sourcetype in your monitor stanza, you can use a props.conf entry to assign the sourcetype based on a file path pattern. This way a single monitor can pick up a number of different types of files at once, and each type of file still gets its own correct sourcetype.

Prior to Splunk 4.1, the approach I'm suggesting was really your only option. Version 4.1 added support for nested monitor stanzas, which removes that limitation; however, I still think this setup has advantages.

props.conf:

[source::.../epp_server/epp_server.log]
sourcetype = EPP

[EPP]
# Whatever specific sourcetype settings that you need (TIME_FORMAT, SHOULD_LINEMERGE, ...)

inputs.conf:

[monitor:///opt/log]
whitelist = (/log/\w+/epp_server/epp_server\.log|regex2|regex3)$
host_regex = ^/opt/log/([^/]+)/

Here you would replace "regex2" and "regex3" with patterns matching whatever other paths you want to monitor under this directory structure at the same time. If you have only a small number of files in this folder, you could simply set up separate source-matching props.conf stanzas and skip the whitelist entirely.
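A quick way to sanity-check the two regexes before deploying is to run them against sample paths. The Python sketch below assumes the whitelist is tested with an unanchored search against the full file path, that the host comes from the first capture group of host_regex, and that host_regex is meant as ^/opt/log/([^/]+)/ (capturing the one directory under /opt/log). The regex2/regex3 placeholders are left out, the check function is hypothetical, and Python's re module stands in for Splunk's regex engine, which should behave the same for patterns this simple.

```python
import re

# EPP alternative of the whitelist above (regex2/regex3 placeholders omitted)
WHITELIST = re.compile(r"(/log/\w+/epp_server/epp_server\.log)$")
# host_regex: the single directory directly under /opt/log
HOST_REGEX = re.compile(r"^/opt/log/([^/]+)/")


def check(path: str):
    """Return (whitelisted, host); host is None when the path is not whitelisted."""
    if not WHITELIST.search(path):
        return (False, None)
    m = HOST_REGEX.search(path)
    return (True, m.group(1) if m else None)


print(check("/opt/log/dotorg/epp_server/epp_server.log"))         # (True, 'dotorg')
print(check("/opt/log/dotorg/web_server/web_server.log"))         # (False, None)
print(check("/opt/log/cctld/flexreg/epp_server/epp_server.log"))  # (False, None)
```

The last case shows that \w+ only admits one directory level, so a deeper layout would need its own alternative in the whitelist.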

If you only need to blacklist a few files, you can add a "blacklist" entry to your inputs.conf stanza, or you can do it with a path-based props.conf rule as well. The advantage of the props-based approach is that it's automatically inherited by all of your inputs; you don't necessarily have to update the "blacklist" on multiple monitor stanzas. The trick is to use the special sourcetype called ignored_type, which prevents the matching file from being indexed. Here's an example that blocks files with the ".old" extension:

[source::.../epp_server/*.old]
sourcetype = ignored_type
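To see how the two [source::...] stanzas in this post would classify files, here is a hypothetical Python model. It assumes that ... matches any characters including / and that * matches anything except /; in this sketch the first matching rule wins, which is a simplification of Splunk's real stanza precedence. The names wildcard_to_regex, SOURCE_RULES and sourcetype_for are made up for illustration.

```python
import re


def wildcard_to_regex(pattern: str):
    # '...' -> '.*' (may cross '/'), '*' -> '[^/]*' (one segment), rest literal
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            out.append(".*")
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")


# Hypothetical model of the two props.conf source stanzas in this post
SOURCE_RULES = [
    (wildcard_to_regex(".../epp_server/epp_server.log"), "EPP"),
    (wildcard_to_regex(".../epp_server/*.old"), "ignored_type"),
]


def sourcetype_for(path: str):
    """Return the sourcetype of the first matching rule, or None."""
    for regex, sourcetype in SOURCE_RULES:
        if regex.match(path):
            return sourcetype
    return None


print(sourcetype_for("/opt/log/dotinfo/epp_server/epp_server.log"))      # EPP
print(sourcetype_for("/opt/log/dotinfo/epp_server/epp_server.log.old"))  # ignored_type
print(sourcetype_for("/opt/log/dotinfo/web_server/web_server.log"))      # None
```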

Super Champion

By different types of files I'm referring to sourcetype, not different file extensions. If you want some files to be sourcetype=WEB and others to be sourcetype=EPP, then you have multiple "types" of files, and the approach above should scale well for you.

Builder

BTW, we won't be monitoring multiple types of files. All files end in .log, and after the current log file rotates, it is renamed to something like epp_server.log.1. There are no other types of log files (.txt, for example) in the directories.

Builder

That might be a good idea, considering we have web logs and some whois logs in the same location. For example:

[monitor:///opt/log/*/web_server/web_server.log]
disabled = false
sourcetype = WEB

I was only focusing on the EPP server logs at first to get an idea of what my options were.

Super Champion

Keep in mind that monitoring like this will still scan many subdirectories under /opt/log, so if there are many files and directories under it, this can add noticeable overhead even though only a few files will match the pattern.

Path Finder

You could try to monitor the /opt/log folder and then use a whitelist and host_regex to focus only on the log files in the correct folders and assign the correct host to each log file. For example:

[monitor:///opt/log]
whitelist = \/epp_server\/epp_server\.log$
sourcetype = EPP
host_regex = \/log\/(.+)\/epp_server\/epp_server\.log$

Super Champion

Note that your config will allow matching arbitrarily nested directories. See southeringtonp's note about ... vs *. Also, and this is very minor, so FYI: you technically don't need to escape the forward slash ("/") character. People often do because they frequently see it escaped in sed-like expressions and Perl regexes, but it's not actually necessary. That said, it doesn't hurt anything to have them there if you like extra backslashes for whatever reason. 😉
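Both points are easy to check with Python's re module, used here as a stand-in for Splunk's regex engine: (.+) happily captures two directory levels, a hypothetical [^/]+ variant restricts the capture to one, and dropping the \/ escapes leaves the match unchanged. The variable names are illustrative only.

```python
import re

flat = "/opt/log/dotinfo/epp_server/epp_server.log"
nested = "/opt/log/cctld/flexreg/epp_server/epp_server.log"

escaped = re.compile(r"\/log\/(.+)\/epp_server\/epp_server\.log$")   # as posted
unescaped = re.compile(r"/log/(.+)/epp_server/epp_server\.log$")     # no '\/' escapes
one_level = re.compile(r"/log/([^/]+)/epp_server/epp_server\.log$")  # single segment only

for path in (flat, nested):
    # Escaping '/' makes no difference to what is captured
    assert escaped.search(path).group(1) == unescaped.search(path).group(1)

print(escaped.search(nested).group(1))  # cctld/flexreg (two levels captured)
print(one_level.search(flat).group(1))  # dotinfo
print(one_level.search(nested))         # None
```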

Builder

The log path doesn't have a hostname in it. The host is not cctld or flexreg, BTW; that's just the log path. The host name is nowhere in the actual log path at all.

Motivator

It should let you add more than one wildcard if you need another directory level, but only one segment can be used for extracting the hostname. If you add a second host_segment line, it will just override the previous one. To get both, you'd need to go back to using host_regex. Even then, it would extract as 'cctld/flexreg', not as an FQDN like 'flexreg.cctld'. If you want the host extracted as an FQDN, and the FQDN is in the body of each event, you could get it with an index-time extraction in transforms.conf and props.conf instead of from the path name.
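Two parts of this answer can be illustrated in Python with stand-ins. configparser with strict=False is not Splunk's parser, but it shows the same last-one-wins behavior for a repeated key in one stanza. A host_regex whose single capture group spans segments 3 and 4 yields 'cctld/flexreg', not an FQDN. The two-wildcard monitor path and the regex are illustrative assumptions, not tested Splunk config.

```python
import configparser
import re

# Analogy only: a repeated key in one stanza, where the later line wins
stanza = """
[monitor:///opt/log/*/*/epp_server/epp_server.log]
sourcetype = EPP
host_segment = 3
host_segment = 4
"""
parser = configparser.ConfigParser(strict=False)  # strict=False tolerates the duplicate key
parser.read_string(stanza)
section = "monitor:///opt/log/*/*/epp_server/epp_server.log"
print(parser[section]["host_segment"])  # 4

# One capture group covering segments 3 and 4 of the path
host_regex = re.compile(r"^/opt/log/([^/]+/[^/]+)/epp_server/")
path = "/opt/log/cctld/flexreg/epp_server/epp_server.log"
print(host_regex.search(path).group(1))  # cctld/flexreg
```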

Builder

I wanted to ask how this would work, as I have found a log directory path that would require segments 3 and 4 to be wildcards. An example log path is: monitor:///opt/log/cctld/flexreg/epp_server/epp_server.log. Would I be able to specify more than one segment to pull out of the file path? Can I add host_segment = 4 as well to the example you gave? Just wondering....