
How to configure source_type to regex value?

Explorer

I'm using a Universal Forwarder and want Splunk to set the sourcetype to a segment of the source path matched by the wildcard (/.../) in the monitor stanza.
Please see my configuration files below.

Inputs.conf

#apache
[monitor:///apps/web/test/sfagent/.../*.log.*]
sourcetype = replace_sourcetype_with_segment_5_from_source
blacklist = \.(zip|gz)$
index = web

Transforms.conf

[replace_sourcetype_with_segment_5_from_source]
SOURCE_KEY = MetaData:Source
REGEX = ^source::(?:/[^/]+){4}/([^/]+)/
FORMAT = sourcetype::replace_sourcetype_with_segment_5_from_source
DEST_KEY = MetaData:Sourcetype   

Props.conf

[replace_sourcetype_with_segment_5_from_source]
TRANSFORMS-replaceSourcetype = replace_sourcetype_with_segment_5_from_source

After some digging I discovered I can't use transforms.conf on a Universal Forwarder. Is this absolutely true?
If so, is there a way to get Splunk to derive the sourcetype from the monitor path in the source metadata without using a heavy forwarder?


Re: How to configure source_type to regex value?

SplunkTrust

Universal forwarders DO NOT parse the data, so there can be no transforms on the UF.

As to the second part of your question, I don't think that you can, particularly on the UF. If you want to do such a thing, do it on the indexer (or on an intermediate heavy forwarder). UFs are not particularly helpful for looking at the data and performing any sort of manipulation (except for Windows Event Logs, where a specific use case for parsing that data has been built in).


Re: How to configure source_type to regex value?

Builder

Can you describe the use case here? Knowing your end goal, the community can try to suggest a better approach.

The intent of sourcetype is to identify the data structure of an event, and it determines how Splunk Enterprise formats the data during indexing. Trying to set a dynamic sourcetype defeats this purpose.

I'd rather suggest you use a search time extraction OR "tags" if you want to associate some meaningful info with your events based on source.
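As a rough sketch of the search-time option, something like the following could go on the search head. It assumes the path layout from the original post; the field name path_segment and the class name are made up for illustration, and the regex is untested against real data.

```ini
# props.conf (search head) -- illustrative sketch, not a tested config
# Assumption: sources look like /apps/web/test/sfagent/<segment>/<file>.log.<n>
[source::/apps/web/test/sfagent/.../*.log.*]
# Search-time extraction of the 5th path segment from the source field.
# "path_segment" is a hypothetical field name.
EXTRACT-path_segment = ^(?:/[^/]+){4}/(?<path_segment>[^/]+)/ in source
```

The sourcetype stays fixed, and the directory name becomes an ordinary field you can filter or report on.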


Re: How to configure source_type to regex value?

Explorer

My end goal is to extract the sourcetype and index with a regex from the monitor path at runtime based on a lookup from the directory structure.

For example, in the case of Apache or nginx, the actual monitored paths will look like:
/apps/apache/http/access/http-access.log
OR
/apps/nginx/http/error/http-error.log

inputs.conf

#apache or nginx
 [monitor:///apps/.../.../.../*.log.*]
 sourcetype = ( REGEX = ^source::(?:/[^/]+){1}/([^/]+)/ ) : ( REGEX = ^source::(?:/[^/]+){2}/([^/]+)/ )
 index =  (REGEX = ^source::(?:/[^/]+){0}/([^/]+)/ )
 blacklist = \.(zip|gz)$

Desired output:

Splunk sends all Apache access logs from /apps/apache/http/access/http-access.log with index=apache and sourcetype=http:access,
and Splunk also sends all nginx error logs from /apps/nginx/http/error/http-error.log with index=nginx and sourcetype=http:error.


Re: How to configure source_type to regex value?

Motivator

You would probably be better off explicitly setting the index and sourcetype for each path as a separate input.
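A minimal sketch of what that might look like on the UF, using the paths and values from the desired output above. The file-name wildcards are an assumption about how the logs rotate.

```ini
# inputs.conf (Universal Forwarder) -- one explicit stanza per path
# Paths, indexes and sourcetypes are taken from the example in this thread;
# the *.log* wildcard is an assumption.

[monitor:///apps/apache/http/access/*.log*]
index = apache
sourcetype = http:access
blacklist = \.(zip|gz)$

[monitor:///apps/nginx/http/error/*.log*]
index = nginx
sourcetype = http:error
blacklist = \.(zip|gz)$
```

This is more stanzas to maintain, but it needs no parsing tier and keeps index and sourcetype assignment visible in one place.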


Re: How to configure source_type to regex value?

Esteemed Legend

You cannot do this on the UF, but you can on an HF or indexer, like this:

props.conf:

[source::/apps/web/test/sfagent/.../*.log.*]
TRANSFORMS-replaceSourcetype = replace_sourcetype_with_segment_5_from_source

transforms.conf:

[replace_sourcetype_with_segment_5_from_source]
SOURCE_KEY = MetaData:Source
REGEX = source::(?:/[^/]+){4}/([^/]+)/
FORMAT = sourcetype::$1
DEST_KEY = MetaData:Sourcetype

If you are doing a sourcetype override/overwrite, you must use the ORIGINAL values, NOT the new value. Deploy this to the first full instance(s) of Splunk that handle the events (usually the HF tier, if you use one, or otherwise your indexer tier) and restart all Splunk instances there. Then send in new events (old events will stay broken) and test using _index_earliest=-5m to be absolutely certain that you are only examining newly indexed events.
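For the index-plus-sourcetype goal described earlier in the thread (/apps/<app>/<a>/<b>/... giving index=<app> and sourcetype=<a>:<b>), the same approach could probably be extended along these lines. This is an untested sketch: the stanza and class names are made up, the path layout is assumed from the examples in the thread, and the target indexes need to exist already.

```ini
# props.conf (HF or indexer) -- illustrative sketch
# Assumption: sources look like /apps/<app>/<a>/<b>/<file>
[source::/apps/.../*.log*]
TRANSFORMS-route_by_path = set_sourcetype_from_source, set_index_from_source

# transforms.conf (HF or indexer)
# Hypothetical stanza names; the regexes see the source as "source::<path>".

[set_sourcetype_from_source]
SOURCE_KEY = MetaData:Source
# Capture path segments 3 and 4, e.g. "http" and "access"
REGEX = ^source::/apps/[^/]+/([^/]+)/([^/]+)/
FORMAT = sourcetype::$1:$2
DEST_KEY = MetaData:Sourcetype

[set_index_from_source]
SOURCE_KEY = MetaData:Source
# Capture path segment 2, e.g. "apache" or "nginx"
REGEX = ^source::/apps/([^/]+)/
FORMAT = $1
DEST_KEY = _MetaData:Index
```

With /apps/apache/http/access/http-access.log this should yield index=apache and sourcetype=http:access, but check it against newly indexed events as described above before relying on it.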
