Setting sourcetype with a complex regex - transforms.conf

Motivator

I am using transforms.conf to pull the sourcetype from the source via a complex regex. It doesn't seem to be working, so I'm wondering if you are allowed to set sourcetype with multiple concatenated capture groups.

The regex checks the source for many items in a big OR statement, so only one or two capture groups should ever match. So does something like $2$3$4$5$6 work?

Or is the problem that I use a backreference in the regex?

[set_sourcetype_for_applogs]
SOURCE_KEY = Metadata:Source
DEST_KEY = Metadata:Sourcetype
# regex: path/host_ then pull sourcetype from one of the following examples:
# HOST_app1_20100510000003_SOURCETYPE_1.log.1.gz => SOURCETYPE
# HOST_SOURCE1-TYPE.201005100001.log.1.gz => SOURCE-TYPE (removal of number)
# HOST_instance1-SOURCE-TYPE.201005100001.log.1.gz => SOURCE-TYPE (removal of instance and optional number if instance is same as SOURCE)
# HOST_SOURCETYPE.201005102301.log.1.gz => SOURCETYPE
# Is big OR statement, so can only ever be $2, $3$4, $5, or $6, so
# concatenate them all together so none are lost no matter which matches
REGEX = .*_(?:(\D+)\d?-(\1.*?)\.\d\d+|(\D+)\d(-.*?)\.\d\d+|.*_\d+_(.*)_|(.*?)\.\d\d+)
FORMAT = sourcetype::$2$3$4$5$6
Re: Setting sourcetype with a complex regex - transforms.conf

Motivator

According to my teammates, this is not possible: you must use only a single capture group, e.g. FORMAT = $2

Someone from Splunk, please correct me if multiple capture groups are possible.

Re: Setting sourcetype with a complex regex - transforms.conf

Splunk Employee

This is possible, but only in index-time transforms, which is what you are using. Using multiple capture groups is not possible with search-time extractions.

Re: Setting sourcetype with a complex regex - transforms.conf

Super Champion

Did some quick testing, and your regex seems good. Please post the corresponding props.conf entries. Keep in mind that Splunk doesn't do recursive sourcetype matching. For example, say your events come in with sourcetype::temp, and you then use a transform to reassign the sourcetype to sourcetype::my_st. After re-assigning the sourcetype, Splunk will NOT look up the [my_st] stanza for additional sourcetype-specific processing rules. In other words, an inherent limitation of re-assigning sourcetypes this way is that all events are processed based on the initial sourcetype.
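A quick offline check of the REGEX from the question might look like the Python sketch below. It is not how Splunk evaluates transforms: it assumes Python's re module treats this pattern (lazy quantifiers, the \1 backreference) the same way Splunk's PCRE does, and it approximates FORMAT = sourcetype::$2$3$4$5$6 by expanding any capture group that did not take part in the match to an empty string. The filenames come from the comments in the stanza, except the backreference case: HOST_web1-web-access... is a hypothetical name in which the prefix before the dash repeats after it, which is what (\1.*?) requires.

```python
import re

# REGEX copied verbatim from the [set_sourcetype_for_applogs] stanza.
APPLOG_REGEX = re.compile(
    r".*_(?:(\D+)\d?-(\1.*?)\.\d\d+|(\D+)\d(-.*?)\.\d\d+|.*_\d+_(.*)_|(.*?)\.\d\d+)"
)


def emulate_format(source):
    """Approximate FORMAT = sourcetype::$2$3$4$5$6 for one source value.

    Assumption: a group that did not take part in the match expands to "",
    so concatenating $2..$6 keeps only what the matching branch captured.
    Returns None when the regex does not match at all.
    """
    m = APPLOG_REGEX.search(source)
    if m is None:
        return None
    return "sourcetype::" + "".join(m.group(i) or "" for i in range(2, 7))


if __name__ == "__main__":
    # Bare filenames standing in for full source paths.
    examples = [
        "HOST_app1_20100510000003_SOURCETYPE_1.log.1.gz",
        "HOST_SOURCE1-TYPE.201005100001.log.1.gz",
        "HOST_web1-web-access.201005100001.log.1.gz",  # hypothetical: prefix repeats
        "HOST_SOURCETYPE.201005102301.log.1.gz",
    ]
    for name in examples:
        print(name, "=>", emulate_format(name))
```

With these inputs the sketch yields SOURCETYPE, SOURCE-TYPE, web-access and SOURCETYPE after the sourcetype:: prefix, matching the intent in the stanza's comments. A name whose prefix does not repeat, such as the literal HOST_instance1-SOURCE-TYPE.201005100001.log.1.gz, skips the backreference branch and falls through to the second alternative, giving instance-SOURCE-TYPE.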

Re: Setting sourcetype with a complex regex - transforms.conf

Engager

OK, so what Lowell described above (re-assigning the sourcetype with a transform) is exactly what I'm trying to accomplish. I have logs coming from a Docker container, and I would like to use a regex to tell Splunk that the sourcetype of those log entries is access_combined. I've set up props.conf and a transform, and I see the sourcetype being changed to access_combined, but the fields aren't being parsed. After looking at the access_combined regex, I don't want to try to figure this out myself. Is there some way to take logs from source::whatever and, based on a regex, get them processed by the access_combined sourcetype?

I'm using the Docker logging driver for Splunk at this time, so I can't set the sourcetype before the data reaches Splunk, at least not that I'm aware of.

Re: Setting sourcetype with a complex regex - transforms.conf

Super Champion

Don't you want:

[set_sourcetype_for_applogs]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Sourcetype
....

(Note the uppercase "D" in MetaData)

Go look at your own post. 😉

Don't you just hate it when you miss that kind of stuff. What I wouldn't give for some sort of validating parser... The funny thing is that I looked to see if you had the case correct, and I missed it too.
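A validating parser for this particular slip can be approximated in a few lines. The Python sketch below is a hypothetical lint helper, not a Splunk tool: it reads transforms.conf-style text with configparser and flags any SOURCE_KEY or DEST_KEY value that starts with "metadata:" in some casing other than "MetaData:". It only checks that prefix, not whether the rest of the key is valid, and configparser will refuse files with settings before the first [stanza] header.

```python
import configparser


def find_metadata_case_errors(conf_text):
    """Return (stanza, key, value) tuples whose MetaData: prefix is mis-cased.

    Hypothetical lint helper, not a Splunk tool. It only checks the casing of
    the "MetaData:" prefix in SOURCE_KEY and DEST_KEY; it does not validate
    the rest of the key. configparser also rejects settings that appear
    before the first [stanza] header, which real .conf files may contain.
    """
    # interpolation=None leaves '%' and '$' in REGEX/FORMAT values untouched;
    # delimiters=("=",) stops ':' inside values from being read as a separator.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(conf_text)
    problems = []
    for stanza in parser.sections():
        for key in ("SOURCE_KEY", "DEST_KEY"):
            # Option names are case-insensitive in configparser; values keep their case.
            value = parser.get(stanza, key, fallback=None)
            if value is None:
                continue
            if value.lower().startswith("metadata:") and not value.startswith("MetaData:"):
                problems.append((stanza, key, value))
    return problems


if __name__ == "__main__":
    original = (
        "[set_sourcetype_for_applogs]\n"
        "SOURCE_KEY = Metadata:Source\n"
        "DEST_KEY = Metadata:Sourcetype\n"
    )
    for stanza, key, value in find_metadata_case_errors(original):
        print(f"[{stanza}] {key} = {value}: expected the prefix 'MetaData:'")
```

Run against the stanza from the question, both keys are reported; with the corrected "MetaData:" casing the list comes back empty.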

Re: Setting sourcetype with a complex regex - transforms.conf

Path Finder

I've been working with the above in a slightly modified form. I'm collecting the logs from the directory /var/log/novell. The log names are things like /var/log/novell/foo.log, /var/log/novell/bar00.log and /var/log/novell/foo.bar.log. What I wanted to grab and use as the sourcetype was the foo, bar and foo.bar portions of the filenames, respectively.
Here's what I have in transforms.conf

[set_sourcetype_for_mcommunity_logs]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Sourcetype
REGEX = .*/novell/(\S+)(\d+)?\.log(\.\d+)?
FORMAT = sourcetype::$1

Here's what I have in props.conf

[source::.../var/log/novell/*]
TRANSFORMS-set_sourcetype = set_sourcetype_for_mcommunity_logs
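One thing that may be worth checking in the transforms.conf regex above: (\S+) is greedy, so for bar00.log it keeps the digits and $1 comes out as bar00 rather than bar, while the optional (\d+)? captures nothing. The Python sketch below compares that pattern with a lazy (\S+?) variant. The lazy variant is a suggested tweak, not something confirmed in this thread, and the sketch assumes Python's re handles these patterns the same way Splunk's PCRE does.

```python
import re

# colinj's REGEX as posted, plus a hypothetical variant with a lazy first group.
GREEDY = re.compile(r".*/novell/(\S+)(\d+)?\.log(\.\d+)?")
LAZY = re.compile(r".*/novell/(\S+?)(\d+)?\.log(\.\d+)?")


def first_group(pattern, source):
    """Return what $1 would capture for a source path, or None on no match."""
    m = pattern.search(source)
    return m.group(1) if m else None


if __name__ == "__main__":
    for path in ("/var/log/novell/foo.log",
                 "/var/log/novell/bar00.log",
                 "/var/log/novell/foo.bar.log"):
        print(path, "greedy:", first_group(GREEDY, path), "lazy:", first_group(LAZY, path))
```

For the three example paths, the greedy pattern gives foo, bar00 and foo.bar, while the lazy one gives foo, bar and foo.bar. The lazy group stops at the shortest prefix after which the optional digits and .log can still match, so trailing digits land in (\d+)? instead of $1.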
Re: Setting sourcetype with a complex regex - transforms.conf

New Member

@colinj, is your sourcetyping working with the props.conf and transforms.conf you posted?
Please confirm.
