Splunk Search

Setting sourcetype with a complex regex - transforms.conf

Jason
Motivator

I am using transforms.conf to pull the sourcetype from the source via a complex regex. It doesn't seem to be working, so I'm wondering if you are allowed to set sourcetype with multiple concatenated capture groups.

The regex checks the source for many items in one big OR statement, so only one or two capture groups should ever match. So, does something like $2$3$4$5$6 work?

Or is the problem that I use a backreference in the regex?

[set_sourcetype_for_applogs]
SOURCE_KEY = Metadata:Source
DEST_KEY = Metadata:Sourcetype
# regex: path/host_ then pull sourcetype from one of the following examples:
# HOST_app1_20100510000003_SOURCETYPE_1.log.1.gz => SOURCETYPE
# HOST_SOURCE1-TYPE.201005100001.log.1.gz => SOURCE-TYPE (removal of number)
# HOST_instance1-SOURCE-TYPE.201005100001.log.1.gz => SOURCE-TYPE (removal of instance and optional number if instance is same as SOURCE)
# HOST_SOURCETYPE.201005102301.log.1.gz => SOURCETYPE
# Is big OR statement, so can only ever be $2, $3$4, $5, or $6, so
# concatenate them all together so none are lost no matter which matches
REGEX = .*_(?:(\D+)\d?-(\1.*?)\.\d\d+|(\D+)\d(-.*?)\.\d\d+|.*_\d+_(.*)_|(.*?)\.\d\d+)
FORMAT = sourcetype::$2$3$4$5$6
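
The regex itself can be sanity-checked outside Splunk. The following is a minimal sketch in Python, assuming that Python's `re` module treats these constructs (alternation, lazy quantifiers, the `\1` backreference) the same way Splunk's regex engine does. The `/logs/` prefix, the `web1-web-access` file name, and the helper name `concatenated_sourcetype` are invented for illustration; the other file names come from the comments in the stanza.

```python
import re

# REGEX copied verbatim from the stanza above.
PATTERN = re.compile(
    r".*_(?:(\D+)\d?-(\1.*?)\.\d\d+|(\D+)\d(-.*?)\.\d\d+|.*_\d+_(.*)_|(.*?)\.\d\d+)"
)


def concatenated_sourcetype(source):
    """Emulate FORMAT = sourcetype::$2$3$4$5$6 for one source path."""
    match = PATTERN.search(source)
    if match is None:
        return None
    # Groups that took no part in the match are None; treat them as empty.
    return "".join(match.group(i) or "" for i in range(2, 7))


# Hypothetical paths; "/logs/" and the "web1-web-access" name are assumptions.
SAMPLES = (
    "/logs/HOST_app1_20100510000003_SOURCETYPE_1.log.1.gz",
    "/logs/HOST_SOURCE1-TYPE.201005100001.log.1.gz",
    "/logs/HOST_web1-web-access.201005100001.log.1.gz",
    "/logs/HOST_instance1-SOURCE-TYPE.201005100001.log.1.gz",
    "/logs/HOST_SOURCETYPE.201005102301.log.1.gz",
)

for path in SAMPLES:
    print(path, "->", concatenated_sourcetype(path))
```

In this sketch the backreference branch only fires when the instance prefix repeats as the start of the source name (the `web1-web-access` case). A prefix that differs, such as the literal `instance1-SOURCE-TYPE` example, falls through to the `$3$4` branch and keeps the prefix.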
colinj
Path Finder

I've been working with the above in a slightly modified form. I'm collecting the logs from the directory /var/log/novell. The log names are things like /var/log/novell/foo.log, /var/log/novell/bar00.log and /var/log/novell/foo.bar.log. What I wanted to grab and use as the sourcetype was the foo, bar and foo.bar portion of the filenames respectively.
Here's what I have in transforms.conf

[set_sourcetype_for_mcommunity_logs]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Sourcetype
REGEX = .*/novell/(\S+)(\d+)?\.log(\.\d+)?
FORMAT = sourcetype::$1

Here's what I have in props.conf

[source::.../var/log/novell/*]
TRANSFORMS-set_sourcetype = set_sourcetype_for_mcommunity_logs
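
One thing worth checking in the regex above is the greedy `(\S+)`: it can absorb the trailing digits that `(\d+)?` was meant to strip. Below is a small Python sketch of that behaviour, assuming Python's `re` module backtracks the same way as Splunk's regex engine for this pattern. The `LAZY` variant and the helper `first_group` are hypothetical and not part of the original post.

```python
import re

# The REGEX as posted, plus a hypothetical variant with a lazy first group.
POSTED = re.compile(r".*/novell/(\S+)(\d+)?\.log(\.\d+)?")
LAZY = re.compile(r".*/novell/(\S+?)(\d+)?\.log(\.\d+)?")


def first_group(pattern, source):
    """Return what $1 would hold for this source, or None if nothing matches."""
    match = pattern.search(source)
    return match.group(1) if match else None


for source in (
    "/var/log/novell/foo.log",
    "/var/log/novell/bar00.log",
    "/var/log/novell/foo.bar.log",
):
    print(source, first_group(POSTED, source), first_group(LAZY, source))
```

With the posted pattern, `bar00.log` yields `bar00` for `$1` in this sketch; the lazy variant yields `bar` while leaving `foo` and `foo.bar` unchanged.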

navidnaddimulla
New Member

@colinj - Is your sourcetype assignment working with the props.conf and transforms.conf you posted?
Please confirm.


Lowell
Super Champion

Don't you want:

[set_sourcetype_for_applogs]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Sourcetype
....

(Note the uppercase "D" in MetaData)

Go look at your own post. 😉

Don't you just hate it when you miss that kind of stuff? What I wouldn't give for some sort of validating parser... The funny thing is that I looked to see if you had the case correct, and I missed it too.
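
A rough idea of what such a validating check could look like, as a Python sketch. It only compares SOURCE_KEY and DEST_KEY values against a short list of key names, case-sensitively. The list in `KNOWN_KEYS` is an assumption and is not exhaustive, and `check_transforms` is a hypothetical helper, not a Splunk tool.

```python
# Key names assumed valid for SOURCE_KEY / DEST_KEY; illustrative, not exhaustive.
KNOWN_KEYS = (
    "_raw",
    "queue",
    "MetaData:Host",
    "MetaData:Source",
    "MetaData:Sourcetype",
    "_MetaData:Index",
)


def check_transforms(text):
    """Return (stanza, setting, value, suggestion) for each miscased key name."""
    canonical = {key.lower(): key for key in KNOWN_KEYS}
    problems = []
    stanza = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            stanza = line[1:-1]
            continue
        name, sep, value = line.partition("=")
        name, value = name.strip(), value.strip()
        if sep and name in ("SOURCE_KEY", "DEST_KEY"):
            expected = canonical.get(value.lower())
            # Flag values that match a known key in everything except case.
            if expected is not None and expected != value:
                problems.append((stanza, name, value, expected))
    return problems


SAMPLE = """
[set_sourcetype_for_applogs]
SOURCE_KEY = Metadata:Source
DEST_KEY = Metadata:Sourcetype
FORMAT = sourcetype::$2$3$4$5$6
"""

for stanza, setting, value, suggestion in check_transforms(SAMPLE):
    print(f"[{stanza}] {setting} = {value} (did you mean {suggestion}?)")
```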

Lowell
Super Champion

Did some quick testing and your regex seems good. Please post the corresponding props.conf entries. Keep in mind that Splunk doesn't do recursive sourcetype matching. For example, say your events come in with sourcetype::temp, and then you use a transform to reassign the sourcetype to sourcetype::my_st. After the sourcetype is reassigned, Splunk will NOT look up the [my_st] stanza for additional sourcetype-specific processing rules. In other words, an inherent limitation of reassigning sourcetypes like this is that all events are processed based on their initial sourcetype.
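
The non-recursive behaviour described above can be pictured with a toy model in Python. This is only a sketch of the described behaviour, not Splunk's actual implementation; the `PROPS` and `TRANSFORMS` tables, the transform names, and `index_event` are all invented for illustration.

```python
# Hypothetical index-time rules keyed by sourcetype; every name is invented.
PROPS = {
    "temp": {"transforms": ["set_my_st"]},
    "my_st": {"transforms": ["tag_event"]},
}

TRANSFORMS = {
    "set_my_st": lambda event: event.update(sourcetype="my_st"),
    "tag_event": lambda event: event.update(tagged=True),
}


def index_event(event):
    """Apply the rules for the sourcetype the event arrived with, once."""
    rules = PROPS.get(event["sourcetype"], {})
    for name in rules.get("transforms", []):
        TRANSFORMS[name](event)
    # No second lookup is made for the rewritten sourcetype, so the
    # [my_st] rules are never applied to this event.
    return event


event = index_event({"sourcetype": "temp", "_raw": "example"})
print(event)
```

The event leaves the model with sourcetype `my_st`, but the `tag_event` transform listed under `my_st` never runs.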


mpflugfelder
Engager

OK, so what Lowell said above is exactly what I'm trying to accomplish. I have logs coming from a Docker container, and I would like to use a regex to tell Splunk that the sourcetype of that log entry is access_combined. I've set up props and a transform, and I see the sourcetype being changed to access_combined, but it's not parsing the fields. After looking at the access_combined regex, I don't want to try to figure this out myself. Is there some way that I can take logs from source::whatever and, based on a regex, somehow get them to be processed by the access_combined sourcetype?

I'm using the Docker logging driver for Splunk at this time, so I can't set the sourcetype before it hits Splunk, at least not that I'm aware of.


AHBrook
Path Finder

@mpflugfelder

Did you ever find an answer to this? I'm running into the EXACT same scenario with my OpenShift environment. Seeing that explanation answers a lot about what I'm seeing, as I can't seem to get it to "re-sourcetype" my data. I do see a potential answer in CLONE_SOURCETYPE, but I am afraid that will double up the events, and I'd want to discard the original (and only if the second one contained all the metadata from the first).


Jason
Motivator

According to my teammates, this is not possible - that you must use a single capture group only: FORMAT = $2

Someone from Splunk, please correct me if multiple is possible.


gkanapathy
Splunk Employee

This is possible, but only in index-time transforms, which is what you are using. Using multiple capture groups is not possible with search-time extractions.
