I am using transforms.conf to pull the sourcetype from the source via a complex regex. It doesn't seem to be working, so I'm wondering if you are allowed to set sourcetype with multiple concatenated capture groups.
The regex checks the source for many items in a big OR statement, so only one or two capture groups should ever return a value. So, does something like $2$3$4$5$6 work?
Or is the problem that I use a backreference in the regex?
    [set_sourcetype_for_applogs]
    SOURCE_KEY = Metadata:Source
    DEST_KEY = Metadata:Sourcetype
    # regex: path/host_ then pull sourcetype from one of the following examples:
    # HOST_app1_20100510000003_SOURCETYPE_1.log.1.gz => SOURCETYPE
    # HOST_SOURCE1-TYPE.201005100001.log.1.gz => SOURCE-TYPE (removal of number)
    # HOST_instance1-SOURCE-TYPE.201005100001.log.1.gz => SOURCE-TYPE (removal of instance and optional number if instance is same as SOURCE)
    # HOST_SOURCETYPE.201005102301.log.1.gz => SOURCETYPE
    # Is big OR statement, so can only ever be $2, $3$4, $5, or $6, so
    # concatenate them all together so none are lost no matter which matches
    REGEX = .*_(?:(\D+)\d?-(\1.*?)\.\d\d+|(\D+)\d(-.*?)\.\d\d+|.*_\d+_(.*)_|(.*?)\.\d\d+)
    FORMAT = sourcetype::$2$3$4$5$6
According to my teammates, this is not possible, and you must use a single capture group only: FORMAT = $2
Someone from Splunk, please correct me if multiple is possible.
This is possible, but only in index-time transforms, which is what you are using. Using multiple capture groups is not possible with search time extractions.
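One way to check how the concatenated groups behave outside Splunk is a small Python sketch. It is only an approximation: Python's `re` module stands in for Splunk's PCRE engine, and the helper `concat_groups` is a hypothetical name that assumes groups which did not take part in the match expand to empty strings. The `web1-web-access` file name is an invented example of the "instance is same as SOURCE" case; the other names come from the comments in the stanza.

```python
import re

# The REGEX value from the [set_sourcetype_for_applogs] stanza, copied verbatim.
PATTERN = re.compile(
    r".*_(?:(\D+)\d?-(\1.*?)\.\d\d+|(\D+)\d(-.*?)\.\d\d+|.*_\d+_(.*)_|(.*?)\.\d\d+)"
)


def concat_groups(source: str) -> str:
    """Approximate FORMAT = sourcetype::$2$3$4$5$6.

    Assumption: a group that did not participate in the match
    contributes an empty string.
    """
    match = PATTERN.search(source)
    if match is None:
        return ""
    return "".join(match.group(i) or "" for i in range(2, 7))


examples = [
    "HOST_app1_20100510000003_SOURCETYPE_1.log.1.gz",
    "HOST_SOURCE1-TYPE.201005100001.log.1.gz",
    "HOST_web1-web-access.201005100001.log.1.gz",  # hypothetical name
    "HOST_SOURCETYPE.201005102301.log.1.gz",
]

for name in examples:
    print(name, "=>", concat_groups(name))
```

In this sketch the literal name from the third comment, HOST_instance1-SOURCE-TYPE..., falls through to the second alternative because "instance" does not repeat after the hyphen, so the instance prefix is kept. That matches the comment's condition that the prefix is only removed when it is the same as SOURCE.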
Did some quick testing and your regex seems good. Please post the corresponding props.conf entries. Keep in mind that Splunk doesn't do recursive sourcetype matching. For example, say your events come in with sourcetype::temp, and then you use a transform to reassign the sourcetype to sourcetype::my_st. After reassigning the sourcetype, Splunk will NOT look up the [my_st] stanza for additional sourcetype-specific processing rules. In other words, an inherent limitation of reassigning sourcetypes like this is that all events must be processed based on the initial sourcetype.
OK, so what Lowell said above is exactly what I'm trying to accomplish. I have logs coming from a Docker container, and I would like to use a regex to tell Splunk that the sourcetype of that log entry is access_combined. I've set up props and a transform, and I see the sourcetype being changed to access_combined, but it's not parsing the fields. After looking at the access_combined regex, I don't want to try to figure this out myself. Is there some way that I can take logs from source::whatever and, based on a regex, somehow get them to be processed by the access_combined sourcetype?
I'm using the docker logging driver for splunk at this time, so I can't set the source type before it hits splunk, at least not that I'm aware of.
Don't you want:
    [set_sourcetype_for_applogs]
    SOURCE_KEY = MetaData:Source
    DEST_KEY = MetaData:Sourcetype
    ....
(Note the uppercase "D" in "MetaData".)
Go look at your own post. 😉
Don't you just hate it when you miss that kind of stuff? What I wouldn't give for some sort of validating parser... The funny thing is that I looked to see if you had the case correct, and I missed it too.
I've been working with the above in a slightly modified form. I'm collecting the logs from the directory /var/log/novell. The log names are things like /var/log/novell/foo.log, /var/log/novell/bar00.log and /var/log/novell/foo.bar.log. What I wanted to grab and use as the sourcetype was the foo, bar and foo.bar portion of the filenames respectively.
Here's what I have in transforms.conf
    [set_sourcetype_for_mcommunity_logs]
    SOURCE_KEY = MetaData:Source
    DEST_KEY = MetaData:Sourcetype
    REGEX = .*/novell/(\S+)(\d+)?\.log(\.\d+)?
    FORMAT = sourcetype::$1
Here's what I have in props.conf
    [source::.../var/log/novell/*]
    TRANSFORMS-set_sourcetype = set_sourcetype_for_mcommunity_logs
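A quick way to see what $1 would capture for the three example paths is to run the REGEX in Python. This is a rough sketch, with Python's `re` standing in for Splunk's PCRE engine. `ORIGINAL` is the pattern from the transforms.conf stanza above; `LAZY` is a hypothetical variant (not from the original post) that makes the first group non-greedy so trailing digits are left out of the capture. The helper name `first_group` is also invented for illustration.

```python
import re

# REGEX from the [set_sourcetype_for_mcommunity_logs] stanza, copied verbatim.
ORIGINAL = re.compile(r".*/novell/(\S+)(\d+)?\.log(\.\d+)?")

# Hypothetical alternative: a lazy first group followed by optional digits,
# so "bar00" is split into "bar" (captured) and "00" (discarded).
LAZY = re.compile(r".*/novell/(\S+?)\d*\.log(\.\d+)?")


def first_group(pattern: re.Pattern, source: str):
    """Return capture group 1 (the value FORMAT = sourcetype::$1 would use)."""
    match = pattern.search(source)
    return match.group(1) if match else None


paths = [
    "/var/log/novell/foo.log",
    "/var/log/novell/bar00.log",
    "/var/log/novell/foo.bar.log",
]

for path in paths:
    print(path, "original:", first_group(ORIGINAL, path),
          "lazy:", first_group(LAZY, path))
```

With the original pattern the greedy (\S+) also consumes the digits, so bar00.log yields bar00 rather than bar in this sketch; the lazy variant yields foo, bar and foo.bar for the three paths. Whether Splunk's engine behaves identically should be confirmed on a test index.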
@colinj - Is your sourcetyping working with the props.conf and transforms.conf shown above?