I am using transforms.conf to pull the sourcetype from the source via a complex regex. It doesn't seem to be working, so I'm wondering if you are allowed to set sourcetype with multiple concatenated capture groups.
The regex checks the source for many items in one big OR statement, so only one or two capture groups should ever match. So, does something like $2$3$4$5$6 work?
Or is the problem that I use a backreference in the regex?
[set_sourcetype_for_applogs]
SOURCE_KEY = Metadata:Source
DEST_KEY = Metadata:Sourcetype
# regex: path/host_ then pull sourcetype from one of the following examples:
# HOST_app1_20100510000003_SOURCETYPE_1.log.1.gz => SOURCETYPE
# HOST_SOURCE1-TYPE.201005100001.log.1.gz => SOURCE-TYPE (removal of number)
# HOST_instance1-SOURCE-TYPE.201005100001.log.1.gz => SOURCE-TYPE (removal of instance and optional number if instance is same as SOURCE)
# HOST_SOURCETYPE.201005102301.log.1.gz => SOURCETYPE
# Is big OR statement, so can only ever be $2, $3$4, $5, or $6, so
# concatenate them all together so none are lost no matter which matches
REGEX = .*_(?:(\D+)\d?-(\1.*?)\.\d\d+|(\D+)\d(-.*?)\.\d\d+|.*_\d+_(.*)_|(.*?)\.\d\d+)
FORMAT = sourcetype::$2$3$4$5$6
According to my teammates, this is not possible - that you must use a single capture group only: FORMAT = $2
Someone from Splunk, please correct me if multiple is possible.
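One way to rule the regex itself in or out is to run it outside Splunk. The sketch below uses Python's re module, which is close to but not identical to the PCRE engine Splunk uses. It emulates FORMAT = sourcetype::$2$3$4$5$6 by joining groups 2 to 6, on the assumption that groups which did not take part in the match expand to an empty string. The /logs/ directory and the emulate_format helper are hypothetical. The file names follow the comments in the stanza, with the third one adjusted so that the instance prefix equals SOURCE.

```python
import re

# REGEX copied verbatim from the transforms.conf stanza above.
REGEX = re.compile(
    r".*_(?:(\D+)\d?-(\1.*?)\.\d\d+|(\D+)\d(-.*?)\.\d\d+|.*_\d+_(.*)_|(.*?)\.\d\d+)"
)


def emulate_format(source):
    """Join groups 2..6, treating unmatched groups as empty (an assumption)."""
    match = REGEX.search(source)
    if match is None:
        return None
    return "".join(group or "" for group in match.group(2, 3, 4, 5, 6))


# Hypothetical sources; only the file-name shapes come from the post.
SAMPLES = [
    "/logs/HOST_app1_20100510000003_SOURCETYPE_1.log.1.gz",
    "/logs/HOST_SOURCE1-TYPE.201005100001.log.1.gz",
    "/logs/HOST_SOURCE1-SOURCE-TYPE.201005100001.log.1.gz",
    "/logs/HOST_SOURCETYPE.201005102301.log.1.gz",
]

for sample in SAMPLES:
    print(sample, "=>", emulate_format(sample))
```

If the printed values are the intended sourcetypes, the pattern is probably not what is failing, and the rest of the stanza is the next thing to check.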
Don't you want:
[set_sourcetype_for_applogs]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Sourcetype
....
(Note the uppercase "D" in MetaData.)
Go look at your own post. 😉
Don't you just hate it when you miss that kind of stuff? What I wouldn't give for some sort of validating parser... The funny thing is that I looked to see if you had the case correct, and I missed it too.
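In the spirit of the "validating parser" wished for above, here is a minimal sketch of a case check for SOURCE_KEY and DEST_KEY values. It is not a Splunk tool. The lint_transform_keys function is hypothetical, and KNOWN_KEYS is a partial list assumed for illustration: the two MetaData keys come from this thread, the rest are other commonly documented spellings.

```python
import re

# Partial list of key spellings (an assumption, not an exhaustive reference).
KNOWN_KEYS = ["MetaData:Source", "MetaData:Sourcetype", "MetaData:Host", "_raw", "queue"]


def lint_transform_keys(conf_text):
    """Return (line_number, found, expected) for keys that differ only in case."""
    by_lower = {key.lower(): key for key in KNOWN_KEYS}
    problems = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        match = re.match(r"\s*(SOURCE_KEY|DEST_KEY)\s*=\s*(\S+)", line)
        if match is None:
            continue
        value = match.group(2)
        expected = by_lower.get(value.lower())
        if expected is not None and value != expected:
            problems.append((number, value, expected))
    return problems


# The key lines as they appear in the original question.
SAMPLE = """[set_sourcetype_for_applogs]
SOURCE_KEY = Metadata:Source
DEST_KEY = Metadata:Sourcetype
"""

for number, found, expected in lint_transform_keys(SAMPLE):
    print(f"line {number}: {found!r} should probably be {expected!r}")
```

A check like this only catches spellings that are in the list, so it complements a careful read of the stanza and does not replace one.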
I've been working with the above in a slightly modified form. I'm collecting the logs from the directory /var/log/novell. The log names are things like /var/log/novell/foo.log, /var/log/novell/bar00.log and /var/log/novell/foo.bar.log. What I wanted to grab and use as the sourcetype was the foo, bar and foo.bar portion of the filenames respectively.
Here's what I have in transforms.conf
[set_sourcetype_for_mcommunity_logs]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Sourcetype
REGEX = .*/novell/(\S+)(\d+)?\.log(\.\d+)?
FORMAT = sourcetype::$1
Here's what I have in props.conf
[source::.../var/log/novell/*]
TRANSFORMS-set_sourcetype = set_sourcetype_for_mcommunity_logs
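The capture behaviour of this regex can be checked outside Splunk. The sketch below uses Python's re module as a stand-in for Splunk's PCRE engine (an assumption: the two behave alike for these constructs). It suggests that the greedy (\S+) keeps trailing digits such as the "00" in bar00, so the optional (\d+) group never gets them. The LAZY pattern and the first_group helper are hypothetical; the lazy quantifier is one possible adjustment, not something confirmed in this thread.

```python
import re

# Regex as posted in transforms.conf above.
ORIGINAL = re.compile(r".*/novell/(\S+)(\d+)?\.log(\.\d+)?")
# Hypothetical variant: a lazy first group leaves trailing digits to group 2.
LAZY = re.compile(r".*/novell/(\S+?)(\d+)?\.log(\.\d+)?")


def first_group(pattern, source):
    """Return what $1 would hold for this source, or None if nothing matches."""
    match = pattern.search(source)
    return match.group(1) if match else None


PATHS = [
    "/var/log/novell/foo.log",
    "/var/log/novell/bar00.log",
    "/var/log/novell/foo.bar.log",
]

for path in PATHS:
    print(path, "original:", first_group(ORIGINAL, path), "lazy:", first_group(LAZY, path))
```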
@colinj - Is your sourcetyping working with the props.conf and transforms.conf posted above? Please confirm.
Did some quick testing and your regex seems good. Please post the corresponding props.conf entries. Keep in mind that Splunk doesn't do recursive sourcetype matching. For example, say your events come in with sourcetype::temp, and then you use a transform to reassign the sourcetype to sourcetype::my_st. After the sourcetype is reassigned, Splunk will NOT look up the [my_st] stanza for additional sourcetype-specific processing rules. In other words, an inherent limitation of reassigning sourcetypes like this is that all events are processed based on their initial sourcetype.
OK, so what Lowell said above is exactly what I'm trying to accomplish. I have logs coming from a Docker container, and I would like to use a regex to tell Splunk that the sourcetype of that log entry is access_combined. I've set up props and a transform, and I see the sourcetype being changed to access_combined, but the fields are not being parsed. After looking at the access_combined regex, I don't want to try to figure this out myself. Is there some way that I can take logs from source::whatever and, based on a regex, get them processed by the access_combined sourcetype?
I'm using the Docker logging driver for Splunk at this time, so I can't set the sourcetype before it hits Splunk, at least not that I'm aware of.
Did you ever find an answer to this? I'm running into the EXACT same scenario with my OpenShift environment. That explanation accounts for a lot of what I'm seeing, as I can't seem to get it to "re-sourcetype" my data. I do see a potential answer in CLONE_SOURCETYPE, but I am afraid that will double up the events, and I'd want to discard the original (and only if the second one contained all the metadata from the first).
According to my teammates, this is not possible - that you must use a single capture group only: FORMAT = $2
Someone from Splunk, please correct me if multiple is possible.
Concatenating multiple capture groups into a single value is possible, but only in index-time transforms, which is what you are using. It is not possible with search-time extractions.