Our Splunk server receives data via syslog. As a result, I need to transform the syslog data using transforms.conf and props.conf (details are in my other question, "Why does Splunk not recognize standard fields in my Apache data forwarded by syslog?").
My question is: can I transform the data and still do some field extraction on it? I would like to preserve the process
field. However, the default transform simply strips the data out; it doesn't save any of the fields.
So, given the following transformation in local/props.conf:
[syslog]
TRANSFORMS-strip-syslog-header = syslog-header-stripper-ts-host-proc
And this default transform from default/transforms.conf:
# This will strip out date stamp, host, process with pid and just get the
# actual message
[syslog-header-stripper-ts-host-proc]
REGEX = ^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s.*?:\s(.*)$
FORMAT = $1
DEST_KEY = _raw
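For example (the sample event below is just an illustration of my own, not real data), given a raw line like:

Oct 11 22:14:15 myhost sshd[1234]: Failed password for invalid user admin

this transform rewrites _raw down to just the message:

Failed password for invalid user admin

and the timestamp, host, and process[pid] are discarded rather than saved as fields.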
Can I somehow preserve one of the fields and save it to a field named process?
I have had some luck with the following pattern, saved at https://www.regex101.com/r/iK8iX5/1. However, I am uncertain how to use this in a Splunk transform.
^(?<SyslogPri><\d+>)(?<SyslogDate>[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+)\s(?<SyslogHost>.*)\s(?<process>.*):\s(?<SyslogMessage>.*)$
Hello @stefanlasiewski,
Your regex will work just fine if you simply add it to the REGEX setting in transforms.conf. I do what you are doing all the time.
[syslog-header-stripper-ts-host-proc]
REGEX = <your regex statement>
This will work for search-time extraction, but are you trying to create an index-time extraction? In your example it seems like you are trying to overwrite the _raw data.
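For example, here is a minimal sketch of a pure search-time setup (the stanza and REPORT class names are just placeholders I picked, and it assumes the syslog header is still present in _raw at search time, i.e. the index-time stripping transform is not rewriting these events):

props.conf:

[syslog]
REPORT-syslog-fields = syslog-extract-fields

transforms.conf:

[syslog-extract-fields]
REGEX = ^(?<SyslogPri><\d+>)(?<SyslogDate>[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+)\s(?<SyslogHost>.*)\s(?<process>.*):\s(?<SyslogMessage>.*)$

Because your pattern uses named capture groups, no FORMAT or DEST_KEY is required here; at search time Splunk creates the fields SyslogPri, SyslogDate, SyslogHost, process, and SyslogMessage from the group names.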
I don't really care where this extraction happens; I'm fine with anywhere, as long as it's fast and easy to do. I just want to use the fields. I'm only overwriting the _raw data because that's what the Splunk docs suggest, and I'm using the default transform named syslog-header-stripper-ts-host-proc from default/transforms.conf.
Can you show me an example of how you use FORMAT and DEST_KEY? I'm confused about how those should be used.
You can't do this at index time, but you can at search time, since FORMAT accepts multiple name-value pairs there. From transforms.conf.spec:
FORMAT = <string>
* NOTE: This option is valid for both index-time and search-time field extraction. However, FORMAT
behaves differently depending on whether the extraction is performed at index time or
search time.
* This attribute specifies the format of the event, including any field names or values you want
to add.
* FORMAT for index-time extractions:
* Use $n (for example $1, $2, etc) to specify the output of each REGEX match.
* If REGEX does not have n groups, the matching fails.
* The special identifier $0 represents what was in the DEST_KEY before the REGEX was performed.
* At index time only, you can use FORMAT to create concatenated fields:
* FORMAT = ipaddress::$1.$2.$3.$4
* When you create concatenated fields with FORMAT, "$" is the only special character. It is
treated as a prefix for regex-capturing groups only if it is followed by a number and only
if the number applies to an existing capturing group. So if REGEX has only one capturing
group and its value is "bar", then:
* "FORMAT = foo$1" yields "foobar"
* "FORMAT = foo$bar" yields "foo$bar"
* "FORMAT = foo$1234" yields "foo$1234"
* "FORMAT = foo$1\$2" yields "foobar\$2"
* At index-time, FORMAT defaults to <stanza-name>::$1
* FORMAT for search-time extractions:
* The format of this field as used during search time extractions is as follows:
* FORMAT = <field-name>::<field-value>( <field-name>::<field-value>)*
* where:
* field-name = [<string>|$<extracting-group-number>]
* field-value = [<string>|$<extracting-group-number>]
* Search-time extraction examples:
* 1. FORMAT = first::$1 second::$2 third::other-value
* 2. FORMAT = $1::$2
* If the key-name of a FORMAT setting is varying, for example $1 in the
example 2 just above, then the regex will continue to match against
the source key to extract as many matches as are present in the text.
* NOTE: You cannot create concatenated fields with FORMAT at search time. That
functionality is only available at index time.
* At search-time, FORMAT defaults to an empty string.
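So, as a sketch of how FORMAT could look for your pattern at search time (the numbered groups correspond to the five groups in your regex, and the stanza name is just a placeholder):

[syslog-extract-fields]
REGEX  = ^(<\d+>)([A-Z][a-z]+\s+\d+\s\d+:\d+:\d+)\s(.*)\s(.*):\s(.*)$
FORMAT = SyslogPri::$1 SyslogDate::$2 SyslogHost::$3 process::$4 SyslogMessage::$5

You would reference that stanza from props.conf with REPORT-, not TRANSFORMS-. DEST_KEY isn't needed at all in this case: it only applies to index-time transforms, where it names the key being written (for example _raw, as in the header-stripping transform you quoted). Also note that a search-time extraction like this needs the header to still be in _raw, so it won't recover the process on events whose headers were already stripped at index time.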