I have log events that follow this structure:
"2023-01-10 09:54:18.566 | ERROR | 1 | GroupManagement| ExceptionHandler | UUID CC22E78A-E62D-4693-8D89-0A54E159DDC5 | hasError | This is the error message
It has leading and trailing quotes and is delimited by the pipe character. I am having trouble creating the sourcetype and would appreciate some assistance.
I think my biggest issue is that I have to remove the leading and trailing quotes so that Splunk does not treat the entire event as one field. I seem to be able to remove them using the following sourcetype, but it does not then identify the fields:
While SEDCMD will remove the quotation marks, it's one of the last props.conf settings processed so it has little to no effect on the other settings. See https://www.aplura.com/assets/pdf/props_conf_order.pdf for the order in which props are processed.
FTR, the FIELD_DELIMITER and FIELD_NAMES settings apply only when INDEXED_EXTRACTIONS is used.
Do you have any control over how the event is generated? If so, can the quotes be removed?
I'd suggest using a transform, but it has the same precedence as SEDCMD.
Changing TIME_PREFIX will get the timestamp extracted.
TIME_PREFIX = "
Another option is to parse the event using REGEX.
[sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
CHARSET = UTF-8
disabled = false
TRUNCATE = 20000
TIME_PREFIX = "
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
TRANSFORMS-parse = parseSourcetype
[parseSourcetype]
REGEX = "(?<timestamp>[^\|]+)\s?\|\s?(?<type>[^\|]+)\s?\|\s?(?<num>[^\|]+)\s?\|\s?(?<area>[^\|]+)\s?\|\s?(?<code>[^\|]+)\s?\|\s?(?<uuid>[^\|]+)\s?\|\s?(?<text>[^\|]+)\s?\|\s?(?<message>[^"]+)"
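As a quick sanity check outside Splunk, the pattern in the transform can be exercised against the sample event. This is only a sketch under a few assumptions: Python's re module needs (?P<name>...) where the transform uses (?<name>...), Splunk's PCRE engine may differ in edge cases, and the sample event is assumed to end with a closing quote. The helper name parse_event is hypothetical.

```python
import re

# The transform's REGEX, rewritten with Python's (?P<name>...) group syntax.
# Splunk uses PCRE, so treat this as an approximation for testing only.
PATTERN = re.compile(
    r'"(?P<timestamp>[^\|]+)\s?\|\s?(?P<type>[^\|]+)\s?\|\s?(?P<num>[^\|]+)'
    r'\s?\|\s?(?P<area>[^\|]+)\s?\|\s?(?P<code>[^\|]+)\s?\|\s?(?P<uuid>[^\|]+)'
    r'\s?\|\s?(?P<text>[^\|]+)\s?\|\s?(?P<message>[^"]+)"'
)

# Sample event from the question, with the trailing quote assumed present.
SAMPLE_EVENT = (
    '"2023-01-10 09:54:18.566 | ERROR | 1 | GroupManagement| ExceptionHandler '
    '| UUID CC22E78A-E62D-4693-8D89-0A54E159DDC5 | hasError '
    '| This is the error message"'
)


def parse_event(event):
    """Return a dict of the raw captured groups, or None if nothing matches."""
    match = PATTERN.search(event)
    return match.groupdict() if match else None


fields = parse_event(SAMPLE_EVENT)
for name, value in fields.items():
    # repr() makes any surrounding whitespace in a captured value visible.
    print(f"{name}={value!r}")
```

In this sketch most captured values keep a trailing space, because [^\|]+ is greedy and consumes the space before each pipe, leaving the following \s? nothing to match. If that matters, the values may need trimming afterwards.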
Consider using Cribl (cribl.io) to strip out the quotes before handing the events to Splunk.
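If a dedicated tool is not an option, the same idea could be sketched as a small preprocessing step that cleans each line before Splunk reads it. This is an illustration only, not a Cribl or Splunk feature: the function names strip_outer_quotes and clean_lines are hypothetical, and the sketch assumes one event per line with the closing quote present.

```python
def strip_outer_quotes(line):
    """Remove one leading and one trailing double quote, if present."""
    text = line.rstrip("\r\n")
    if text.startswith('"'):
        text = text[1:]
    if text.endswith('"'):
        text = text[:-1]
    return text


def clean_lines(lines):
    """Apply strip_outer_quotes to every line of an iterable."""
    return [strip_outer_quotes(line) for line in lines]


# Sample event from the question, with the trailing quote assumed present.
raw_events = [
    '"2023-01-10 09:54:18.566 | ERROR | 1 | GroupManagement| ExceptionHandler '
    '| UUID CC22E78A-E62D-4693-8D89-0A54E159DDC5 | hasError '
    '| This is the error message"\n'
]

for cleaned in clean_lines(raw_events):
    print(cleaned)
```

Only the outermost quote on each side is removed, so any quotes inside the message text are left alone.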
@richgalloway Thank you for the response! Am I correct in understanding that there's no way to accomplish this at index time, and that I instead have to apply the transformation at search time? Thanks!
The transform happens at index time.