
How do I index a payload that has leading and trailing quotes and is delimited with the pipe character?

andrewtrobec
Motivator

Hello,

I have log events that follow this structure:

"2023-01-10 09:54:18.566 | ERROR | 1 | GroupManagement| ExceptionHandler | UUID CC22E78A-E62D-4693-8D89-0A54E159DDC5 | hasError | This is the error message
"

It has leading and trailing quotes and is delimited with the pipe character. I am having trouble creating the sourcetype and require some assistance.

I think my biggest issue is that I have to remove the leading and trailing quotes so that Splunk does not treat the entire event as one field. I seem to be able to remove them using the following sourcetype, but then it does not identify the fields:

[sourcetype]
SHOULD_LINEMERGE=true
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
disabled=false
FIELD_DELIMITER=|
FIELD_NAMES=timestamp,type,num,area,code,uuid,text,message
TRUNCATE=20000
TIME_PREFIX=^
TIME_FORMAT=%Y-%m-%d %H:%M:%S,%3N
SEDCMD-remove_quotes=s/(?<!,)\"([^\"]*)\"/\1/g
 
Does anybody have an idea?
 
Thank you and best regards,
 
Andrew

richgalloway
SplunkTrust

While SEDCMD will remove the quotation marks, it's one of the last props.conf settings processed, so it has little to no effect on the other settings: they still see the quoted text. See https://www.aplura.com/assets/pdf/props_conf_order.pdf for the order in which props are processed.
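For illustration, here is a minimal Python sketch of what the question's SEDCMD expression does to the raw text. It uses Python's `re.sub` as a stand-in for Splunk's sed-style replacement (an assumption: the two regex engines are close but not identical), and it assumes the closing quote sits on the same line as the rest of the event. The quotes do come off, but the result is still one unstructured string, and since SEDCMD runs late, the earlier processing steps never see this cleaned version.

```python
import re

# Sample event from the question (assumes the closing quote is on the same line).
EVENT = ('"2023-01-10 09:54:18.566 | ERROR | 1 | GroupManagement| ExceptionHandler | '
         'UUID CC22E78A-E62D-4693-8D89-0A54E159DDC5 | hasError | This is the error message"')


def strip_quotes(raw: str) -> str:
    # Mirrors SEDCMD-remove_quotes=s/(?<!,)\"([^\"]*)\"/\1/g from the question:
    #   (?<!,)   the opening quote must not be preceded by a comma
    #   ([^"]*)  capture everything up to the next quote
    #   \1       replace the quoted run with its contents, dropping the quotes
    return re.sub(r'(?<!,)"([^"]*)"', r"\1", raw)


cleaned = strip_quotes(EVENT)
print(cleaned)
```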

FTR, the FIELD_DELIMITER and FIELD_NAMES settings apply only when INDEXED_EXTRACTIONS is used.
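The quotes are also what trips up delimiter-based parsing in general. As a rough analogy only (Python's standard `csv` module, not Splunk's parser), a pipe-delimited reader that honours `"` as a quote character returns the whole quoted line as a single field, while the same line without the outer quotes splits into the eight columns named in FIELD_NAMES.

```python
import csv

QUOTED = ('"2023-01-10 09:54:18.566 | ERROR | 1 | GroupManagement| ExceptionHandler | '
          'UUID CC22E78A-E62D-4693-8D89-0A54E159DDC5 | hasError | This is the error message"')
UNQUOTED = QUOTED[1:-1]  # same line with the leading and trailing quotes removed

FIELD_NAMES = ["timestamp", "type", "num", "area", "code", "uuid", "text", "message"]


def split_pipes(line: str) -> list[str]:
    # Pipe-delimited parse that treats " as a quote character, then trims spaces.
    row = next(csv.reader([line], delimiter="|", quotechar='"'))
    return [value.strip() for value in row]


print(len(split_pipes(QUOTED)))    # 1: the quoted line comes back as a single field
print(len(split_pipes(UNQUOTED)))  # 8: one value per pipe-separated column
print(dict(zip(FIELD_NAMES, split_pipes(UNQUOTED))))
```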

Do you have any control over how the event is generated?  If so, can the quotes be removed?

I'd suggest using a transform, but it has the same precedence as SEDCMD.

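Changing TIME_PREFIX so that it skips the leading quote will get the timestamp extracted. The sample event also has a period, not a comma, before the milliseconds, so TIME_FORMAT needs a period as well (%Y-%m-%d %H:%M:%S.%3N).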

 

TIME_PREFIX = "
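To sanity-check the timestamp settings outside Splunk, they can be roughly mimicked in Python. This sketch treats TIME_PREFIX as a regex and reads the timestamp right after its first match; it assumes Python's `%f` is a fair stand-in for Splunk's `%3N`, and the fixed 23-character slice is a simplification of how Splunk looks ahead for the timestamp. With the question's settings (`^` and a comma before the milliseconds) nothing parses; skipping the quote and using a period does.

```python
import re
from datetime import datetime
from typing import Optional

EVENT = ('"2023-01-10 09:54:18.566 | ERROR | 1 | GroupManagement| ExceptionHandler | '
         'UUID CC22E78A-E62D-4693-8D89-0A54E159DDC5 | hasError | This is the error message"')


def parse_timestamp(raw: str, time_prefix: str, fmt: str) -> Optional[datetime]:
    # Rough imitation of TIME_PREFIX + TIME_FORMAT: find the first match of the
    # prefix regex, then try to parse the text immediately after it.
    prefix = re.search(time_prefix, raw)
    if prefix is None:
        return None
    candidate = raw[prefix.end():prefix.end() + 23]  # "YYYY-MM-DD HH:MM:SS.mmm"
    try:
        return datetime.strptime(candidate, fmt)
    except ValueError:
        return None


print(parse_timestamp(EVENT, r"^", "%Y-%m-%d %H:%M:%S,%f"))  # None: question's settings
print(parse_timestamp(EVENT, r'"', "%Y-%m-%d %H:%M:%S,%f"))  # None: prefix fixed, comma kept
print(parse_timestamp(EVENT, r'"', "%Y-%m-%d %H:%M:%S.%f"))  # parses: prefix and period fixed
```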

 

Another option is to parse the event with a REGEX in a transform.

props.conf:

[sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
CHARSET = UTF-8
disabled = false
TRUNCATE = 20000
TIME_PREFIX = "
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
TRANSFORMS-parse = parseSourcetype

transforms.conf:

[parseSourcetype]
REGEX = "(?<timestamp>[^\|]+)\s?\|\s?(?<type>[^\|]+)\s?\|\s?(?<num>[^\|]+)\s?\|\s?(?<area>[^\|]+)\s?\|\s?(?<code>[^\|]+)\s?\|\s?(?<uuid>[^\|]+)\s?\|\s?(?<text>[^\|]+)\s?\|\s?(?<message>[^"]+)"
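Before deploying it, the REGEX can be checked offline. Here is a minimal Python sketch; Python spells named groups `(?P<name>...)` instead of `(?<name>...)`, so the pattern is translated, and the sample event is assumed to have its closing quote on the same line. It suggests the pattern matches all eight fields, but most captured values keep a trailing space, because `[^\|]+` also consumes the space before each pipe. `TRIMMED` is a hypothetical variant (lazy groups plus `\s*` around each pipe) that leaves that whitespace out.

```python
import re

EVENT = ('"2023-01-10 09:54:18.566 | ERROR | 1 | GroupManagement| ExceptionHandler | '
         'UUID CC22E78A-E62D-4693-8D89-0A54E159DDC5 | hasError | This is the error message"')

# transforms.conf REGEX, translated to Python's (?P<name>...) group syntax.
ORIGINAL = re.compile(
    r'"(?P<timestamp>[^\|]+)\s?\|\s?(?P<type>[^\|]+)\s?\|\s?(?P<num>[^\|]+)\s?\|\s?'
    r'(?P<area>[^\|]+)\s?\|\s?(?P<code>[^\|]+)\s?\|\s?(?P<uuid>[^\|]+)\s?\|\s?'
    r'(?P<text>[^\|]+)\s?\|\s?(?P<message>[^"]+)"'
)

# Hypothetical tightened variant: lazy groups plus \s* around each pipe,
# so the whitespace next to the delimiters is not captured.
TRIMMED = re.compile(
    r'"(?P<timestamp>[^|]+?)\s*\|\s*(?P<type>[^|]+?)\s*\|\s*(?P<num>[^|]+?)\s*\|\s*'
    r'(?P<area>[^|]+?)\s*\|\s*(?P<code>[^|]+?)\s*\|\s*(?P<uuid>[^|]+?)\s*\|\s*'
    r'(?P<text>[^|]+?)\s*\|\s*(?P<message>[^"]+?)\s*"'
)


def extract(pattern: re.Pattern, raw: str) -> dict:
    # Return the named groups of the first match, or an empty dict if none.
    match = pattern.search(raw)
    return match.groupdict() if match else {}


for label, pattern in (("original", ORIGINAL), ("trimmed", TRIMMED)):
    print(label, extract(pattern, EVENT))
```

Whether the trailing spaces matter depends on how the fields are searched; the trimmed pattern is one option, not a requirement.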

Consider using Cribl (cribl.io) to strip out the quotes before handing the events to Splunk.

---
If this reply helps you, Karma would be appreciated.

andrewtrobec
Motivator

@richgalloway Thank you for the response! Am I correct in understanding that there's no way to accomplish what I'm doing while indexing, and instead I have to apply the transformation at search time? Thanks!


richgalloway
SplunkTrust

The transform happens at index time.
