Getting Data In

How do I index a payload that has leading and trailing quotes and is delimited with the pipe character?

andrewtrobec
Motivator

Hello,

I have log events that follow this structure:

"2023-01-10 09:54:18.566 | ERROR | 1 | GroupManagement| ExceptionHandler | UUID CC22E78A-E62D-4693-8D89-0A54E159DDC5 | hasError | This is the error message
"

It has leading and trailing quotes and is delimited with the pipe character.  I am having trouble creating the sourcetype and would appreciate some assistance.

I think my biggest issue is that I have to remove the leading and trailing quotes so that Splunk does not treat the entire event as one field.  I seem to be able to remove them using the following sourcetype, but it then does not identify the fields:

[sourcetype]
SHOULD_LINEMERGE=true
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
disabled=false
FIELD_DELIMITER=|
FIELD_NAMES=timestamp,type,num,area,code,uuid,text,message
TRUNCATE=20000
TIME_PREFIX=^
TIME_FORMAT=%Y-%m-%d %H:%M:%S,%3N
SEDCMD-remove_quotes=s/(?<!,)\"([^\"]*)\"/\1/g
 
Does anybody have an idea?
 
Thank you and best regards,
 
Andrew

richgalloway
SplunkTrust

While SEDCMD will remove the quotation marks, it's one of the last props.conf settings processed so it has little to no effect on the other settings.  See https://www.aplura.com/assets/pdf/props_conf_order.pdf for the order in which props are processed.
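
As a rough illustration of the SEDCMD approach, the sketch below strips only a quote at the very start and very end of each event. The class names strip_leading_quote and strip_trailing_quote are made up for this example, and it assumes each event begins and ends with exactly one double quote, as in the sample above.

```ini
# props.conf (sketch; class names are hypothetical)
[sourcetype]
# Remove a double quote at the start of the event
SEDCMD-strip_leading_quote = s/^"//
# Remove a double quote (and any surrounding whitespace) at the end of the event
SEDCMD-strip_trailing_quote = s/\s*"\s*$//
```

This only rewrites the raw event text; it does not by itself create any fields.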

For the record, the FIELD_DELIMITER and FIELD_NAMES settings apply only when INDEXED_EXTRACTIONS is used.

Do you have any control over how the event is generated?  If so, can the quotes be removed?

I'd suggest using a transform to strip the quotes, but it has the same precedence as SEDCMD, so it would not help the other settings either.

Changing TIME_PREFIX so that it matches the leading quote will get the timestamp extracted:

TIME_PREFIX = "
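
For reference, the timestamp-related settings could be grouped as in the sketch below. Note that the sample event separates seconds and milliseconds with a period (09:54:18.566), so TIME_FORMAT needs %S.%3N rather than the %S,%3N used in the question. The ^ anchor and the MAX_TIMESTAMP_LOOKAHEAD value are assumptions added for illustration: 23 is the length of the timestamp in the sample.

```ini
# props.conf (sketch)
[sourcetype]
# Skip the leading double quote before looking for the timestamp
TIME_PREFIX = ^"
# Period, not comma, before the milliseconds
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
# Assumption: limit the scan to the 23 characters of the sample timestamp
MAX_TIMESTAMP_LOOKAHEAD = 23
```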

 

Another option is to parse the event using REGEX.

props.conf:

[sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
CHARSET = UTF-8
disabled = false
TRUNCATE = 20000
TIME_PREFIX = "
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
TRANSFORMS-parse = parseSourcetype

transforms.conf:

[parseSourcetype]
REGEX = "(?<timestamp>[^\|]+)\s?\|\s?(?<type>[^\|]+)\s?\|\s?(?<num>[^\|]+)\s?\|\s?(?<area>[^\|]+)\s?\|\s?(?<code>[^\|]+)\s?\|\s?(?<uuid>[^\|]+)\s?\|\s?(?<text>[^\|]+)\s?\|\s?(?<message>[^"]+)"

Consider using Cribl (cribl.io) to strip out the quotes before handing the events to Splunk.

---
If this reply helps you, Karma would be appreciated.

andrewtrobec
Motivator

@richgalloway Thank you for the response!  Am I correct in understanding that there's no way to accomplish this at index time, and that I instead have to apply a transformation at search time?  Thanks!


richgalloway
SplunkTrust

The transform happens at index time.
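
For comparison, here is a sketch of a search-time equivalent, in case index-time extraction turns out not to be required. It reuses the parseSourcetype stanza from the earlier reply; only the props.conf setting changes from TRANSFORMS- to REPORT-. With REPORT-, the named capture groups in REGEX become field names directly, and the configuration belongs on the search head. This is an illustration under those assumptions, not a tested configuration.

```ini
# props.conf (search-time sketch)
[sourcetype]
# REPORT-<class> applies the transform at search time instead of index time
REPORT-parse = parseSourcetype

# transforms.conf (same regex as in the earlier reply)
[parseSourcetype]
REGEX = "(?<timestamp>[^\|]+)\s?\|\s?(?<type>[^\|]+)\s?\|\s?(?<num>[^\|]+)\s?\|\s?(?<area>[^\|]+)\s?\|\s?(?<code>[^\|]+)\s?\|\s?(?<uuid>[^\|]+)\s?\|\s?(?<text>[^\|]+)\s?\|\s?(?<message>[^"]+)"
```

By contrast, the Splunk documentation on index-time field extraction describes the transform as also needing FORMAT and WRITE_META = true, so the index-time stanza may need those added before its fields show up as indexed fields.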
