Getting Data In

How do I index a payload that has leading and trailing quotes and is delimited with a pipe character?

andrewtrobec
Builder

Hello,

I have log events that follow this structure:

"2023-01-10 09:54:18.566 | ERROR | 1 | GroupManagement| ExceptionHandler | UUID CC22E78A-E62D-4693-8D89-0A54E159DDC5 | hasError | This is the error message
"

Each event has leading and trailing quotes and is delimited with a pipe character. I am having trouble creating the sourcetype and would appreciate some assistance.

I think my biggest issue is that I have to remove the leading and trailing quotes so that Splunk does not treat the entire event as one field. The following sourcetype seems to remove them, but it does not then identify the fields:

[sourcetype]
SHOULD_LINEMERGE=true
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
disabled=false
FIELD_DELIMITER=|
FIELD_NAMES=timestamp,type,num,area,code,uuid,text,message
TRUNCATE=20000
TIME_PREFIX=^
TIME_FORMAT=%Y-%m-%d %H:%M:%S,%3N
SEDCMD-remove_quotes=s/(?<!,)\"([^\"]*)\"/\1/g
 
Does anybody have an idea?
 
Thank you and best regards,
 
Andrew

richgalloway
SplunkTrust

While SEDCMD will remove the quotation marks, it's one of the last props.conf settings processed so it has little to no effect on the other settings.  See https://www.aplura.com/assets/pdf/props_conf_order.pdf for the order in which props are processed.

FTR, the FIELD_DELIMITER and FIELD_NAMES settings apply only when INDEXED_EXTRACTIONS is used.

Do you have any control over how the event is generated?  If so, can the quotes be removed?

I'd suggest using a transform, but it has the same precedence as SEDCMD.

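Changing TIME_PREFIX so that it matches the leading quote will get the timestamp extracted. The TIME_FORMAT also needs a period, not a comma, before %3N to match the sample event (09:54:18.566).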

 

TIME_PREFIX = "
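
To illustrate why both the prefix and the separator before the milliseconds matter, here is a small Python sketch. It is not Splunk code: `extract_timestamp` is a hypothetical helper, and Python's `%f` stands in for Splunk's `%3N`. It only mimics the idea of skipping a prefix and then applying a time format.

```python
from datetime import datetime

# Start of the sample event from the question, including the leading quote.
raw = '"2023-01-10 09:54:18.566 | ERROR | 1 | GroupManagement| ExceptionHandler'

def extract_timestamp(raw, prefix='"', fmt="%Y-%m-%d %H:%M:%S.%f"):
    """Hypothetical helper: skip `prefix`, then parse the next 23 characters."""
    if not raw.startswith(prefix):
        return None
    # "YYYY-MM-DD HH:MM:SS.mmm" is 23 characters long
    candidate = raw[len(prefix):len(prefix) + 23]
    try:
        return datetime.strptime(candidate, fmt)
    except ValueError:
        return None

# Period before the milliseconds, as in the event: parses.
ok = extract_timestamp(raw)
# Comma before the milliseconds, as in the original TIME_FORMAT: does not parse.
bad = extract_timestamp(raw, fmt="%Y-%m-%d %H:%M:%S,%f")
```

With the period-based format the timestamp parses. With the comma-based format the helper returns `None`, which mirrors why the original settings could not find the timestamp.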

 

Another option is to parse the event using REGEX.

props.conf:

[sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
CHARSET = UTF-8
disabled = false
TRUNCATE = 20000
TIME_PREFIX = "
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
TRANSFORMS-parse = parseSourcetype

transforms.conf:

[parseSourcetype]
REGEX = "(?<timestamp>[^\|]+)\s?\|\s?(?<type>[^\|]+)\s?\|\s?(?<num>[^\|]+)\s?\|\s?(?<area>[^\|]+)\s?\|\s?(?<code>[^\|]+)\s?\|\s?(?<uuid>[^\|]+)\s?\|\s?(?<text>[^\|]+)\s?\|\s?(?<message>[^"]+)"
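
As a rough sanity check outside Splunk, the pattern can be tried against the sample event in Python. This is only a sketch under a few assumptions: Python's `re` module needs `(?P<name>...)` where the transform uses `(?<name>...)`, `parse_event` is a hypothetical helper name, and Python's regex engine is not identical to the PCRE engine Splunk uses.

```python
import re

# Sample event as pasted in the question (closing quote on its own line).
event = (
    '"2023-01-10 09:54:18.566 | ERROR | 1 | GroupManagement| ExceptionHandler '
    '| UUID CC22E78A-E62D-4693-8D89-0A54E159DDC5 | hasError '
    '| This is the error message\n"'
)

names = ["timestamp", "type", "num", "area", "code", "uuid", "text", "message"]

# Same structure as the transform's REGEX, rebuilt with Python group syntax:
# seven pipe-free fields, then a final field that runs up to the closing quote.
pattern = re.compile(
    '"'
    + r"\s?\|\s?".join(rf"(?P<{n}>[^\|]+)" for n in names[:-1])
    + r'\s?\|\s?(?P<message>[^"]+)"'
)

def parse_event(raw):
    """Hypothetical helper: return the named fields, or None if no match."""
    m = pattern.search(raw)
    if m is None:
        return None
    # [^\|]+ also captures the padding around each value, so strip it here.
    return {k: v.strip() for k, v in m.groupdict().items()}

fields = parse_event(event)
```

Note that each `[^\|]+` group is greedy and swallows the space before the next pipe, so the raw captures carry trailing whitespace. The sketch strips it after matching.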

Consider using Cribl (cribl.io) to strip out the quotes before handing the events to Splunk.

---
If this reply helps you, Karma would be appreciated.

andrewtrobec
Builder

@richgalloway Thank you for the response! Am I correct in understanding that there's no way to accomplish this at index time, and that I instead have to apply the transformation at search time? Thanks!


richgalloway
SplunkTrust

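The transform happens at index time, because props.conf references it through a TRANSFORMS- setting (a REPORT- setting would apply it at search time). For the extracted fields to actually be written to the index, an index-time transform generally also needs WRITE_META = true and a FORMAT setting; check the transforms.conf documentation for details.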
