How to deal with "quotes" in field values that are causing Splunk to have issues parsing field/value pairs?

Explorer

I'm having an issue with certain events that contain values with quotation marks in them. This is causing Splunk to have issues parsing the field/value pairs for the log entry. Below is a sample of what an entry might look like:

myField="Some really cool "text""

- The problem only occurs in one specific log line in the entire log.
- Even in that specific line, it does not occur every time.

Because of these two points, I am trying to avoid broad changes that affect the entire sourcetype. For example, my understanding is that I could turn off automatic extractions for the sourcetype, but that would cause issues with the many logs that are working just fine.
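
To illustrate the symptom described above, here is a minimal Python sketch. It is not Splunk's actual key/value extraction logic; it only models a naive key="value" pattern to show why the value gets cut off at the first inner quote. The second pattern is one possible workaround and assumes the quoted value is the last thing on the line, which the sample above suggests but does not confirm.

```python
import re

# Sample entry from the question.
line = 'myField="Some really cool "text""'

# Naive key="value" pattern: the value ends at the first quote it meets.
naive = re.compile(r'(\w+)="([^"]*)"')

# Assumption: the quoted value runs to the end of the line, so capture
# greedily up to the final quote before end of line.
to_eol = re.compile(r'(\w+)="(.*)"\s*$')

naive_value = naive.search(line).group(2)
greedy_value = to_eol.search(line).group(2)

print(repr(naive_value))   # value truncated before the inner quote
print(repr(greedy_value))  # value including the inner quotes
```

If the field can appear in the middle of a line, the end-of-line anchor no longer applies and the pattern would need some other delimiter to stop on.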

Re: How to deal with "quotes" in field values that are causing Splunk to have issues parsing field/value pairs?

Splunk Employee

How are the fields extracted? Can you provide more information? What exactly is the "parsing issue"? There are lots of ways of going about creating a solution to this problem, but we need more information.

Re: How to deal with "quotes" in field values that are causing Splunk to have issues parsing field/value pairs?

Contributor

An index-time field extraction may be what you need.

From: http://docs.splunk.com/Documentation/Splunk/6.4.0/Data/Configureindex-timefieldextraction

Concatenate field values from event segments at index time

This example shows you how an index-time transform can be used to extract separate segments of an event and combine them to create a single field, using the FORMAT option.

Let's say you have the following event:

20100126 08:48:49 781 PACKET 078FCFD0 UDP Rcv 127.0.0.0 8226 R Q [0084 A NOERROR] A (4)www(8)google(3)com(0)

Now, what you want to do is get (4)www(8)google(3)com(0) extracted as a value of a field named dns_requestor. But you don't want those garbage parentheses and numerals; you just want something that looks like www.google.com. How do you achieve this?

transforms.conf

You would start by setting up a transform in transforms.conf named dnsRequest:

[dnsRequest]
REGEX = UDP[^\(]+\(\d\)(\w+)\(\d\)(\w+)\(\d\)(\w+)
FORMAT = dns_requestor::$1.$2.$3 

This transform defines a custom field named dns_requestor. It uses its REGEX to pull out the three segments of the dns_requestor value. Then it uses FORMAT to order those segments with periods between them, like a proper domain name.
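
As a rough way to check what the transform above would produce, the same regular expression can be run in Python against the sample event. This is only a sketch: the helper apply_format is hypothetical and simply mimics the effect of FORMAT = dns_requestor::$1.$2.$3; it is not how Splunk applies transforms internally.

```python
import re

# Sample event from the documentation excerpt.
event = ("20100126 08:48:49 781 PACKET 078FCFD0 UDP Rcv 127.0.0.0 8226 "
         "R Q [0084 A NOERROR] A (4)www(8)google(3)com(0)")

# Same pattern as the REGEX setting in the dnsRequest stanza.
pattern = re.compile(r"UDP[^\(]+\(\d\)(\w+)\(\d\)(\w+)\(\d\)(\w+)")


def apply_format(match):
    """Hypothetical helper: join capture groups 1-3 with periods,
    mirroring FORMAT = dns_requestor::$1.$2.$3."""
    return "dns_requestor", ".".join(match.group(1, 2, 3))


match = pattern.search(event)
field, value = apply_format(match)
print(field, value)  # dns_requestor www.google.com
```

Note that \(\d\) matches a single digit only, so a name with a label of ten or more characters, or with more or fewer than three labels, would not be captured as intended by this pattern.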

Note: This method of concatenating event segments into a complete field value is something you can only perform with index-time extractions; search-time extractions have practical restrictions that prevent it. If you find that you must use FORMAT in this manner, you will have to create a new indexed field to do it.

props.conf

Then, the next step would be to define a field extraction in props.conf that references the dnsRequest transform and applies it to events coming from the server1 source type:

[server1]
TRANSFORMS-dnsExtract = dnsRequest

fields.conf

Finally, you would enter the following stanza in fields.conf:

[dns_requestor]
INDEXED = true

Restart Splunk for your configuration file changes to take effect.