Splunk Search

How to deal with "quotes" in field values that are causing Splunk to have issues parsing field/value pairs?

BT_Neophyte
Explorer

I'm having an issue with certain events that contain values with quotation marks in them. This is causing Splunk to have issues parsing the field/value pairs for the log entry. Below is a sample of what an entry might look like:

myField="Some really cool "text""

- This only occurs in one specific log line in the entire log
- Even in that specific line, it does not occur every time

Because of the two points above, I am trying to avoid broad changes that affect the entire sourcetype. For example, my understanding is that I could turn off automatic field extraction for the sourcetype, but that would break extraction for the many log lines that are working just fine.
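To make the symptom concrete, here is a minimal Python sketch of why a simple key="value" pattern truncates this value, and one possible regex that tolerates the inner quotes. This is only an illustration and not Splunk's actual extractor: the trailing other="x" pair, the function names, and both patterns are assumptions made for the example.

```python
import re

# Sample value from the question; the trailing other="x" pair is hypothetical.
LINE = 'myField="Some really cool "text"" other="x"'

def naive_value(line):
    """Simple key="value" match: stops at the first inner quote."""
    m = re.search(r'myField="([^"]*)"', line)
    return m.group(1) if m else None

def permissive_value(line):
    """Lazily take everything up to a quote that is followed by either
    the end of the line or whitespace plus the next key= token."""
    m = re.search(r'myField="(.*?)"(?=\s+\w+=|\s*$)', line)
    return m.group(1) if m else None

print(repr(naive_value(LINE)))       # 'Some really cool '
print(repr(permissive_value(LINE)))  # 'Some really cool "text"'
```

The permissive pattern still breaks if the value itself contains a quote followed by whitespace and something that looks like key=, so it would need checking against real events before being used in an extraction.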

dolivasoh
Contributor

Index-time field extractions may be what you need.

From: http://docs.splunk.com/Documentation/Splunk/6.4.0/Data/Configureindex-timefieldextraction

Concatenate field values from event segments at index time

This example shows you how an index-time transform can be used to extract separate segments of an event and combine them to create a single field, using the FORMAT option.

Let's say you have the following event:

20100126 08:48:49 781 PACKET 078FCFD0 UDP Rcv 127.0.0.0 8226 R Q [0084 A NOERROR] A (4)www(8)google(3)com(0)

Now, what you want to do is get (4)www(8)google(3)com(0) extracted as a value of a field named dns_requestor. But you don't want those garbage parentheses and numerals, you just want something that looks like www.google.com. How do you achieve this?

transforms.conf

You would start by setting up a transform in transforms.conf named dnsRequest:

[dnsRequest]
REGEX = UDP[^\(]+\(\d\)(\w+)\(\d\)(\w+)\(\d\)(\w+)
FORMAT = dns_requestor::$1.$2.$3 

This transform defines a custom field named dns_requestor. It uses its REGEX to pull out the three segments of the dns_requestor value. Then it uses FORMAT to order those segments with periods between them, like a proper URL.
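As a quick sanity check outside Splunk, the same REGEX can be run against the sample event in Python. This is only a sketch: Python's re module stands in for Splunk's PCRE engine, and the function name and string formatting are assumptions that imitate what FORMAT = dns_requestor::$1.$2.$3 does.

```python
import re

# Sample event from the documentation excerpt above.
EVENT = ("20100126 08:48:49 781 PACKET 078FCFD0 UDP Rcv 127.0.0.0 8226 R Q "
         "[0084 A NOERROR] A (4)www(8)google(3)com(0)")

# Same pattern as the REGEX line in the [dnsRequest] stanza.
DNS_REGEX = re.compile(r"UDP[^\(]+\(\d\)(\w+)\(\d\)(\w+)\(\d\)(\w+)")

def extract_dns_requestor(event):
    """Join the three captured segments with periods, imitating FORMAT."""
    m = DNS_REGEX.search(event)
    if m is None:
        return None
    return "{}.{}.{}".format(*m.groups())

print(extract_dns_requestor(EVENT))  # www.google.com
```

Note that \(\d\) matches a single digit only, so a name segment of ten or more characters, such as (10)..., would not match this pattern as written.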

Note: This method of concatenating event segments into a complete field value is something you can only perform with index-time extractions; search-time extractions have practical restrictions that prevent it. If you find that you must use FORMAT in this manner, you will have to create a new indexed field to do it.

props.conf

Then, the next step would be to define a field extraction in props.conf that references the dnsRequest transform and applies it to events coming from the server1 source type:

[server1]
TRANSFORMS-dnsExtract = dnsRequest

fields.conf

Finally, you would enter the following stanza in fields.conf:

[dns_requestor]
INDEXED = true

Restart Splunk for your configuration file changes to take effect.

aljohnson_splun
Splunk Employee

How are the fields extracted? Can you provide more information? What exactly is the "parsing issue"? There are lots of ways to go about solving this problem, but we need more information.
