I'm having an issue with certain events that contain values with quotation marks in them. This is causing Splunk to have issues parsing the field/value pairs for the log entry. Below is a sample of what an entry might look like:
myField="Some really cool "text""
-This only occurs in one specific log line in the entire log
-Even in that specific line it does not occur every time
Because of the above two points I was trying to avoid broad actions that affect the entire sourcetype. For example, my understanding is I could turn off auto-extractions for the sourcetype, but that would cause issues with many logs that are working just fine.
How are the fields extracted? Can you provide more information? What exactly is the "parsing issue"? There are lots of ways to solve this problem, but we need more information.
Indexed extractions may be what you need.
Concatenate field values from event segments at index time
This example shows you how an index-time transform can be used to extract separate segments of an event and combine them to create a single field, using the FORMAT option.
Let's say you have the following event:
20100126 08:48:49 781 PACKET 078FCFD0 UDP Rcv 127.0.0.0 8226 R Q [0084 A NOERROR] A (4)www(8)google(3)com(0)
Now, what you want to do is get (4)www(8)google(3)com(0) extracted as a value of a field named dns_requestor. But you don't want those garbage parentheses and numerals, you just want something that looks like www.google.com. How do you achieve this?
You would start by setting up a transform in transforms.conf named dnsRequest:
[dnsRequest]
REGEX = UDP[^\(]+\(\d\)(\w+)\(\d\)(\w+)\(\d\)(\w+)
FORMAT = dns_requestor::$1.$2.$3
This transform defines a custom field named dns_requestor. It uses its REGEX to pull out the three segments of the dns_requestor value. Then it uses FORMAT to order those segments with periods between them, like a proper URL.
Note: This method of concatenating event segments into a complete field value is something you can only perform with index-time extractions; search-time extractions have practical restrictions that prevent it. If you find that you must use FORMAT in this manner, you will have to create a new indexed field to do it.
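To see what the REGEX and FORMAT pair does to the sample event, here is a minimal Python sketch that emulates the transform outside Splunk. The function name extract_dns_requestor is hypothetical and only for illustration; Python's re module stands in for Splunk's regex engine, which is close enough for this pattern but not identical in general.

```python
import re

# Same pattern as the REGEX line in the [dnsRequest] stanza.
DNS_REGEX = re.compile(r"UDP[^\(]+\(\d\)(\w+)\(\d\)(\w+)\(\d\)(\w+)")

def extract_dns_requestor(event):
    """Emulate FORMAT = dns_requestor::$1.$2.$3 (hypothetical helper)."""
    match = DNS_REGEX.search(event)
    if match is None:
        return None  # the transform would simply not create the field
    # Join the three captured segments with periods, as FORMAT does.
    return ".".join(match.group(1, 2, 3))

event = ("20100126 08:48:49 781 PACKET 078FCFD0 UDP Rcv 127.0.0.0 8226 "
         "R Q [0084 A NOERROR] A (4)www(8)google(3)com(0)")
print(extract_dns_requestor(event))
```

Note that \d matches a single digit, so a label of ten or more characters (written with a two-digit length prefix) would not match this pattern as it stands.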
Then, the next step would be to define a field extraction in props.conf that references the dnsRequest transform and applies it to events coming from the server1 source type:
[server1]
TRANSFORMS-dnsExtract = dnsRequest
Finally, you would enter the following stanza in fields.conf:
[dns_requestor]
INDEXED = true
Restart Splunk for your configuration file changes to take effect.
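For the quoted-value case in the original question, a targeted regex for that one field might be enough, without touching auto-extraction for the rest of the sourcetype. The sketch below is untested against real data and rests on a loud assumption: myField is the last key/value pair on the line, so the closing quote is the final quote before end of line. The function name extract_my_field is hypothetical, and Python's re module is only standing in for Splunk's regex engine.

```python
import re

# ASSUMPTION: myField is the last pair on the line, so a greedy match
# anchored at end-of-line captures everything up to the final quote,
# embedded quotes included.
MY_FIELD_REGEX = re.compile(r'myField="(.*)"\s*$')

def extract_my_field(line):
    """Return the full value of myField, or None if absent (hypothetical helper)."""
    match = MY_FIELD_REGEX.search(line)
    return match.group(1) if match else None

sample = 'myField="Some really cool "text""'
print(extract_my_field(sample))
```

If myField can be followed by other pairs, the end-of-line anchor no longer holds and the pattern would need a different terminator, such as a lookahead for the next key name.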