How to properly handle backslashes in data

Path Finder

We use the HttpEventListener to input data into Splunk. Our data is pipe ('|') delimited, and we have set up field extractors using the "delimited" method. This works perfectly so long as there are no backslashes or pipes in our data. If there are backslashes or pipes, we've observed some odd behavior in Splunk.

  1. If a field contains a double backslash as part of its data, as can happen when we log a Windows UNC path (e.g. "The network path is: \\serverA\folder1\subfolder"), the data looks correct in the "_raw" view, but in the extracted field Splunk collapses the double backslash into a single backslash. My example message above would look like this: "The network path is: \serverA\folder1\subfolder".

This happens only in the extracted field, however. In the _raw view all the slashes are retained, so the two views disagree, which isn't ideal for our purposes. Is there a way to have both the raw view and the extracted field show the intended number of slashes?

  2. If a particular field ends in a backslash, the separator pipe that follows it is escaped and included in the field instead of being treated as a separator, which throws off all the delimited field mappings.

  3. However, if a field has a pipe character in its actual data, we can use a backslash to escape it and prevent it from being treated as a delimiter. This is great. The problem is that in the extracted field, Splunk still shows the backslash character as though it were part of the data.

For example, our _raw could be abc\|def|12345, representing 2 fields with values "abc|def" and "12345". But if we look at the extracted value for field 1, we don't see "abc|def"; we see "abc\|def". The backslash is used as an escape, but it is not removed from the result. This is inconsistent with what happens when a backslash escapes another backslash.
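To make the three observations concrete, here is a small Python toy model that reproduces them. It is only a sketch of the behavior reported in this post, not Splunk's actual implementation; the function name split_delimited and the sample values are made up for illustration.

```python
def split_delimited(raw, delim="|"):
    """Toy model of the delimited extraction described above (an assumption,
    not Splunk's real code): backslash acts as an escape character."""
    fields = []
    current = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            if nxt == "\\":
                # Observation 1: a double backslash collapses to a single one.
                current.append("\\")
            else:
                # Observations 2 and 3: the next character (even the delimiter)
                # is taken literally, but the backslash itself is kept.
                current.append(ch + nxt)
            i += 2
            continue
        if ch == delim:
            # An unescaped delimiter closes the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


if __name__ == "__main__":
    print(split_delimited(r"The network path is: \\serverA\folder1\subfolder|42"))
    print(split_delimited(r"abc\|def|12345"))
```

In this model the escaped pipe keeps its backslash while the escaped backslash loses one, which is exactly the inconsistency described above.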

This behavior has us scratching our heads as to how to properly handle special characters. Any advice would be greatly appreciated.

Re: How to properly handle backslashes in data

SplunkTrust

When extracting the field you may choose to remove the backslash or not.

For your first example, it appears you've extracted everything AFTER the first slash such as this:

[sourcetypeName]
EXTRACT-uncPATH = \/(?<uncPATH>.+)

If you changed it to the following, it would extract both slashes:

[sourcetypeName]
EXTRACT-uncPATH = (?<uncPATH>\/\/.+)

You can always add it back in your search:

| makeresults count=1 | eval uncPath="/servername/share/" | rex mode=sed field=uncPath "s/^\//\/\//g"

Or remove it:

| makeresults count=1 | eval fieldName="abcd\|def" | rex mode=sed field=fieldName "s/\\//g"

Of course in your case you will not use eval or makeresults.

Re: How to properly handle backslashes in data

Path Finder

Thanks for your suggestion, but I don't think this quite addresses our scenario. I think your suggestion is to edit the regex used to extract the field to either include or exclude the slashes.

Unfortunately, we are not using a regex extraction here. We are using a delimited extraction, with pipes as the delimiter. The field is just a text string containing a log message, and sometimes we output a UNC path in that message. We want the UNC path to show up correctly in the resulting "message" field, and we don't necessarily want to extract a separate field just for these scenarios.

So we are looking for suggestions to make the delimited extraction extract properly with respect to these slashes.

Re: How to properly handle backslashes in data

Path Finder

Hi Richard,

Did you find a solution for this? I am facing a similar challenge with my data.
