Splunk Search

Hidden Characters in a .csv datasource

raysonjoberts
Path Finder

I am having a random issue where it seems characters are present in a field which cannot be seen.
If you look in the results below, even though the results appear to match each other, Splunk does see these as 2 distinct values. 
If I download and open the results, one of the two names has characters in it that are not seen when looking at the results in the Search App. If I open the file in my text editor, one of the two names is in quotes, if I open the file in Excel, one of the two names is preceded by ‚Äã.

It feels like a problem with the underlying  lookup files (.csv),  however this problem is not consistent, only a very small percentage of results has this incorrect format (<.005%).  Trying to use regex or replace to remove non-alphanumeric values in a field does not seem to work, I am at a loss with it.  Any idea how to remove "non-visible" characters or correct this formatting?
 Screenshot 2024-03-04 at 10.49.34 AM.png

Labels (2)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

Without knowing what the characters actually are, I can suggest this eval logic that may help you clean up the data

| eval tmpVM=split(VM, "")
| eval newVM=mvjoin(mvmap(tmpVM, if(tmpVM>=" " AND tmpVM<="z", tmpVM, null())), "")

which will break the string up into the individual characters and then the mvmap will check that each character is between space and lower case z (which will cover most of the printable ASCII chars) and join it back together again.

If the goal is to fix up the csv, then this should work and you can rewrite the csv, but if this is a general problem with the CSV being written regularly, then you should try to see if you can understand the data that's getting in.

It sounds like it could be an encoding issue and there may be some spurious UTF (8 or 16) characters in there.

Those "‚Äã" character are all valid characters but they would show up as such in Splunk, so Excel is just doing its best.

You could also add in x=len(VM) to see how many additional characters are there, but you will also see the tmpVM variable in the above eval snippet shows you what Splunk thinks of the data.

0 Karma

raysonjoberts
Path Finder

Thank you @bowesmana . I got sick of beating my head against a wall and put in a workaround of sorts. The .csv in question came from an outputlookup I ran against some indexed data. For some reason, I could not filter out non-alphanumeric characters from the .csv itself, but I could with the indexed data.. So I filtered it out with a rex statement, then re-ran my outputlookup to create a new .csv.

Thank you for taking the time to reply!

0 Karma
Get Updates on the Splunk Community!

Dashboards: Hiding charts while search is being executed and other uses for tokens

There are a couple of features of SimpleXML / Classic dashboards that can be used to enhance the user ...

Splunk Observability Cloud's AI Assistant in Action Series: Explaining Metrics and ...

This is the fourth post in the Splunk Observability Cloud’s AI Assistant in Action series that digs into how ...

Brains, Bytes, and Boston: Learn from the Best at .conf25

When you think of Boston, you might picture colonial charm, world-class universities, or even the crack of a ...