Splunk Search

Hidden Characters in a .csv datasource

raysonjoberts
Path Finder

I am having a random issue where it seems characters are present in a field which cannot be seen.
If you look in the results below, even though the results appear to match each other, Splunk does see these as 2 distinct values. 
If I download and open the results, one of the two names has characters in it that are not seen when looking at the results in the Search App. If I open the file in my text editor, one of the two names is in quotes, if I open the file in Excel, one of the two names is preceded by ‚Äã.

It feels like a problem with the underlying  lookup files (.csv),  however this problem is not consistent, only a very small percentage of results has this incorrect format (<.005%).  Trying to use regex or replace to remove non-alphanumeric values in a field does not seem to work, I am at a loss with it.  Any idea how to remove "non-visible" characters or correct this formatting?
 Screenshot 2024-03-04 at 10.49.34 AM.png

Labels (2)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

Without knowing what the characters actually are, I can suggest this eval logic that may help you clean up the data

| eval tmpVM=split(VM, "")
| eval newVM=mvjoin(mvmap(tmpVM, if(tmpVM>=" " AND tmpVM<="z", tmpVM, null())), "")

which will break the string up into the individual characters and then the mvmap will check that each character is between space and lower case z (which will cover most of the printable ASCII chars) and join it back together again.

If the goal is to fix up the csv, then this should work and you can rewrite the csv, but if this is a general problem with the CSV being written regularly, then you should try to see if you can understand the data that's getting in.

It sounds like it could be an encoding issue and there may be some spurious UTF (8 or 16) characters in there.

Those "‚Äã" character are all valid characters but they would show up as such in Splunk, so Excel is just doing its best.

You could also add in x=len(VM) to see how many additional characters are there, but you will also see the tmpVM variable in the above eval snippet shows you what Splunk thinks of the data.

0 Karma

raysonjoberts
Path Finder

Thank you @bowesmana . I got sick of beating my head against a wall and put in a workaround of sorts. The .csv in question came from an outputlookup I ran against some indexed data. For some reason, I could not filter out non-alphanumeric characters from the .csv itself, but I could with the indexed data.. So I filtered it out with a rex statement, then re-ran my outputlookup to create a new .csv.

Thank you for taking the time to reply!

0 Karma
Get Updates on the Splunk Community!

Splunk Observability Cloud’s AI Assistant in Action Series: Analyzing and ...

This is the second post in our Splunk Observability Cloud’s AI Assistant in Action series, in which we look at ...

Elevate Your Organization with Splunk’s Next Platform Evolution

 Thursday, July 10, 2025  |  11AM PDT / 2PM EDT Whether you're managing complex deployments or looking to ...

Splunk Answers Content Calendar, June Edition

Get ready for this week’s post dedicated to Splunk Dashboards! We're celebrating the power of community by ...