Splunk Search

Hidden Characters in a .csv datasource

raysonjoberts
Path Finder

I am having a random issue where it seems characters are present in a field which cannot be seen.
If you look in the results below, even though the results appear to match each other, Splunk does see these as 2 distinct values. 
If I download and open the results, one of the two names has characters in it that are not seen when looking at the results in the Search App. If I open the file in my text editor, one of the two names is in quotes, if I open the file in Excel, one of the two names is preceded by ‚Äã.

It feels like a problem with the underlying  lookup files (.csv),  however this problem is not consistent, only a very small percentage of results has this incorrect format (<.005%).  Trying to use regex or replace to remove non-alphanumeric values in a field does not seem to work, I am at a loss with it.  Any idea how to remove "non-visible" characters or correct this formatting?
 Screenshot 2024-03-04 at 10.49.34 AM.png

Labels (2)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

Without knowing what the characters actually are, I can suggest this eval logic that may help you clean up the data

| eval tmpVM=split(VM, "")
| eval newVM=mvjoin(mvmap(tmpVM, if(tmpVM>=" " AND tmpVM<="z", tmpVM, null())), "")

which will break the string up into the individual characters and then the mvmap will check that each character is between space and lower case z (which will cover most of the printable ASCII chars) and join it back together again.

If the goal is to fix up the csv, then this should work and you can rewrite the csv, but if this is a general problem with the CSV being written regularly, then you should try to see if you can understand the data that's getting in.

It sounds like it could be an encoding issue and there may be some spurious UTF (8 or 16) characters in there.

Those "‚Äã" character are all valid characters but they would show up as such in Splunk, so Excel is just doing its best.

You could also add in x=len(VM) to see how many additional characters are there, but you will also see the tmpVM variable in the above eval snippet shows you what Splunk thinks of the data.

0 Karma

raysonjoberts
Path Finder

Thank you @bowesmana . I got sick of beating my head against a wall and put in a workaround of sorts. The .csv in question came from an outputlookup I ran against some indexed data. For some reason, I could not filter out non-alphanumeric characters from the .csv itself, but I could with the indexed data.. So I filtered it out with a rex statement, then re-ran my outputlookup to create a new .csv.

Thank you for taking the time to reply!

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Painting a Clearer Picture: Creating Cross-Domain Visibility with AI Canvas

    Thursday, June 25, 2026  |  11AM PDT / 2PM EDT  Duration: 1 Hour (Includes live Q&A) Register to ...

Analytics Workspace deprecation

As of Splunk Cloud Platform 10.4.2604 and Splunk Enterprise 10.4, Analytics Workspace is now deprecated. ...

Splunk Developer Day Recap: Building, Publishing, and Growing on the Splunk Platform

Splunk Developer Day brought the Splunk developer community together for a practical look at what it means to ...