<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hidden Characters in a .csv datasource in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Hidden-Characters-in-a-csv-datasource/m-p/679503#M232304</link>
    <description>&lt;P&gt;I am having an intermittent issue where a field seems to contain characters that cannot be seen.&lt;BR /&gt;In the results below, even though the two values appear to match, Splunk sees them as 2 distinct values.&lt;BR /&gt;If I download and open the results, one of the two names contains characters that are not visible in the Search app. If I open the file in my text editor, one of the two names is in quotes; if I open it in Excel, one of the two names is preceded by&amp;nbsp;‚Äã.&lt;BR /&gt;&lt;BR /&gt;It feels like a problem with the underlying lookup files (.csv); however, the problem is not consistent, and only a very small percentage of results (&amp;lt;0.005%) has this incorrect format. Using regex or replace to remove non-alphanumeric characters from the field does not seem to work, and I am at a loss. Any idea how to remove these "non-visible" characters or correct the formatting?&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-03-04 at 10.49.34 AM.png" style="width: 329px;"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/29587iB7709FBC493EE8EA/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2024-03-04 at 10.49.34 AM.png" alt="Screenshot 2024-03-04 at 10.49.34 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 04 Mar 2024 18:11:26 GMT</pubDate>
    <dc:creator>raysonjoberts</dc:creator>
    <dc:date>2024-03-04T18:11:26Z</dc:date>
    <item>
      <title>Hidden Characters in a .csv datasource</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Hidden-Characters-in-a-csv-datasource/m-p/679503#M232304</link>
      <description>&lt;P&gt;I am having an intermittent issue where a field seems to contain characters that cannot be seen.&lt;BR /&gt;In the results below, even though the two values appear to match, Splunk sees them as 2 distinct values.&lt;BR /&gt;If I download and open the results, one of the two names contains characters that are not visible in the Search app. If I open the file in my text editor, one of the two names is in quotes; if I open it in Excel, one of the two names is preceded by&amp;nbsp;‚Äã.&lt;BR /&gt;&lt;BR /&gt;It feels like a problem with the underlying lookup files (.csv); however, the problem is not consistent, and only a very small percentage of results (&amp;lt;0.005%) has this incorrect format. Using regex or replace to remove non-alphanumeric characters from the field does not seem to work, and I am at a loss. Any idea how to remove these "non-visible" characters or correct the formatting?&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-03-04 at 10.49.34 AM.png" style="width: 329px;"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/29587iB7709FBC493EE8EA/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2024-03-04 at 10.49.34 AM.png" alt="Screenshot 2024-03-04 at 10.49.34 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Mar 2024 18:11:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Hidden-Characters-in-a-csv-datasource/m-p/679503#M232304</guid>
      <dc:creator>raysonjoberts</dc:creator>
      <dc:date>2024-03-04T18:11:26Z</dc:date>
    </item>
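    <!--
    The "‚Äã" prefix that Excel shows is very likely mojibake: it is exactly what you get when the three UTF-8 bytes of U+200B ZERO WIDTH SPACE (E2 80 8B) are decoded as Mac Roman, which Excel on macOS may do when it opens a CSV without a byte-order mark. The Python sketch below is a hypothetical helper, not part of Splunk. It lists the code point and Unicode name of each character in a value so that an invisible character becomes visible. The sample name is made up for illustration.

    ```python
    import unicodedata


    def describe_chars(value: str) -> list[str]:
        """Return one 'U+XXXX NAME' entry per character in value."""
        return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in value]


    # Hypothetical value: a name with a leading zero-width space.
    suspect = "\u200bJane Doe"
    for entry in describe_chars(suspect)[:2]:
        print(entry)

    # Decoding the UTF-8 bytes as Mac Roman reproduces the Excel display.
    print(suspect.encode("utf-8").decode("mac_roman")[:3])
    ```

    This prints "U+200B ZERO WIDTH SPACE", then "U+004A LATIN CAPITAL LETTER J", then "‚Äã". Run against a value copied from the downloaded results, it should show whether the extra character really is U+200B or something else, such as U+FEFF (a byte-order mark).
    -->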
    <item>
      <title>Re: Hidden Characters in a .csv datasource</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Hidden-Characters-in-a-csv-datasource/m-p/679530#M232308</link>
      <description>&lt;P&gt;Without knowing what the characters actually are, I can suggest this eval logic, which may help you clean up the data:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| eval tmpVM=split(VM, "")
| eval newVM=mvjoin(mvmap(tmpVM, if(tmpVM&amp;gt;=" " AND tmpVM&amp;lt;="~", tmpVM, null())), "")&lt;/LI-CODE&gt;&lt;P&gt;This breaks the string into individual characters; mvmap then keeps only the characters between space and tilde (the printable ASCII range), and mvjoin joins them back together.&lt;/P&gt;&lt;P&gt;If the goal is to fix the CSV, this should work and you can rewrite the CSV. If the CSV is being written regularly and this keeps happening, though, you should try to understand the data that is coming in.&lt;/P&gt;&lt;P&gt;It sounds like it could be an encoding issue, with some spurious UTF-8 or UTF-16 characters in there.&lt;/P&gt;&lt;P&gt;Those "&lt;SPAN&gt;‚Äã" characters are all valid characters, but Splunk does not render them visibly, so Excel is just doing its best.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;You could also add x=len(VM) to count how many extra characters are there; the tmpVM field from the snippet above also shows you how Splunk sees the data.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Mar 2024 22:53:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Hidden-Characters-in-a-csv-datasource/m-p/679530#M232308</guid>
      <dc:creator>bowesmana</dc:creator>
      <dc:date>2024-03-04T22:53:07Z</dc:date>
    </item>
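    <!--
    If the goal is to clean the existing lookup outside Splunk, here is a minimal Python sketch. It assumes the lookup is a UTF-8 CSV and strips Unicode format characters (category Cf, which includes U+200B ZERO WIDTH SPACE and U+FEFF) from every cell. The eval approach in the answer above drops every non-ASCII character. This sketch is narrower: it removes only invisible format characters, so accented letters survive. The function names and file paths are hypothetical. The cleaned file would still need to be uploaded as the lookup afterwards.

    ```python
    import csv
    import unicodedata


    def strip_invisible(value: str) -> str:
        """Remove Unicode format (Cf) characters such as U+200B and U+FEFF."""
        return "".join(ch for ch in value if unicodedata.category(ch) != "Cf")


    def clean_lookup(src_path: str, dst_path: str) -> int:
        """Copy a CSV, stripping Cf characters from every cell.

        Returns the number of cells that changed.
        """
        changed = 0
        # utf-8-sig also drops a leading byte-order mark from the file itself.
        with open(src_path, newline="", encoding="utf-8-sig") as src, \
                open(dst_path, "w", newline="", encoding="utf-8") as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                cleaned = [strip_invisible(cell) for cell in row]
                changed += sum(a != b for a, b in zip(row, cleaned))
                writer.writerow(cleaned)
        return changed
    ```

    For example, clean_lookup("vm_lookup.csv", "vm_lookup_clean.csv") (hypothetical file names) returns how many cells were altered. A non-zero count confirms that invisible characters were present.
    -->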
    <item>
      <title>Re: Hidden Characters in a .csv datasource</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Hidden-Characters-in-a-csv-datasource/m-p/679535#M232310</link>
      <description>&lt;P&gt;Thank you,&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/6367"&gt;@bowesmana&lt;/a&gt;. I got sick of beating my head against a wall and put in a workaround of sorts. The .csv in question came from an outputlookup I ran against some indexed data. For some reason, I could not filter out non-alphanumeric characters from the .csv itself, but I could from the indexed data, so I filtered them out with a rex statement and then re-ran my outputlookup to create a new .csv.&lt;BR /&gt;&lt;BR /&gt;Thank you for taking the time to reply!&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2024 00:15:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Hidden-Characters-in-a-csv-datasource/m-p/679535#M232310</guid>
      <dc:creator>raysonjoberts</dc:creator>
      <dc:date>2024-03-05T00:15:26Z</dc:date>
    </item>
  </channel>
</rss>

