<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Trim whitespace in indexed files in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Trim-whitespace-in-indexed-files/m-p/82754#M17163</link>
    <description>&lt;P&gt;As said previously, SEDCMD is the way to go. Something like this in props.conf on the indexer:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[sourcetype]
SEDCMD-repws = s/\s+/ /g
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This matches each run of one or more whitespace characters and replaces it with a single space.&lt;/P&gt;</description>
    <pubDate>Fri, 15 Apr 2011 18:30:25 GMT</pubDate>
    <dc:creator>bojanz</dc:creator>
    <dc:date>2011-04-15T18:30:25Z</dc:date>
    <item>
      <title>Trim whitespace in indexed files</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Trim-whitespace-in-indexed-files/m-p/82751#M17160</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;We are indexing a substantial number of XML files. Between 30% and 50% of their content is white space that can be trimmed with no side effects on the real content of the files.&lt;/P&gt;

&lt;P&gt;I was wondering whether it is possible to filter these files to remove the white space (a really simple regex to apply) before indexing. Can this be done on the Universal Forwarder? On the indexer?&lt;/P&gt;

&lt;P&gt;As you can imagine, our aim is to reduce the amount of data indexed daily.&lt;/P&gt;

&lt;P&gt;Many thanks&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2011 16:20:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Trim-whitespace-in-indexed-files/m-p/82751#M17160</guid>
      <dc:creator>oscargarcia</dc:creator>
      <dc:date>2011-04-15T16:20:23Z</dc:date>
    </item>
    <item>
      <title>Re: Trim whitespace in indexed files</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Trim-whitespace-in-indexed-files/m-p/82752#M17161</link>
      <description>&lt;P&gt;You should be able to do this with a &lt;CODE&gt;SEDCMD&lt;/CODE&gt; (but the regex might get complicated). See the docs at &lt;A href="http://www.splunk.com/base/Documentation/4.2/Data/Anonymizedatawithsed"&gt;http://www.splunk.com/base/Documentation/4.2/Data/Anonymizedatawithsed&lt;/A&gt; for info on how to configure this.&lt;/P&gt;

&lt;P&gt;If you are using a Universal or Light forwarder, the &lt;CODE&gt;SEDCMD&lt;/CODE&gt; needs to be configured on the indexer. Your whitespace will still cross the wire, but it will be filtered out at the indexer before the data is written to the index.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2011 16:34:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Trim-whitespace-in-indexed-files/m-p/82752#M17161</guid>
      <dc:creator>dwaddle</dc:creator>
      <dc:date>2011-04-15T16:34:08Z</dc:date>
    </item>
    <item>
      <title>Re: Trim whitespace in indexed files</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Trim-whitespace-in-indexed-files/m-p/82753#M17162</link>
      <description>&lt;P&gt;You can use the SEDCMD configuration in props.conf to replace whitespace.&lt;/P&gt;

&lt;P&gt;&lt;A href="http://www.splunk.com/base/Documentation/4.2/Data/Anonymizedatawithsed"&gt;http://www.splunk.com/base/Documentation/4.2/Data/Anonymizedatawithsed&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2011 16:35:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Trim-whitespace-in-indexed-files/m-p/82753#M17162</guid>
      <dc:creator>Stephen_Sorkin</dc:creator>
      <dc:date>2011-04-15T16:35:28Z</dc:date>
    </item>
    <item>
      <title>Re: Trim whitespace in indexed files</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Trim-whitespace-in-indexed-files/m-p/82754#M17163</link>
      <description>&lt;P&gt;As said previously, SEDCMD is the way to go. Something like this in props.conf on the indexer:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[sourcetype]
SEDCMD-repws = s/\s+/ /g
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This matches each run of one or more whitespace characters and replaces it with a single space.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2011 18:30:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Trim-whitespace-in-indexed-files/m-p/82754#M17163</guid>
      <dc:creator>bojanz</dc:creator>
      <dc:date>2011-04-15T18:30:25Z</dc:date>
    </item>
    <item>
      <title>Re: Trim whitespace in indexed files</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Trim-whitespace-in-indexed-files/m-p/82755#M17164</link>
      <description>&lt;P&gt;Although you &lt;EM&gt;might&lt;/EM&gt; want something like &lt;CODE&gt;s/(\s)\s*/\1/g&lt;/CODE&gt;, which keeps only the first character of each whitespace run and so is more likely to preserve a line break (while still stripping indents at the start of a line).&lt;/P&gt;</description>
      <pubDate>Sat, 16 Apr 2011 03:36:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Trim-whitespace-in-indexed-files/m-p/82755#M17164</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2011-04-16T03:36:31Z</dc:date>
    </item>
  </channel>
</rss>

