<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to index HTML files? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-to-index-HTML-files/m-p/120315#M24999</link>
    <description>&lt;P&gt;If I understand correctly, you want to strip HTML tags from an input file. I do not believe that any built-in sourcetype or extraction is going to handle this. I would approach this by pre-processing the logs. Many scripting languages have facilities to strip tags out of streams/files. It is probably something you &lt;EM&gt;could&lt;/EM&gt; do with a SEDCMD or regex transform, but that may not be the best way to go.&lt;/P&gt;

&lt;P&gt;Regards,&lt;BR /&gt;
Sean&lt;/P&gt;</description>
    <pubDate>Wed, 25 Jun 2014 21:20:51 GMT</pubDate>
    <dc:creator>chanfoli</dc:creator>
    <dc:date>2014-06-25T21:20:51Z</dc:date>
    <item>
      <title>How to index HTML files?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-index-HTML-files/m-p/120314#M24998</link>
      <description>&lt;P&gt;I have source log files with HTML formatting. After indexing, I get 12,000 lines in one record.&lt;BR /&gt;
I need to remove the HTML markup so that each line of the log file is indexed as a separate record.&lt;BR /&gt;
Can I do this in Splunk?&lt;BR /&gt;
What source type should I use?&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jun 2014 18:55:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-index-HTML-files/m-p/120314#M24998</guid>
      <dc:creator>sergeyvinnik</dc:creator>
      <dc:date>2014-06-25T18:55:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to index HTML files?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-index-HTML-files/m-p/120315#M24999</link>
      <description>&lt;P&gt;If I understand correctly, you want to strip HTML tags from an input file. I do not believe that any built-in sourcetype or extraction is going to handle this. I would approach this by pre-processing the logs. Many scripting languages have facilities to strip tags out of streams/files. It is probably something you &lt;EM&gt;could&lt;/EM&gt; do with a SEDCMD or regex transform, but that may not be the best way to go.&lt;/P&gt;
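
&lt;P&gt;As a minimal sketch of the pre-processing approach, the Python 3 script below uses the standard-library &lt;CODE&gt;html.parser&lt;/CODE&gt; module to drop the markup and print one plain-text line per log entry. It assumes that each log entry already sits on its own line in the HTML file and that the file is UTF-8. The script name and file names in the comment are hypothetical; the idea is to point a Splunk monitor input at the cleaned output instead of the original HTML.&lt;/P&gt;

&lt;PRE&gt;
```python
import sys
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        # convert_charrefs=True (the default) also decodes character
        # references, so escaped characters come out as plain text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html_text):
    """Return html_text with all markup removed; newlines in the text are kept."""
    parser = TagStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


def html_to_lines(html_text):
    """Strip the markup and drop lines that contained nothing but tags."""
    return [line for line in strip_tags(html_text).splitlines() if line.strip()]


def main(path):
    # Hypothetical usage: python strip_html.py app_log.html > app_log.txt
    with open(path, encoding="utf-8") as f:
        for line in html_to_lines(f.read()):
            print(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```
&lt;/PRE&gt;

&lt;P&gt;Because the stripping happens once, before Splunk reads the file, each remaining line can then be broken into its own event by the normal line-breaking settings. Note that text inside script or style elements would be kept as well, so files containing those may need extra handling.&lt;/P&gt;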

&lt;P&gt;Regards,&lt;BR /&gt;
Sean&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jun 2014 21:20:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-index-HTML-files/m-p/120315#M24999</guid>
      <dc:creator>chanfoli</dc:creator>
      <dc:date>2014-06-25T21:20:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to index HTML files?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-index-HTML-files/m-p/120316#M25000</link>
      <description>&lt;P&gt;Can I use a TRANSFORM with the following REGEX?&lt;/P&gt;

&lt;P&gt;/&amp;lt;[a-zA-Z_/=]*&amp;gt;/ /g&lt;/P&gt;
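
&lt;P&gt;Before putting a pattern like this into a SEDCMD or TRANSFORMS setting, it may be worth checking what it actually matches. The sketch below is a Python 3 test harness using the standard &lt;CODE&gt;re&lt;/CODE&gt; module, not Splunk configuration; it applies only the part between the first pair of slashes. The broader pattern it compares against is an assumption of this example, not something from the thread. Because the character class contains only letters, underscore, slash and equals sign, it does not match tags that contain attributes, spaces or digits, such as an anchor with an href, an h1 heading, or a self-closing br written with a space.&lt;/P&gt;

&lt;PRE&gt;
```python
import re

# The posted pattern. "\x3c" is the regex escape for the opening angle
# bracket; it is written that way only so the snippet fits in this page.
POSTED_PATTERN = re.compile(r"\x3c[a-zA-Z_/=]*>")

# Broader alternative (an assumption, not from the thread): an opening
# bracket, any run of characters other than a closing bracket, then ">".
BROADER_PATTERN = re.compile(r"\x3c[^>]*>")


def replace_tags(line, pattern=POSTED_PATTERN):
    """Replace every substring matched by pattern with a single space."""
    return pattern.sub(" ", line)


if __name__ == "__main__":
    samples = [
        "\x3cb>bold\x3c/b> text",      # plain tags: both patterns match
        '\x3ca href="x">link\x3c/a>',  # posted pattern misses the opening tag
        "\x3ch1>Title\x3c/h1>",        # posted pattern misses both tags (digit)
        "line one\x3cbr />line two",   # posted pattern misses the tag with a space
    ]
    for s in samples:
        print(repr(replace_tags(s)), repr(replace_tags(s, BROADER_PATTERN)))
```
&lt;/PRE&gt;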

&lt;P&gt;It should replace all tags like .. with spaces.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jun 2014 22:59:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-index-HTML-files/m-p/120316#M25000</guid>
      <dc:creator>sergeyvinnik</dc:creator>
      <dc:date>2014-06-25T22:59:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to index HTML files?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-index-HTML-files/m-p/120317#M25001</link>
      <description>&lt;P&gt;Remember, though, that this transformation will have to occur every time you run the query. It would be much more efficient to create a script to pre-process the files and do it just once.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Jun 2014 18:48:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-index-HTML-files/m-p/120317#M25001</guid>
      <dc:creator>edschembor</dc:creator>
      <dc:date>2014-06-26T18:48:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to index HTML files?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-index-HTML-files/m-p/120318#M25002</link>
      <description>&lt;P&gt;There is an app called &lt;A href="http://apps.splunk.com/app/1818/"&gt;Website Input&lt;/A&gt; that was designed to pull information from websites. That might handle your case if the HTML files are accessible via an HTTP server.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jul 2014 06:06:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-index-HTML-files/m-p/120318#M25002</guid>
      <dc:creator>LukeMurphey</dc:creator>
      <dc:date>2014-07-10T06:06:48Z</dc:date>
    </item>
  </channel>
</rss>

