<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Python script to screen scrape a web page? in Installation</title>
    <link>https://community.splunk.com/t5/Installation/Python-script-to-screen-scrape-a-web-page/m-p/104734#M9555</link>
    <description>&lt;P&gt;I don't have one, but if you're using Python I'd recommend the Beautiful Soup HTML parsing library, which is designed for exactly this. The standard library's HTMLParser and htmllib are rather less robust:&lt;/P&gt;

&lt;P&gt;&lt;A href="http://www.crummy.com/software/BeautifulSoup/" rel="nofollow"&gt;http://www.crummy.com/software/BeautifulSoup/&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;The other half of the job is fetching the HTML page with an HTTP library. For this, the standard Python httplib (renamed http.client in Python 3) is fine.&lt;/P&gt;</description>
    <pubDate>Wed, 15 Dec 2010 05:42:05 GMT</pubDate>
    <dc:creator>gkanapathy</dc:creator>
    <dc:date>2010-12-15T05:42:05Z</dc:date>
    <item>
      <title>Python script to screen scrape a web page?</title>
      <link>https://community.splunk.com/t5/Installation/Python-script-to-screen-scrape-a-web-page/m-p/104733#M9554</link>
      <description>&lt;P&gt;I've been experimenting with lookup tables and I'd like to try using an external lookup command. The goal is to extract data from a web page. I've never done any programming in Python, and what little research I've done has been pretty daunting.&lt;/P&gt;

&lt;P&gt;All the script needs to do is read the HTML of a web page whose URL includes a field value from the event, such as www.externalsite.com/$event_code.&lt;/P&gt;

&lt;P&gt;Can anyone point me to some Python examples that will accomplish this?&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Dec 2010 04:34:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Installation/Python-script-to-screen-scrape-a-web-page/m-p/104733#M9554</guid>
      <dc:creator>jambajuice</dc:creator>
      <dc:date>2010-12-15T04:34:59Z</dc:date>
    </item>
    <item>
      <title>Re: Python script to screen scrape a web page?</title>
      <link>https://community.splunk.com/t5/Installation/Python-script-to-screen-scrape-a-web-page/m-p/104734#M9555</link>
      <description>&lt;P&gt;I don't have one, but if you're using Python I'd recommend the Beautiful Soup HTML parsing library, which is designed for exactly this. The standard library's HTMLParser and htmllib are rather less robust:&lt;/P&gt;

&lt;P&gt;&lt;A href="http://www.crummy.com/software/BeautifulSoup/" rel="nofollow"&gt;http://www.crummy.com/software/BeautifulSoup/&lt;/A&gt;&lt;/P&gt;
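
&lt;P&gt;That said, a rough sketch of the parsing half might look something like this. To keep it runnable without installing anything, it uses the standard library's html.parser module (Python 3) as a stand-in rather than Beautiful Soup, so expect it to be less forgiving of messy markup. The TextByTag class and the extract_text function are hypothetical names made up for illustration, and the page is assumed to be already fetched into a string.&lt;/P&gt;

```python
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Collect the text found inside every occurrence of one tag.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()  # the parser reports tag names in lowercase
        self.depth = 0          # how many target tags are currently open
        self.chunks = []        # text pieces of the match being read
        self.results = []       # one string per completed match

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Outermost target tag closed: store its text.
                self.results.append("".join(self.chunks).strip())
                self.chunks = []

    def handle_data(self, data):
        # Only keep text that sits inside an open target tag.
        if self.depth:
            self.chunks.append(data)


def extract_text(html, tag):
    """Return a list with the text of each occurrence of tag in html."""
    parser = TextByTag(tag)
    parser.feed(html)
    parser.close()
    return parser.results
```

&lt;P&gt;Calling extract_text(page, "td") on a page held in a string returns a list with the text of each td cell, or an empty list if the tag never appears. On real-world pages with broken markup, Beautiful Soup is likely to cope better than this.&lt;/P&gt;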

&lt;P&gt;The other half of the job is fetching the HTML page with an HTTP library. For this, the standard Python httplib (renamed http.client in Python 3) is fine.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Dec 2010 05:42:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Installation/Python-script-to-screen-scrape-a-web-page/m-p/104734#M9555</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-12-15T05:42:05Z</dc:date>
    </item>
    <item>
      <title>Re: Python script to screen scrape a web page?</title>
      <link>https://community.splunk.com/t5/Installation/Python-script-to-screen-scrape-a-web-page/m-p/104735#M9556</link>
      <description>&lt;P&gt;Beautiful Soup is most awesome&lt;/P&gt;</description>
      <pubDate>Wed, 15 Dec 2010 06:43:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Installation/Python-script-to-screen-scrape-a-web-page/m-p/104735#M9556</guid>
      <dc:creator>dwaddle</dc:creator>
      <dc:date>2010-12-15T06:43:16Z</dc:date>
    </item>
  </channel>
</rss>

