
Python script to screen scrape a web page?

jambajuice
Communicator

I've been experimenting with lookup tables and I'd like to try using an external lookup command. The goal is to extract data from a web page. I've never done any programming in Python, and the little research I've done so far is pretty daunting.

All the script needs to do is read the HTML of a web page whose URL includes a field value from the event, such as www.externalsite.com/$event_code.

Can anyone point me to some Python examples that will accomplish this?

Thanks.


gkanapathy
Splunk Employee

I don't have an example handy, but if you're using Python I'd recommend the Beautiful Soup HTML parsing library, which is designed for exactly this and is considerably more robust than the standard library's HTMLParser and htmllib modules:

http://www.crummy.com/software/BeautifulSoup/
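As a rough illustration of what parsing with Beautiful Soup can look like, here is a minimal sketch. It assumes Beautiful Soup 4 (the `bs4` package) on Python 3. The sample HTML, the `details` table id, and the `extract_details` helper are all made up for the example; the real page will need its own selectors.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Made-up page content, standing in for the HTML fetched from the site.
SAMPLE_HTML = """
<html>
  <head><title>Event E1001</title></head>
  <body>
    <table id="details">
      <tr><th>Severity</th><td>High</td></tr>
      <tr><th>Owner</th><td>ops-team</td></tr>
    </table>
  </body>
</html>
"""


def extract_details(html):
    """Return the page title and the label/value pairs of the details table."""
    # "html.parser" selects the parser bundled with Python, so lxml is not needed.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    details = {}
    table = soup.find("table", id="details")  # hypothetical id for this sketch
    if table is not None:
        for row in table.find_all("tr"):
            label, value = row.find("th"), row.find("td")
            if label is not None and value is not None:
                details[label.get_text(strip=True)] = value.get_text(strip=True)
    return title, details
```

Calling `extract_details(SAMPLE_HTML)` returns the title string and a dictionary keyed by the `<th>` labels. Missing elements produce an empty title or an empty dictionary rather than an exception.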

The other half of the job is fetching the HTML page with an HTTP library. For this, Python's standard httplib (renamed http.client in Python 3) is fine.
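A possible sketch of the fetching side, for Python 3. It uses `urllib.request` instead of calling the HTTP module directly; `urllib.request` is also in the standard library and is built on top of `http.client`. The base URL is the placeholder from the question, and the `event_code` and `page_value` field names, the helper functions, and the `extract` callback are assumptions made for illustration. The CSV-in, CSV-out loop reflects my understanding of how Splunk hands rows to an external lookup script (on stdin, with results expected on stdout), so check it against the documentation for your version.

```python
import csv
import urllib.parse
import urllib.request

# Placeholder base URL taken from the question; replace with the real site.
BASE_URL = "http://www.externalsite.com/"


def build_url(event_code, base_url=BASE_URL):
    """Append the URL-escaped event code to the base URL."""
    return base_url + urllib.parse.quote(str(event_code), safe="")


def fetch_html(url, timeout=10):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def run_lookup(infile, outfile, extract, fetch=fetch_html,
               key_field="event_code", out_field="page_value"):
    """Read CSV rows, fill out_field from the fetched page, write CSV rows.

    extract is any function that turns the page HTML into a string value.
    """
    reader = csv.DictReader(infile)
    fieldnames = list(reader.fieldnames or [])
    if out_field not in fieldnames:
        fieldnames.append(out_field)
    writer = csv.DictWriter(outfile, fieldnames=fieldnames,
                            extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        code = row.get(key_field)
        if code:
            try:
                row[out_field] = extract(fetch(build_url(code)))
            except OSError:
                # Network or HTTP failure: leave the output field empty.
                row[out_field] = ""
        writer.writerow(row)
```

A lookup script would then end with something like `run_lookup(sys.stdin, sys.stdout, extract=my_parser)`, where `my_parser` is whatever function pulls the wanted value out of the HTML. Passing `fetch` and `extract` as arguments keeps the network call and the parsing separate, so each can be tried on its own before wiring the script into Splunk.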

dwaddle
SplunkTrust

Beautiful Soup is most awesome
