
Using WGET to monitor an HTML webpage?

bofasplunkguy
Explorer

I'm struggling to monitor a webpage with Splunk. I just need to get the HTML from a certain URL (like google.com) and get the contents of a page element into Splunk. According to the following link, a script using WGET would be the best solution:

https://answers.splunk.com/answers/23638/how-to-monitor-http-web-pages-and-splunk-the-results.html

Can someone please explain how to write a script that executes a WGET? Neither JavaScript nor PowerShell supports WGET, so I'm not sure what language this person has in mind. I'm open to other options too; any help pointing me in the right direction would be appreciated!

0 Karma

avishni01
Explorer

Hi,
You can try this Splunk app (or use it as a reference):
https://splunkbase.splunk.com/app/1818/#/overview
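
If you would rather go the scripted route the question asks about, here is a minimal sketch in Python that does what a WGET call would do, using only the standard library. It assumes the script would be registered as a Splunk scripted input, which indexes whatever the script prints to stdout. The function names, the User-Agent string, and the event layout are illustrative assumptions, not part of the app above.

```python
import time
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (roughly `wget -qO- URL`)."""
    # The User-Agent value is a placeholder chosen for this sketch.
    request = urllib.request.Request(url, headers={"User-Agent": "splunk-page-monitor"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def format_event(url, html, now=None):
    """Build a single-line, timestamped event from the fetched HTML."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    one_line = " ".join(html.split())  # collapse newlines and repeated whitespace
    return f"{stamp} url={url} html={one_line}"
```

A wrapper script would then call `print(format_event(url, fetch_html(url)))` for the URL you care about and let Splunk run it on an interval.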

0 Karma

bofasplunkguy
Explorer

Hi, thanks for your response. Have you actually been able to get this to work? I have been trying to set it up and no matter what I do, it returns a table with timed_out as True. Here are examples of searches after setup:

| webscrape selector="body" url="http://textcrticial.net" depth_limit=25 empty_matches=0
| webscrape selector="body" url="https://google.com" depth_limit=25 empty_matches=0

Can you let me know how you set it up?

0 Karma

avishni01
Explorer

It works fine for a simple HTML page I created that doesn't have any CSS:

<!DOCTYPE html>
<html>
<body>

hello this is a sample page content

</body>
</html>

In this case, in the selector setting of the Website Input, you should put * (an asterisk).

Then, to extract only the body of the HTML, I used the rex command:

index=web* sourcetype="web test" | rex field=content "(?msi)<Body>(?<testdata>.*)<\/Body>\s+"
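
To see what that pattern captures outside of Splunk, here is a small Python sketch that applies the same regular expression to the sample page above. The names `SAMPLE_PAGE`, `BODY_PATTERN`, and `extract_body` are made up for this illustration; only the pattern itself comes from the search.

```python
import re

SAMPLE_PAGE = """<!DOCTYPE html>
<html>
<body>

hello this is a sample page content

</body>
</html>
"""

# Python spells named groups (?P<name>...), where SPL's rex uses (?<name>...).
BODY_PATTERN = re.compile(r"(?msi)<body>(?P<testdata>.*)</body>\s+")


def extract_body(content):
    """Return the text between <body> and </body>, or None if there is no match."""
    match = BODY_PATTERN.search(content)
    return match.group("testdata").strip() if match else None
```

Note that the pattern expects a bare `<body>` tag with no attributes and at least one whitespace character after `</body>`, so it may need loosening for real pages.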
0 Karma

avishni01
Explorer

I used it for a while in a test environment. It works OK, but it takes some time to find the right selector.
Please note that the selector is based on the page's CSS classes. In the search in your comment, the selector is body, which seems to be the HTML part. Try finding the CSS class that defines the text you are trying to get from the site.
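
As a side note on how selectors relate to the markup: in general a CSS selector is just a pattern over element names, ids, and class attributes in the HTML, so an element-name selector such as body can match a page that has no stylesheet at all. The sketch below illustrates that idea with Python's standard library only. It is not how the app is implemented, and `TagText` and `text_for_tag` are hypothetical names for this example.

```python
from html.parser import HTMLParser


class TagText(HTMLParser):
    """Collect the text found inside every element with the given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # how many matching elements are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only while inside a matching element.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def text_for_tag(html, tag):
    """Return the text inside all <tag> elements, joined by spaces."""
    parser = TagText(tag)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Calling `text_for_tag(page, "body")` on the bare sample page returns its text even though the page defines no classes and no CSS.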

0 Karma

bofasplunkguy
Explorer

Got it, I was thinking CSS and HTML classes were the same. The page I am using is very bare-bones. There is no CSS at all. You're saying you can only monitor classes that have CSS applied to them?

0 Karma