Using WGET to monitor an HTML webpage?

bofasplunkguy
Explorer

I'm struggling to monitor a webpage with Splunk. I just need to get the HTML from a certain URL (like google.com) and get the contents of the element into Splunk. According to the following link, a script using WGET would be the best solution:

https://answers.splunk.com/answers/23638/how-to-monitor-http-web-pages-and-splunk-the-results.html

Can someone please explain how to write a script that executes a WGET? Neither JavaScript nor PowerShell supports WGET, so I'm not sure what language this person has in mind. I'm open to other options too - any help pointing me in the right direction would be appreciated!

0 Karma

avishni01
Explorer

Hi,
You can try this Splunk app (or use it as a reference):
https://splunkbase.splunk.com/app/1818/#/overview

0 Karma

bofasplunkguy
Explorer

Hi, thanks for your response. Have you actually been able to get this to work? I have been trying to set it up and no matter what I do, it returns a table with timed_out as True. Here are examples of searches after setup:

| webscrape selector="body" url="http://textcrticial.net" depth_limit=25 empty_matches=0
| webscrape selector="body" url="https://google.com" depth_limit=25 empty_matches=0

Can you let me know how you set it up?

0 Karma

avishni01
Explorer

It works fine for a simple HTML page I created that doesn't have any CSS:

<!DOCTYPE html>
<html>
<body>

hello this is a sample page content

</body>
</html>

In this case, in the selector settings of the Website Input you should put *

Then, to extract only the body of the HTML, I used the rex command:

index=web* sourcetype="web test" | rex field=content "(?msi)<Body>(?<testdata>.*)<\/Body>\s+"
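
For the original wget question, the same fetch-and-extract steps can be sketched as a small Python script. This is only a rough sketch under assumptions, not something confirmed in this thread: the names fetch_html and extract_body are made up for illustration, and the regex mirrors the rex above but tolerates attributes on the body tag and drops the trailing \s+.

```python
import re
import urllib.request

# Same idea as the rex above: case-insensitive, and "." also matches newlines.
BODY_RE = re.compile(
    r"<body[^>]*>(?P<testdata>.*?)</body>",
    re.IGNORECASE | re.DOTALL,
)


def fetch_html(url, timeout=10):
    """Download a page as text, roughly what `wget -qO- <url>` prints."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_body(html):
    """Return the text between <body> and </body>, or None if no body is found."""
    match = BODY_RE.search(html)
    return match.group("testdata").strip() if match else None
```

A scripted input only needs to write to stdout, so a wrapper that calls print(extract_body(fetch_html(url))) for the URL you care about would be enough for Splunk to index the result.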
0 Karma

avishni01
Explorer

I have used it for a while in a test environment. It works OK, but it takes some time to find the right selector.
Please note that the selector is based on the page's CSS classes. In the search in your comment, the selector is body, which seems to be the HTML part. Try finding the CSS class that defines the text you are trying to get from the site.
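
For reference, in CSS selector syntax generally, a bare name such as body matches elements by tag name, while .name matches elements by their class attribute, and neither needs a stylesheet to exist on the page. Whether the app treats a given selector this way is not confirmed in this thread. The difference can be sketched with Python's standard library alone; TextCollector and select_text are hypothetical names, this is not the app's code, and it does not handle a matched element that is itself a void tag such as img.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text inside elements matched by tag name or by class attribute."""

    def __init__(self, tag=None, css_class=None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.matched_tag = None  # tag name of the element currently being captured
        self.depth = 0           # nesting depth of matched_tag while capturing
        self.chunks = []

    def _matches(self, tag, attrs):
        if self.tag is not None and tag == self.tag:
            return True
        classes = (dict(attrs).get("class") or "").split()
        return self.css_class is not None and self.css_class in classes

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.matched_tag:
                self.depth += 1
        elif self._matches(tag, attrs):
            self.matched_tag = tag
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.matched_tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def select_text(html, tag=None, css_class=None):
    """Return the non-empty text chunks found inside matching elements."""
    parser = TextCollector(tag=tag, css_class=css_class)
    parser.feed(html)
    parser.close()
    return parser.chunks
```

With a page that has no classes at all, select_text(page, tag="body") still returns the body text, whereas select_text(page, css_class="status") only returns text from elements that carry class="status".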

0 Karma

bofasplunkguy
Explorer

Got it, I was thinking CSS and HTML classes were the same. The page I am using is very bare-bones; there is no CSS at all. Are you saying you can only monitor classes that have CSS applied to them?

0 Karma