Using WGET to monitor html webpage?

bofasplunkguy · ‎08-29-2019

I'm struggling to monitor a webpage with Splunk. I just need to get the HTML from a certain URL (like google.com) and get the contents of an element into Splunk. According to the following link, a script using WGET would be the best solution:

https://answers.splunk.com/answers/23638/how-to-monitor-http-web-pages-and-splunk-the-results.html

Can someone please explain how to write a script that executes a WGET? Neither JavaScript nor PowerShell supports WGET, so I'm not sure what language this person has in mind. I'm open to other options too. Any help pointing me in the right direction would be appreciated!

avishni01 · ‎08-29-2019

Hi,
you can try this Splunk app (or use it as a reference):
https://splunkbase.splunk.com/app/1818/#/overview

bofasplunkguy · ‎08-29-2019

Hi, thanks for your response. Have you actually been able to get this to work? I have been trying to set it up and no matter what I do, it returns a table with timed_out as True. Here are examples of searches after setup:

| webscrape selector="body" url="http://textcrticial.net" depth_limit=25 empty_matches=0
| webscrape selector="body" url="https://google.com" depth_limit=25 empty_matches=0

Can you let me know how you set it up?

avishni01 · ‎08-30-2019

It works fine for a simple HTML page I created that doesn't have any CSS:

<!DOCTYPE html>
<html>
<body>

hello this is a sample page content

</body>
</html>

In this case, in the selector settings of the Website Input, you should put *

Then, to extract only the body of the HTML, I used the rex command:

index=web* sourcetype="web test" | rex field=content "(?msi)<Body>(?<testdata>.*)<\/Body>\s+"
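To check what that pattern captures before running it in Splunk, here is a rough Python sketch that applies the same regex to the sample page above. It is only an approximation: Python spells named groups as (?P<name>...), and the flags are passed as arguments instead of the inline (?msi) prefix.

```python
import re

# Same pattern as the rex above, translated to Python's named-group syntax.
# M, S and I correspond to the (?msi) prefix in the Splunk search.
BODY_RE = re.compile(r"<body>(?P<testdata>.*)</body>\s+", re.M | re.S | re.I)

# The sample page from this post.
sample = """<!DOCTYPE html>
<html>
<body>

hello this is a sample page content

</body>
</html>
"""

match = BODY_RE.search(sample)
# Strip the blank lines that surround the text inside <body>.
testdata = match.group("testdata").strip() if match else None
print(testdata)
```

Note that the trailing \s+ requires at least one whitespace character after the closing body tag, so a page that ends immediately after it would not match.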

avishni01 · ‎08-29-2019

I have used it for a while in a test environment. It works OK, but it takes some time to find the right selector.
Please note that the selector is based on the page's CSS classes. In the search in your comment, the selector is body, which seems to be the HTML element. Try finding the CSS class that defines the text you are trying to get from the site.

bofasplunkguy · ‎08-29-2019

Got it, I was thinking css/html classes were the same. The page I am using is very bare-bones. There is no CSS at all. You're saying you can only monitor classes that have CSS applied to them?
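
On the original question of scripting the fetch: a script does not have to call wget itself. As a minimal sketch, assuming Python 3 is available on the host and the script is run as a Splunk scripted input (Splunk runs the script on an interval and indexes whatever it prints to stdout), something like the following would do the same job with only the standard library. The function names, the User-Agent string, and the event layout are made up for illustration and are not from the linked answer or the app.

```python
import sys
import time
import urllib.request


def fetch_html(url, timeout=10):
    """Download the page and return its text (roughly what wget -O - does)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "splunk-page-monitor"}  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def format_event(url, html, now=None):
    """Build one single-line event: timestamp, URL, then the page content."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S%z", time.localtime(now))
    # Collapse whitespace so the whole page stays on one line, and swap
    # double quotes so they do not end the content="..." value early.
    content = " ".join(html.split()).replace('"', "'")
    return f'{stamp} url="{url}" content="{content}"'


# Only fetch when a URL is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]
    print(format_event(target, fetch_html(target)))
```

Run it by hand first, for example python fetch_page.py https://example.com (the file name is arbitrary), and confirm it prints one line before wiring it into Splunk. The content field can then be trimmed down with a rex like the one shown earlier in the thread.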
