Website Input: Another device to scrape information from

cmodyssey
Explorer

Hi,

I have a WiFi central heating and hot water controller. I have pasted the source of the page I want to extract information from as code at the bottom of this post.

How difficult would it be to make Website Input extract the following information?

The temperature to the right of 'Actual'
The temperature to the right of 'Set'
The ON/OFF status to the right of 'Heat Status'
The ON/OFF status to the right of 'Hot Water'

Thanks in advance.

Richard.

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>Heatmiser Wifi Thermostat</title>
<link href="/mywifi.css" rel="stylesheet" type="text/css" />
    <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
    <meta http-equiv="Content-Language" content="en-gb">
    <meta http-equiv="Cache-Control" content="no-cache, must-revalidate"> 
</head>

<script language="javascript">
function awayClick()
{   aw = document.FrmChg4;
    if(aw.actH.value=='1')
    {
        aw.actH.value=2;
    }
    else
    {
        aw.actH.value=1;
    }
    aw.submit();
}
function summerClick()
{   sm = document.fostfm;
    if(sm.fost.value=='1')
    {
        sm.fost.value=2;
    }
    else
    {
        sm.fost.value=1;
    }
    sm.submit();
}
function myrefsh()
{
    location.assign(location);  
}
function myonload()
{}
</script>
<body bgcolor="#B3120C" onload="myonload()" topmargin=0 leftmargin=0 rightmargin=0 style="overflow-x:hidden;overflow-y:auto">
        <form name="logfm">
        <input type=hidden name="lgst" value="1">
        <input type=hidden name="disw" value="none">
        <input type=hidden name="modl" value="4">
        <input type=hidden name="senor" value="0">
        <input type="hidden" name="hldy" value="2">
        </form>
<script language="javascript">
        if(document.logfm.lgst.value != "1") 
            top.location.href="index.htm";  
</script>
<form name="FrmChg4" method="post">
<input type="hidden" name="actH" value="2">
</form>
<form name="fostfm" method="post">
<input type="hidden" name="fost" value="2">
<form>
        <div id="lb7" ></div>
        <form name='dispFrm'>
        <table bgcolor="#B3120C" style='text-align:center;' border="0" width=100% height=100%  cellspacing=0 cellpadding=0 >
        <script language="javascript">
            var tmp, sntm, hltm;
            tmp = Number(document.logfm.modl.value);
            sntm = Number(document.logfm.senor.value);
            hltm = Number(document.logfm.hldy.value);
            if(tmp > 1)
                document.write("<tr><td align=center height=30 colspan=4 ><a href='/statSetup.htm' target='midfm' onfocus='this.blur()' ><b><font color='white' face='arial'>16:42  17/01/2016</font></b></a></td></tr>");
            else
                document.write("<tr><td align=center height=30 colspan=4 >&lt;br/&gt;</td></tr>");
            document.write("<tr><td height=30 bgcolor='black' colspan=4></td></tr><td class='p5' colspan=4 ><b>Live View </b><input type='button' value='Refresh' onclick='myrefsh()'></td></tr><tr><td colspan=4 >&lt;br/&gt;</td></tr>");
            if(tmp < 5)
            {   
                if(sntm > 1)
                    document.write("<tr><td class='p5'colspan=4><b>Floor : </b><font size='5'> <font face='Arial'>&deg;</font>C</font></td></tr>");
                if(sntm != 2)
                    document.write("<tr><td class='p5'colspan=4><b>Actual : </b><font size='5'>22.0 <font face='Arial'>&deg;</font>C</font></td></tr><tr><td class='p5'colspan=4><b>Set : </b><font size='4'>22 <font face='Arial'>&deg;</font>C</font></td></tr><tr><td colspan=4 >&lt;br/&gt;</td></tr><tr><td class='p5'colspan=4><b>Heat Status:</b><font size='4'>OFF <font face='Arial'></td></tr>");
            }
            if(tmp > 3)
            { 
                if(tmp < 5 )
                    document.write("<tr><td class='p5'colspan=4><b>Hot Water:</b><font size='4'>ON <font face='Arial'></td></tr>");
                else 
                    document.write("<tr><td class='p5'colspan=4><b>Timer :</b><font size='4'>ON <font face='Arial'></td></tr>");

                if(tmp == 6)
                    document.write("<tr><td colspan=4 >&lt;br/&gt;</td></tr><tr><td class='p5'colspan=4><b>Timer left : 55  h: 29 min</b></td></tr>");
            }

            if((tmp > 1)&&(tmp < 5))
                document.write("<tr><td colspan=4 >&lt;br/&gt;</td></tr><tr><td class='p5'colspan=4><b>Hold for: 0 h: 00 min</b></td></tr>");       
            document.write("<tr><td colspan=4 >&lt;br/&gt;</td></tr><tr><td class='p5' colspan=4><b>Occupancy</b></td></tr>");
            document.write("<tr><td class='p5'colspan=1 >     </td><td class='p5' style='text-decoration:underline;'colspan=1><b>Home</b></td><td class='p5' colspan=1><a onclick=awayClick() style='color:white;' href='#'><b>Away</b></a></td><td class='p5'colspan=1 >     </td></tr>");
            if(tmp == 4)
            {
                if(hltm!=1)
                {
                    document.write("<tr><tr><td colspan=4 >&lt;br/&gt;</td></tr><tr><td class='p5' colspan=4><b>Summer</b></td></tr>");
                    document.write("<tr><td class='p5'colspan=1 >     </td><td class='p5' colspan=1><a onclick=summerClick() style='color:white;' href='#'><b>On</b></a></td><td class='p5'  style='text-decoration:underline;' colspan=1><b>Off</b></td><td class='p5'colspan=1 >     </td></tr>");
                }
            }
        </script>
        </table>
        </form>
    </body>
</html>

LukeMurphey
Champion

The Website Input app is indeed intended for this type of use case. However, it won't be able to extract the information from that web page in its current form, because the page is generated in part by JavaScript (the "document.write" function calls) and the app has no way to execute JavaScript.
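To illustrate the point, here is a minimal Python sketch using only the standard library's html.parser. It is not code from the Website Input app, and the names P5CellCounter and count_p5_cells are made up for this example. The sample row is lightly tidied from the posted source (a space added before colspan, font tag closed). It shows that a parser which does not run JavaScript sees markup inside document.write() as plain script text, not as elements a selector could match.

```python
from html.parser import HTMLParser


class P5CellCounter(HTMLParser):
    """Count <td class='p5'> start tags that the parser sees as real elements."""

    def __init__(self):
        super().__init__()
        self.p5_cells = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "td" and dict(attrs).get("class") == "p5":
            self.p5_cells += 1


def count_p5_cells(html):
    """Parse the given HTML and return how many p5 cells were seen as elements."""
    parser = P5CellCounter()
    parser.feed(html)
    parser.close()
    return parser.p5_cells


# One table row, tidied from the posted page source (illustrative only).
ROW = "<tr><td class='p5' colspan=4><b>Hot Water:</b><font size='4'>ON </font></td></tr>"

# The row as plain markup: the parser sees a real <td class='p5'> element.
static_page = "<table>" + ROW + "</table>"

# The same row wrapped in document.write(), as the controller serves it.
# Without a JavaScript engine this is only text inside a <script> element.
scripted_page = '<table><script>document.write("' + ROW + '");</script></table>'

static_count = count_p5_cells(static_page)
scripted_count = count_p5_cells(scripted_page)
print(static_count, scripted_count)
```

The static version yields one matching cell and the scripted version yields none, which is consistent with a selector working on a single pasted line but not on the whole page.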

The good news is that I think I can develop a workaround.

I have been considering adding the ability to import the raw content so that you can do the extractions yourself in Splunk. It is more work, since you will have to use rex to make it work, but it is the only way short of supporting JavaScript in the input (which is possible but hard).
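As a rough idea of what extracting from the raw content could look like, here is a minimal Python sketch that runs regular expressions over a trimmed excerpt of the page source posted above. The field names, the PATTERNS table and the extract_readings function are assumptions made for illustration; they are not part of Website Input or Splunk. Similar patterns could in principle be adapted for rex, but that has not been tested here.

```python
import re

# Trimmed excerpt of the controller's page source from this thread.
RAW_SOURCE = """
document.write("<tr><td class='p5'colspan=4><b>Actual : </b><font size='5'>22.0 <font face='Arial'>&deg;</font>C</font></td></tr><tr><td class='p5'colspan=4><b>Set : </b><font size='4'>22 <font face='Arial'>&deg;</font>C</font></td></tr><tr><td class='p5'colspan=4><b>Heat Status:</b><font size='4'>OFF <font face='Arial'></td></tr>");
document.write("<tr><td class='p5'colspan=4><b>Hot Water:</b><font size='4'>ON <font face='Arial'></td></tr>");
"""

# Hypothetical field names mapped to patterns. Each pattern anchors on the
# bold label, skips the opening <font ...> tag, and captures the value.
PATTERNS = {
    "actual_temp": r"<b>\s*Actual\s*:\s*</b>\s*<font[^>]*>\s*([\d.]+)",
    "set_temp": r"<b>\s*Set\s*:\s*</b>\s*<font[^>]*>\s*([\d.]+)",
    "heat_status": r"<b>\s*Heat Status\s*:\s*</b>\s*<font[^>]*>\s*(ON|OFF)",
    "hot_water": r"<b>\s*Hot Water\s*:\s*</b>\s*<font[^>]*>\s*(ON|OFF)",
}


def extract_readings(raw_source):
    """Return a dict of field name to captured text, or None if not found."""
    readings = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, raw_source)
        readings[name] = match.group(1) if match else None
    return readings


readings = extract_readings(RAW_SOURCE)
print(readings)
```

One caveat with this approach: the patterns match text wherever it appears in the source, including inside JavaScript branches that the browser would not actually execute, so the result should be checked against what the page really displays.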

I'm doing the work under this ticket: http://lukemurphey.net/issues/1168

Update:

I added the ability to include the raw content in version 2.1. Additionally, version 3.0 is going to include the ability to render the content in an actual browser (I have it working in Firefox). This would allow you to match against the content after JavaScript has executed. See this ticket for the web-browser rendering feature: http://lukemurphey.net/issues/1323


cmodyssey
Explorer

Hi,

Thanks for taking a look at this.

I like the sound of making Website Input work, so that the data can be regex'd.

I'll keep an eye on this thread and issue 1168.

Thanks again,

Richard.


dcarmack_splunk
Splunk Employee

Hi Richard

There is an app on Splunkbase called Website Input that is built for this purpose.

https://splunkbase.splunk.com/app/1818/

I use it for a similar purpose and it works great.

cmodyssey
Explorer

Hi,

Thanks for replying. I do want to use Website Input for this.

Where I am stuck is how to get it to extract the data I need.

Does that make sense?

Thanks,

Richard.


dcarmack_splunk
Splunk Employee

Ah, gotcha! The app uses CSS selectors for grabbing values from the HTML. Looking at your example, your best bet would be to use:

.p5 font

as your selector. That matches any font element nested inside an element with the p5 class.

document.write("<tr><td class='p5'colspan=4><b>Hot Water:</b><font size='4'>ON <font face='Arial'></td></tr>");

That selector will capture "ON" in the example above.
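For anyone unsure what the `.p5 font` selector is doing, here is a small Python sketch that imitates it by hand with the standard library's html.parser. It is only a rough stand-in for a real CSS engine: it collects the text of the outermost font element inside each cell whose class is p5. The class name P5FontText and the function select_p5_font_text are invented for this example, and the fragment has a space added before colspan.

```python
from html.parser import HTMLParser


class P5FontText(HTMLParser):
    """Collect text from <font> elements nested inside a class='p5' cell."""

    def __init__(self):
        super().__init__()
        self.in_p5_cell = False
        self.font_depth = 0
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            classes = (dict(attrs).get("class") or "").split()
            self.in_p5_cell = "p5" in classes
        elif tag == "font" and self.in_p5_cell:
            self.font_depth += 1
            if self.font_depth == 1:
                # Start a new match for each outermost <font> in the cell.
                self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            # Leaving the cell also closes any unclosed <font> tags.
            self.in_p5_cell = False
            self.font_depth = 0
        elif tag == "font" and self.font_depth:
            self.font_depth -= 1

    def handle_data(self, data):
        if self.font_depth and self.matches:
            self.matches[-1] += data


def select_p5_font_text(html):
    """Return the stripped, non-empty text of each matched font element."""
    parser = P5FontText()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches if text.strip()]


# The markup inside the document.write() call, taken on its own.
FRAGMENT = (
    "<tr><td class='p5' colspan=4><b>Hot Water:</b>"
    "<font size='4'>ON <font face='Arial'></td></tr>"
)

selected = select_p5_font_text(FRAGMENT)
print(selected)
```

Note that this only works because the fragment is fed to the parser as plain markup. It says nothing about how the selector behaves when the same markup sits inside a script block on the full page.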


cmodyssey
Explorer

Hi,

I have tried that with Website Input and also with the CSS selector test page at http://try.jsoup.org/

It does not give me the values I am looking for.

When I put my whole page into the input of the jsoup CSS selector tester, it does not work.

If I put just the line below into the jsoup CSS selector tester, it does work.

 document.write("<tr><td class='p5'colspan=4><b>Hot Water:</b><font size='4'>ON <font face='Arial'></td></tr>");

Any ideas?

Thanks,

Richard.


dcarmack_splunk
Splunk Employee

https://validator.w3.org/check

When I checked the validity of the HTML posted above, the validator found 148 errors. I suggest correcting them before trying again.


cmodyssey
Explorer

Hi,

That page comes directly from my central heating controller, so I have no way to edit it to remove the errors.

I guess this leaves me at a dead end?
