Help needed about scraping real time data using Website Input

kashifqau
Explorer

Hello Everyone,

It's my first time dealing with Website Input and getting web data into Splunk.

I have to download real-time data from http://wsprnet.org/olddb and am using the "Website Input" app for this purpose. However, I have two questions:

  1. The web page has several filters at the top, and the table below them is populated based on their values. Can we adjust the filter values using Website Input?

  2. Website Input provides a minimum interval of 1 second, while the data is being populated continuously. Also, when I tried to create a CSS selector, it selected the whole table and created a single event in Splunk containing all the data rows in that table. For example, if the table is displaying 50 rows, it creates a single event in Splunk that contains the data of all 50 rows.

I want a CSS selector that creates a separate event for each data row in the table, and I also want to capture all the data.

Thank you for your cooperation and help

LukeMurphey
Champion

1. The web page has several filters at the top, and the table below them is populated based on their values. Can we adjust the filter values using Website Input?
The only way to make this work is to see whether you can set arguments in the URL that set the filters for you. Try changing the filters and checking whether the URL updates automatically to reflect your changes. If it does, you should be able to copy the URL after adjusting the filters and use that as the URL to scrape.
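
If the filters do show up in the URL, the input URL can also be assembled rather than copied by hand. Here is a minimal sketch using Python's standard library. The parameter names `band` and `count` are hypothetical placeholders, not the real wsprnet.org parameters, so substitute whatever names appear in your browser's address bar after you change the filters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_filtered_url(base_url, filters):
    """Return base_url with the given filter values merged into its query string."""
    parts = urlsplit(base_url)
    # Keep any query arguments already present, then apply the overrides.
    query = dict(parse_qsl(parts.query))
    query.update(filters)
    return urlunsplit(parts._replace(query=urlencode(query)))


# "band" and "count" are made-up names used only for illustration.
url = build_filtered_url("http://wsprnet.org/olddb", {"band": "20", "count": "50"})
print(url)
```

The resulting string is what you would paste into the URL field of the input.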

2. Website input provides me with a minimum interval of 1 second, where the data is continuously being populated.
Currently, 1 second is indeed the minimum.

Also, when I tried to create a CSS selector, it selected the whole table and created a single event in Splunk containing all the data rows in that table. For example, if the table is displaying 50 rows, it creates a single event in Splunk that contains the data of all 50 rows.
Try clicking an individual cell or row to see if that gets you the results you are looking for.

If not, you may want to construct the selector manually and enter it into the text box to see if it matches what you want. The user interface for selecting items struggles with some cases, so some manual editing of the selector may be needed to get the data in the format you want.

If you want one result per row, use a selector based on the TR element (for example, a selector of "tr").

If you want one result per cell, use a selector based on the TD element (for example, a selector of "td").
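
To make the difference in granularity concrete, here is a rough sketch in plain Python. It does not show how the Website Input app works internally. It only walks a small made-up table with the standard-library HTML parser to show what matching on "tr" versus "td" would yield. The `RowCollector` class and the sample data are invented for illustration.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>, grouped by the <tr> that contains it."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample data, not real output from wsprnet.org.
SAMPLE = """
<table>
  <tr><td>AA1AA</td><td>14.097</td></tr>
  <tr><td>BB2BB</td><td>7.040</td></tr>
</table>
"""

collector = RowCollector()
collector.feed(SAMPLE)

# Roughly what a "tr" selector gives: one match per table row.
row_matches = [" ".join(cells) for cells in collector.rows]
# Roughly what a "td" selector gives: one match per table cell.
cell_matches = [cell for cells in collector.rows for cell in cells]

print(row_matches)
print(cell_matches)
```

With the "table" selector the whole block would be a single match, which is the behavior described in the question.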

kashifqau
Explorer

Thank you Luke for your reply and explanation.

Regarding point 1, I verified that the URL changes when the filter values change, so this problem has been resolved.

Point 2 is something I will have to live with 🙂

Point 3 is still unresolved, but it's not related to this application; rather, it is about building an appropriate CSS selector, so I will post this issue to a CSS selector forum.

Thank you again for your support and cooperation
