Website Input: Set field names to be HTML element, instead of attribute

Explorer

Hi,

I have the following example XML to scrape:

    <COOK>
       <COOK_NAME>Cook</COOK_NAME>
       <COOK_TEMP>738</COOK_TEMP>
       <COOK_SET>3560</COOK_SET>
       <COOK_STATUS>0</COOK_STATUS>
    </COOK>
    <FOOD1>
       <FOOD1_NAME>Food1</FOOD1_NAME>
       <FOOD1_TEMP>OPEN</FOOD1_TEMP>
       <FOOD1_SET>1800</FOOD1_SET>
       <FOOD1_STATUS>4</FOOD1_STATUS>
    </FOOD1>
    <FOOD2>
       <FOOD2_NAME>Food2</FOOD2_NAME>
       <FOOD2_TEMP>OPEN</FOOD2_TEMP>
       <FOOD2_SET>1800</FOOD2_SET>
       <FOOD2_STATUS>4</FOOD2_STATUS>
    </FOOD2>
    <FOOD3>
       <FOOD3_NAME>Food3</FOOD3_NAME>
       <FOOD3_TEMP>OPEN</FOOD3_TEMP>
       <FOOD3_SET>1800</FOOD3_SET>
       <FOOD3_STATUS>4</FOOD3_STATUS>
    </FOOD3>
    <OUTPUT_PERCENT>100</OUTPUT_PERCENT>

I am extracting the data I want with the following CSS selector:

    cook_temp,cook_set,food1_temp,food1_set,output_percent

This results in the following events:

    response_size="1235" match_2="3560" match_1="725" match="725" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="577.230930328" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
    response_size="1235" match_2="3560" match_1="721" match="721" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="567.966938019" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
    response_size="1235" match_2="3560" match_1="721" match="721" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="565.255880356" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
    response_size="1235" match_2="3560" match_1="722" match="722" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="576.737880707" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
    response_size="1235" match_2="3560" match_1="721" match="721" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="572.259187698" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
    response_size="1235" match_2="3560" match_1="721" match="721" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="569.040060043" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"

I would like to change each match_(number) field name to the name of the HTML element (COOK_TEMP, COOK_SET, etc.). I can tell that setting "Name Attributes" would not help me, as that is tailored to naming fields based on HTML attributes, not HTML elements.

Is there a way to configure this to use HTML elements?

If not, is there some editing I can do to /opt/splunk/etc/apps/website_input/bin/web_input.py to do this? I don't mind having some "non-standard" Website Input code on my system, but I don't know Python that well.
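
To make the goal concrete, here is a standalone Python 3 sketch (not Website Input code) that parses a trimmed copy of the sample XML (FOOD2 and FOOD3 omitted) with the standard library's `xml.etree.ElementTree`, picks out the elements named in the selector above, and uses each element's tag as its field name. It assumes the fragment is wrapped in a single root element (the hypothetical `<ROOT>` below), since the sample has several top-level elements, and that tag names are compared case-insensitively because the selector is lowercase; `fields_by_tag` is an illustrative name only.

```python
import xml.etree.ElementTree as ET

# Trimmed sample data from the question, wrapped in a hypothetical <ROOT>
# element so that it parses as a single XML document.
SAMPLE = """<ROOT>
<COOK><COOK_NAME>Cook</COOK_NAME><COOK_TEMP>738</COOK_TEMP>
<COOK_SET>3560</COOK_SET><COOK_STATUS>0</COOK_STATUS></COOK>
<FOOD1><FOOD1_NAME>Food1</FOOD1_NAME><FOOD1_TEMP>OPEN</FOOD1_TEMP>
<FOOD1_SET>1800</FOOD1_SET><FOOD1_STATUS>4</FOOD1_STATUS></FOOD1>
<OUTPUT_PERCENT>100</OUTPUT_PERCENT>
</ROOT>"""

# The same comma-separated list of tag names used as the CSS selector
SELECTOR = "cook_temp,cook_set,food1_temp,food1_set,output_percent"


def fields_by_tag(xml_text, selector):
    """Return {tag_name: text} for every element whose tag (compared
    case-insensitively) appears in the comma-separated selector."""
    wanted = {name.strip().lower() for name in selector.split(",")}
    root = ET.fromstring(xml_text)
    fields = {}
    # iter() walks the tree in document order
    for element in root.iter():
        if element.tag.lower() in wanted:
            fields[element.tag] = (element.text or "").strip()
    return fields


if __name__ == "__main__":
    fields = fields_by_tag(SAMPLE, SELECTOR)
    # Render as Splunk-style key="value" pairs
    print(" ".join('%s="%s"' % (key, value) for key, value in fields.items()))
```

For the sample this prints `COOK_TEMP="738" COOK_SET="3560" FOOD1_TEMP="OPEN" FOOD1_SET="1800" OUTPUT_PERCENT="100"`, which is the kind of field naming asked for here.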

Thanks in advance,

Richard.

Re: Website Input: Set field names to be HTML element, instead of attribute

Champion

I think I can support this in the app natively. My main concern when writing this app was to support HTML, but I like the ability to handle XML too.

I opened a ticket to look into this and am considering several options: http://lukemurphey.net/issues/1145.

Update

Version 1.2 now has the ability to use tag names as field names: just check "Use Tag Name as Field Name". This version isn't the default yet, so you will have to select it manually. Let me know if it works for you.

Re: Website Input: Set field names to be HTML element, instead of attribute

Champion

FYI: I have a solution for this that I am testing now.

Re: Website Input: Set field names to be HTML element, instead of attribute

Explorer

Hi Luke,

Thanks for sticking with this. It will be good to get an official solution to this, instead of the modifications that I have made.

Look forward to the new version 🙂

Richard.

Re: Website Input: Set field names to be HTML element, instead of attribute

Champion

Did that work for you? You can accept the answer to let me know it worked too.

Re: Website Input: Set field names to be HTML element, instead of attribute

Explorer

Hi,

I had not seen your update about version 1.2, so I'm glad you commented, as it made me aware of it, thanks.

I will upgrade to that version to try it and report back here on how I get on.

Thanks again,

Richard.

Re: Website Input: Set field names to be HTML element, instead of attribute

Explorer

Hi,

Thanks for all your work on version 1.2.

I have upgraded to that version this morning and it works perfectly 🙂

It's really good to get the data I need from the work you've done on web_input.py, rather than from my own changes.

Thanks again,

Richard.

Re: Website Input: Set field names to be HTML element, instead of attribute

Explorer

Hi,

Thanks for looking into this.

I have just finished modifying /opt/splunk/etc/apps/website_input/bin/web_input.py so that it includes element names in the field names.

Here are the details of what I've done.

Change:

                # Unescape the text in case it includes HTML entities
                match_text = cls.unescape(WebInput.get_text(match))

To:

                # Unescape the text in case it includes HTML entities
                match_text = cls.unescape(WebInput.get_text(match))

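                # Work out the element name from the printed form of the match,
                # e.g. "<Element COOK_TEMP at 0x...>" (needs "import re" at the top of the file)
                printable_match = "%s" % (match)
                re_result = re.search('Element (.*) at', printable_match)
                element = re_result.group(1) if re_result else "unknown"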
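
The regex in this change depends on how the match object prints, and that differs between XML libraries: `lxml` prints `<Element COOK_TEMP at 0x...>`, while the standard library's `ElementTree` prints `<Element 'COOK_TEMP' at 0x...>` with quotes around the tag. As a hedged alternative, assuming the matches are element objects with a `.tag` attribute (both `lxml` and `ElementTree` elements have one), reading `.tag` avoids string parsing entirely. This is a Python 3 sketch; `element_name` is a hypothetical helper, and its regex fallback mirrors the change above.

```python
import re
import xml.etree.ElementTree as ET


def element_name(match):
    """Return the tag name of a matched element.

    Prefers the element's .tag attribute; falls back to parsing the printed
    form (e.g. "<Element COOK_TEMP at 0x...>") for objects without one.
    """
    tag = getattr(match, "tag", None)
    if isinstance(tag, str):
        return tag
    # Fallback: the regex approach, tolerant of quotes around the tag name.
    # Returns None instead of raising when the pattern is not found.
    result = re.search(r"Element '?([^' ]+)'? at", str(match))
    return result.group(1) if result else None


if __name__ == "__main__":
    elem = ET.fromstring("<COOK_TEMP>738</COOK_TEMP>")
    print(str(elem))           # <Element 'COOK_TEMP' at 0x...>
    print(element_name(elem))  # COOK_TEMP
```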

Change:

                    if not field_made:
                        if output_matches_as_mv:
                            result['match'].append(match_text)

To:

                    if not field_made:
                        if output_matches_as_mv:
                            #result['match'].append(match_text)
                            result['match_' + element] = match_text
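
One side effect of `result['match_' + element] = match_text` is that if the selector matches the same tag more than once, each later value overwrites the earlier one. A minimal sketch of one way to keep every value, assuming results are collected into a plain dict; `add_match` is a hypothetical helper (not part of Website Input) and the `_2`, `_3` suffix scheme is an arbitrary choice.

```python
def add_match(result, element, match_text):
    """Store match_text under 'match_<element>'; on repeats, use
    'match_<element>_2', 'match_<element>_3', ... instead of overwriting.
    Returns the key that was used."""
    key = "match_" + element
    candidate, n = key, 1
    while candidate in result:
        n += 1
        candidate = "%s_%d" % (key, n)
    result[candidate] = match_text
    return candidate


if __name__ == "__main__":
    result = {}
    for element, text in [("FOOD1_TEMP", "OPEN"), ("FOOD1_SET", "1800"), ("FOOD1_SET", "1750")]:
        add_match(result, element, text)
    print(result)
```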

My coding is not to a high standard, as this is my first time working with Python.

I wanted to believe Splunk's claim that it can take in any data from any source, and I got there after a lot of trial and error and debug log lines!

Cheers,

Richard.