Hi,
I have the below example XML to scrape:
    <COOK>
       <COOK_NAME>Cook</COOK_NAME>
       <COOK_TEMP>738</COOK_TEMP>
       <COOK_SET>3560</COOK_SET>
       <COOK_STATUS>0</COOK_STATUS>
    </COOK>
    <FOOD1>
       <FOOD1_NAME>Food1</FOOD1_NAME>
       <FOOD1_TEMP>OPEN</FOOD1_TEMP>
       <FOOD1_SET>1800</FOOD1_SET>
       <FOOD1_STATUS>4</FOOD1_STATUS>
    </FOOD1>
    <FOOD2>
       <FOOD2_NAME>Food2</FOOD2_NAME>
       <FOOD2_TEMP>OPEN</FOOD2_TEMP>
       <FOOD2_SET>1800</FOOD2_SET>
       <FOOD2_STATUS>4</FOOD2_STATUS>
    </FOOD2>
    <FOOD3>
       <FOOD3_NAME>Food3</FOOD3_NAME>
       <FOOD3_TEMP>OPEN</FOOD3_TEMP>
       <FOOD3_SET>1800</FOOD3_SET>
       <FOOD3_STATUS>4</FOOD3_STATUS>
    </FOOD3>
    <OUTPUT_PERCENT>100</OUTPUT_PERCENT>
I am extracting the data I want with the following CSS Selector:
cook_temp,cook_set,food1_temp,food1_set,output_percent
This results in the following events:
response_size="1235" match_2="3560" match_1="725" match="725" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="577.230930328" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
response_size="1235" match_2="3560" match_1="721" match="721" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="567.966938019" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
response_size="1235" match_2="3560" match_1="721" match="721" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="565.255880356" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
response_size="1235" match_2="3560" match_1="722" match="722" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="576.737880707" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
response_size="1235" match_2="3560" match_1="721" match="721" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="572.259187698" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
response_size="1235" match_2="3560" match_1="721" match="721" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="569.040060043" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
I would like to change each match_(number) to be the name of the HTML element (COOK_TEMP, COOK_SET etc.). As far as I can tell, setting "Name Attributes" would not help me, as that is tailored to naming fields based on HTML attributes, not HTML elements.
Is there a way to configure this to use HTML elements?
If not, is there some editing I can do to /opt/splunk/etc/apps/website_input/bin/web_input.py to do this? I don't mind having some "non-standard" Website Input code on my system, although I don't know Python that well.
Thanks in advance,
Richard.
I think I can support this in the app natively. My main concern when writing this app was to support HTML, but I like the ability to handle XML too.
I opened a ticket to look into it and am considering several options: http://lukemurphey.net/issues/1145.
Update
Version 1.2 now has the ability to use the tag names as the field names. Just check the "Use Tag Name as Field Name" option. This version isn't the default yet; you will have to manually select it. Let me know if it works for you.
Hi,
Thanks for looking into this.
I have just finished modifying /opt/splunk/etc/apps/website_input/bin/web_input.py so that it includes element names in the field names.
Here are the details of what I've done.
Change:
                # Unescape the text in case it includes HTML entities
                match_text = cls.unescape(WebInput.get_text(match))
To:
                # Unescape the text in case it includes HTML entities
                match_text = cls.unescape(WebInput.get_text(match))
                printable_match = "%s" % (match)
                re_result = re.search('Element (.*) at', printable_match)
                element = re_result.group(1)
Change:
                    if not field_made:
                        if output_matches_as_mv:
                            result['match'].append(match_text)
To:
                    if not field_made:
                        if output_matches_as_mv:
                            #result['match'].append(match_text)
                            result['match_' + element] = match_text
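For anyone trying something similar, the idea in the patch above (naming each field after the element it came from) can be sketched without parsing the element's printed form, because parsed elements expose their name through a `.tag` attribute. This is only a minimal standalone sketch, not the app's code: it uses the standard library's `xml.etree.ElementTree` rather than whatever parser web_input.py uses, the `ROOT` wrapper, the `WANTED` set and the `fields_by_tag` helper are made up for illustration, and the tag names are lowercased here only to match the spelling used in the selector.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample XML. The ROOT wrapper is an assumption
# added so that the fragment parses as one well-formed document.
SAMPLE = """<ROOT>
  <COOK>
    <COOK_NAME>Cook</COOK_NAME>
    <COOK_TEMP>738</COOK_TEMP>
    <COOK_SET>3560</COOK_SET>
  </COOK>
  <OUTPUT_PERCENT>100</OUTPUT_PERCENT>
</ROOT>"""

# Hypothetical list of the elements we care about (mirrors the selector).
WANTED = {"cook_temp", "cook_set", "output_percent"}


def fields_by_tag(xml_text, wanted):
    """Return a dict of match_<tag name> -> element text for wanted tags."""
    result = {}
    for element in ET.fromstring(xml_text).iter():
        # element.tag holds the element name, so no regex on the
        # printed "<Element ... at ...>" form is needed.
        name = element.tag.lower()
        if name in wanted:
            result["match_" + name] = (element.text or "").strip()
    return result


print(fields_by_tag(SAMPLE, WANTED))
```

Note that if two matched elements share the same tag name, the later one overwrites the earlier one in this sketch.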
My coding is not to a high standard, as it's my first time working with Python.
I wanted to believe in the Splunk statement of it being able to take in any data from any source, and I have got there after a lot of trial and error and debug log lines!
Cheers,
Richard.
FYI: I have a solution for this that I am testing now.
Hi,
Thanks for all your work on version 1.2.
I have upgraded to that version this morning and it works perfectly 🙂
It's really good to get the data I need from the work that you've done to web_input.py, rather than mine.
Thanks again,
Richard.
Hi Luke,
Thanks for sticking with this. It will be good to get an official solution to this, instead of the modifications that I have made.
Look forward to the new version 🙂
Richard.
Did that work for you? You can accept the answer to let me know it worked too.
Hi,
I had not seen your update about version 1.2, so I'm glad you commented, as it made me aware of it, thanks.
I will upgrade to that version to try it and report back here on how I got on.
Thanks again,
Richard.
