
Automatically extract XML key-value pairs?

Contributor

I'm using Splunk to index an XML file. Is there a way to have Splunk automatically extract the key-value pairs for each event (as xmlkv does) on every search? I don't want users to have to type | xmlkv in the search bar each time. I see that props.conf lets you set KV_MODE, but none of its settings indicate XML extraction.


Splunk Employee

Edited for version 4.3:

As of version 4.3, the accepted answer below still works, but you can also use this props.conf setting:

KV_MODE = xml

This performs spath-style XML extraction on the events at search time.
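KV_MODE = xml is handled entirely inside Splunk, so there is nothing to code. To get a feel for how "spath-style" extraction differs from a plain regex, here is a rough Python sketch that parses the event with a real XML parser and names each leaf field by its element path (for example info.version). The dotted naming is only an approximation of spath's output, this is not Splunk's implementation, and spath_like_fields is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def spath_like_fields(xml_text):
    """Hypothetical helper: flatten an XML event into dotted-path fields.

    Approximates spath-style naming: each leaf element becomes a field
    named by its path from the root element (e.g. "info.version").
    Attributes are ignored, and if an element path repeats, the later
    value overwrites the earlier one in this sketch.
    """
    root = ET.fromstring(xml_text)
    fields = {}

    def walk(elem, path):
        children = list(elem)
        if not children:
            # Leaf element: record its text, or "" for an empty element.
            fields[path] = (elem.text or "").strip()
            return
        for child in children:
            walk(child, f"{path}.{child.tag}")

    walk(root, root.tag)
    return fields

print(spath_like_fields(
    "<info><created>1299121157</created><version>3.2.1</version></info>"
))
```

Because the parser keeps the nesting in the field name, a version element under info and one under some other parent would not collide, which a tag-name-only regex cannot distinguish.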


Maybe. As it turns out, the xmlkv command doesn't do true XML parsing; it just applies a regular expression, and Splunk configuration can probably do the same job better than xmlkv itself. (See $SPLUNK_HOME/etc/apps/search/bin/xmlkv.py.)

Just define a search-time extraction for your sourcetype (or source, or host) in props.conf:

[mysourcetype]
REPORT-xmlkv = xmlkv-alternative

and in transforms.conf:

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2
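To see what this REGEX/FORMAT pair pulls out of an event, the Python sketch below applies the same pattern with Python's re module and mimics FORMAT = $1::$2 by treating group 1 as the field name and group 2 as the value. It is an illustration only, assuming Python's regex engine behaves like Splunk's for this pattern; extract_pairs is a hypothetical helper and the event text is made up.

```python
import re

# REGEX copied from the [xmlkv-alternative] stanza in transforms.conf.
# Group 1 captures the tag name; group 2 captures the text up to the next "<".
# The \1 backreference requires the closing tag to match the opening tag.
XMLKV_REGEX = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")

def extract_pairs(event):
    """Mimic FORMAT = $1::$2: each match becomes (field_name, value)."""
    return [(m.group(1), m.group(2)) for m in XMLKV_REGEX.finditer(event)]

event = (
    "<info>\n"
    "    <created>1299121157</created>\n"
    "    <version>3.2.1</version>\n"
    "    <global_host_event_handler></global_host_event_handler>\n"
    "</info>"
)
for name, value in extract_pairs(event):
    print(f"{name}={value!r}")
```

Only leaf elements produce fields: the outer info element never matches because its content contains "<", and an empty element yields a field with an empty value.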


Esteemed Legend

Try this:

LINE_BREAKER = ([\r\n]{2})
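LINE_BREAKER is documented to treat the text matched by its first capturing group as an event boundary and discard it. The Python sketch below approximates that rule (it is not Splunk's implementation; break_events is a hypothetical helper) to compare (\n\n) with ([\r\n]{2}). On LF-only text the two behave the same, but [\r\n]{2} also matches a single Windows \r\n line ending, so on CRLF files it treats every line ending as a boundary.

```python
import re

def break_events(raw, line_breaker):
    """Approximate LINE_BREAKER: each match of capturing group 1 is
    discarded and ends one event; the next event starts right after it."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    # Drop empty pieces left by a breaker at the very end of the text.
    return [e for e in events if e.strip()]

lf_text = "<a>1</a>\n\n<b>2</b>\n"
crlf_text = "<a>1</a>\r\n<b>2</b>\r\n"

for breaker in (r"(\n\n)", r"([\r\n]{2})"):
    print(breaker, break_events(lf_text, breaker), break_events(crlf_text, breaker))
```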

Contributor

Hey gkanapathy 🙂

I used your mad skillz regex in my transforms.conf, but it seems to negate the LINE_BREAKER in my props.conf 😞

Any ideas on how to make sure the line breaker still works in this example?

props.conf:

[nagiosstatus]
MAX_EVENTS = 500000
TIME_PREFIX = \<created\>
MAX_TIMESTAMP_LOOKAHEAD = 500
SHOULD_LINEMERGE = false
LINE_BREAKER = (\n\n)
REPORT-xmlkv = xmlkv-alternative

transforms.conf:

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2

Sample XML log:

<nagios>

<info>
    <created>1299121157</created>
    <version>3.2.1</version>
    <last_update_check>1299108670</last_update_check>
    <update_available>1</update_available>
    <last_version>3.2.1</last_version>
    <new_version>3.2.3</new_version>
</info>

<programstatus>
    <modified_host_attributes>1</modified_host_attributes>
    <modified_service_attributes>1</modified_service_attributes>
    <nagios_pid>15961</nagios_pid>
    <daemon_mode>1</daemon_mode>
    <program_start>1299103468</program_start>
    <last_command_check>1299121108</last_command_check>
    <last_log_rotation>0</last_log_rotation>
    <enable_notifications>1</enable_notifications>
    <active_service_checks_enabled>1</active_service_checks_enabled>
    <passive_service_checks_enabled>1</passive_service_checks_enabled>
    <active_host_checks_enabled>1</active_host_checks_enabled>
    <passive_host_checks_enabled>1</passive_host_checks_enabled>
    <enable_event_handlers>1</enable_event_handlers>
    <obsess_over_services>0</obsess_over_services>
    <obsess_over_hosts>0</obsess_over_hosts>
    <check_service_freshness>1</check_service_freshness>
    <check_host_freshness>0</check_host_freshness>
    <enable_flap_detection>0</enable_flap_detection>
    <enable_failure_prediction>1</enable_failure_prediction>
    <process_performance_data>1</process_performance_data>
    <global_host_event_handler></global_host_event_handler>
    <global_service_event_handler></global_service_event_handler>
    <next_comment_id>94586</next_comment_id>
    <next_downtime_id>35813</next_downtime_id>
    <next_event_id>1185528</next_event_id>
    <next_problem_id>532761</next_problem_id>
    <next_notification_id>1337020</next_notification_id>
    <total_external_command_buffer_slots>4096</total_external_command_buffer_slots>
    <used_external_command_buffer_slots>11</used_external_command_buffer_slots>
    <high_external_command_buffer_slots>128</high_external_command_buffer_slots>
    <active_scheduled_host_check_stats>21,132,401</active_scheduled_host_check_stats>
    <active_ondemand_host_check_stats>33,278,834</active_ondemand_host_check_stats>
    <passive_host_check_stats>0,0,0</passive_host_check_stats>
</programstatus>

</nagios>

Thanks in advance,

Luke 🙂
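One way to check what the props.conf above should produce is to emulate it offline. The Python sketch below approximates LINE_BREAKER = (\n\n) on a trimmed copy of the sample, assuming the blank lines shown are literal \n\n in the file, and then applies the xmlkv-alternative REGEX to each resulting event. It ignores timestamp handling and everything else Splunk does at index time; split_on_breaker and SAMPLE are hypothetical names.

```python
import re

# Same REGEX as the [xmlkv-alternative] stanza.
XMLKV_REGEX = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")

# Trimmed copy of the sample log; blank lines assumed to be literal "\n\n".
SAMPLE = (
    "<nagios>\n"
    "\n"
    "<info>\n"
    "    <created>1299121157</created>\n"
    "    <version>3.2.1</version>\n"
    "</info>\n"
    "\n"
    "<programstatus>\n"
    "    <nagios_pid>15961</nagios_pid>\n"
    "    <global_host_event_handler></global_host_event_handler>\n"
    "</programstatus>\n"
    "\n"
    "</nagios>\n"
)

def split_on_breaker(raw, breaker=r"(\n\n)"):
    """Approximate LINE_BREAKER: text matched by group 1 is discarded
    and separates one event from the next."""
    events, start = [], 0
    for m in re.finditer(breaker, raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    return [e for e in events if e.strip()]

for event in split_on_breaker(SAMPLE):
    fields = {m.group(1): m.group(2) for m in XMLKV_REGEX.finditer(event)}
    print(event.splitlines()[0], fields)
```

On this sample the split yields four events (nagios opening tag, the info block, the programstatus block, nagios closing tag), and only the info event contains the created tag that TIME_PREFIX looks for.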


Splunk Employee

As of version 4.3, you can use this setting in props.conf:

KV_MODE = xml

which performs spath-style extraction at search time.


Builder

Very nice 🙂


Contributor

Worked perfectly! Thanks!


Super Champion

Nice trick. You could also add MV_ADD = True to your xmlkv-alternative stanza in transforms.conf if you want repeating XML elements captured as a multivalue field, for example when your XML represents a list of items. That's something the default xmlkv command can't do. Pretty cool.
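The difference MV_ADD makes can be sketched in Python. The transforms.conf spec describes MV_ADD as controlling what happens when an extraction finds a field that already exists: with true the new value is appended as another value, otherwise it is discarded. The code below mimics that first-value-wins versus append behavior for the xmlkv-alternative REGEX; extract_fields and its mv_add flag are hypothetical names, not Splunk APIs.

```python
import re

# Same REGEX as the [xmlkv-alternative] stanza.
XMLKV_REGEX = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")

def extract_fields(event, mv_add=False):
    """Collect field values; every field maps to a list of values."""
    fields = {}
    for m in XMLKV_REGEX.finditer(event):
        name, value = m.group(1), m.group(2)
        if name not in fields:
            fields[name] = [value]
        elif mv_add:
            # MV_ADD = true: repeated element becomes another value.
            fields[name].append(value)
        # Otherwise (MV_ADD = false): keep the first value, drop the rest.
    return fields

event = "<items><item>a</item><item>b</item><item>c</item></items>"
print(extract_fields(event))
print(extract_fields(event, mv_add=True))
```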