Automatically extract xml key value pairs?

Contributor

I'm using Splunk to index an XML file. Is there a way to have Splunk automatically extract the key-value pairs for each event (as xmlkv does) on every search? I don't want users to have to type | xmlkv in the search bar each time. I see that props.conf lets you set KV_MODE, but none of the settings mention XML extraction.

Splunk Employee
Edited for version 4.3:

As of version 4.3, the accepted answer below still works, but you can also use this props.conf setting:

KV_MODE = xml

This performs spath-style extraction on the events.
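To give a rough feel for what spath-style extraction produces, here is a minimal Python sketch. It is only an approximation of the field naming (nested element names joined with dots), not Splunk's implementation; the `flatten` helper and the sample string are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

def flatten(element, prefix=""):
    """Return (dotted path, text) pairs for every element that has text."""
    path = f"{prefix}.{element.tag}" if prefix else element.tag
    fields = []
    text = (element.text or "").strip()
    if text:
        fields.append((path, text))
    # Recurse into child elements, carrying the path built so far.
    for child in element:
        fields.extend(flatten(child, path))
    return fields

# Hypothetical, shortened version of the nagios sample in this thread.
sample = "<nagios><info><created>1299121157</created><version>3.2.1</version></info></nagios>"

for name, value in flatten(ET.fromstring(sample)):
    print(name, "=", value)
```

Unlike the regex approach, the field names here carry the full element path, so same-named leaves under different parents stay distinct.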


Maybe. As it turns out, the xmlkv command does not do real XML parsing; it just applies a regular expression, which Splunk configuration can probably do better than the xmlkv command itself. (See $SPLUNK_HOME/etc/apps/search/bin/xmlkv.py.)

Just define a search-time extraction for your sourcetype (or source or whatever) in props.conf:

[mysourcetype]
REPORT-xmlkv = xmlkv-alternative

and in transforms.conf:

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2
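To see what this regex actually captures, the pattern can be tried outside Splunk. The sketch below runs the same expression through Python's `re` module; Splunk uses its own regex engine, so treat this as an approximation. The `extract_pairs` helper and the sample text are hypothetical names made up for this illustration.

```python
import re

# Same pattern as the REGEX line in transforms.conf above.
XMLKV_PATTERN = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")

def extract_pairs(event):
    """Return (element name, text) pairs for the leaf elements of an event."""
    return XMLKV_PATTERN.findall(event)

# Hypothetical, shortened event.
sample = """<info>
    <created>1299121157</created>
    <version>3.2.1</version>
</info>"""

for key, value in extract_pairs(sample):
    print(f"{key} = {value}")
```

Only leaf elements match, because the value group `([^<]*)` cannot span a nested tag. The wrapping `<info>` element therefore yields no field of its own, and attributes inside an opening tag are skipped rather than extracted.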

Esteemed Legend

try this:

LINE_BREAKER = ([\r\n]{2})
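For context, LINE_BREAKER is applied when data is indexed, while REPORT- extractions run at search time, so a changed LINE_BREAKER only affects newly indexed data. Splunk ends one event where the first capturing group starts, begins the next where it ends, and discards the captured text. The Python sketch below imitates that rule under those assumptions; `break_events` and the sample string are hypothetical, and this is not Splunk's code.

```python
import re

def break_events(raw, line_breaker):
    """Split raw text into events, discarding whatever group 1 matched."""
    events = []
    start = 0
    for match in re.finditer(line_breaker, raw):
        events.append(raw[start:match.start(1)])  # event ends where group 1 begins
        start = match.end(1)                      # next event starts after group 1
    events.append(raw[start:])
    return [event for event in events if event]

# Hypothetical, shortened input with one blank line between blocks.
raw = (
    "<info>\n<created>1</created>\n</info>\n"
    "\n"
    "<programstatus>\n<nagios_pid>2</nagios_pid>\n</programstatus>\n"
)

for event in break_events(raw, r"([\r\n]{2})"):
    print(repr(event))
```

On input with plain `\n` line endings, `(\n\n)` and `([\r\n]{2})` split at the same place. They differ on files with `\r\n` endings, where `\n\n` never occurs.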

Contributor

hey gkanapathy 🙂

I used your mad skillz regex in my transforms.conf but it negates the line breaker in my props.conf 😞

Any ideas on how to ensure the line breaker still works in this example?

props.conf:

[nagiosstatus]
MAX_EVENTS = 500000
TIME_PREFIX = \<created\>
MAX_TIMESTAMP_LOOKAHEAD = 500
SHOULD_LINEMERGE = false
LINE_BREAKER = (\n\n)
REPORT-xmlkv = xmlkv-alternative

transforms.conf:

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2

sample xml log:

<nagios>

<info>
    <created>1299121157</created>
    <version>3.2.1</version>
    <last_update_check>1299108670</last_update_check>
    <update_available>1</update_available>
    <last_version>3.2.1</last_version>
    <new_version>3.2.3</new_version>
</info>

<programstatus>
    <modified_host_attributes>1</modified_host_attributes>
    <modified_service_attributes>1</modified_service_attributes>
    <nagios_pid>15961</nagios_pid>
    <daemon_mode>1</daemon_mode>
    <program_start>1299103468</program_start>
    <last_command_check>1299121108</last_command_check>
    <last_log_rotation>0</last_log_rotation>
    <enable_notifications>1</enable_notifications>
    <active_service_checks_enabled>1</active_service_checks_enabled>
    <passive_service_checks_enabled>1</passive_service_checks_enabled>
    <active_host_checks_enabled>1</active_host_checks_enabled>
    <passive_host_checks_enabled>1</passive_host_checks_enabled>
    <enable_event_handlers>1</enable_event_handlers>
    <obsess_over_services>0</obsess_over_services>
    <obsess_over_hosts>0</obsess_over_hosts>
    <check_service_freshness>1</check_service_freshness>
    <check_host_freshness>0</check_host_freshness>
    <enable_flap_detection>0</enable_flap_detection>
    <enable_failure_prediction>1</enable_failure_prediction>
    <process_performance_data>1</process_performance_data>
    <global_host_event_handler></global_host_event_handler>
    <global_service_event_handler></global_service_event_handler>
    <next_comment_id>94586</next_comment_id>
    <next_downtime_id>35813</next_downtime_id>
    <next_event_id>1185528</next_event_id>
    <next_problem_id>532761</next_problem_id>
    <next_notification_id>1337020</next_notification_id>
    <total_external_command_buffer_slots>4096</total_external_command_buffer_slots>
    <used_external_command_buffer_slots>11</used_external_command_buffer_slots>
    <high_external_command_buffer_slots>128</high_external_command_buffer_slots>
    <active_scheduled_host_check_stats>21,132,401</active_scheduled_host_check_stats>
    <active_ondemand_host_check_stats>33,278,834</active_ondemand_host_check_stats>
    <passive_host_check_stats>0,0,0</passive_host_check_stats>
</programstatus>

</nagios>

Thanks in advance,

Luke 🙂

Splunk Employee
As of version 4.3, you can now use the setting in props.conf:

KV_MODE = xml

which will perform spath extraction.

Builder

Very Nice 🙂

Contributor

Worked perfectly! Thanks!

Super Champion

Nice trick. You could also add MV_ADD = True to your xmlkv-alternative stanza if you want to capture repeating XML elements as a multi-value field, for example if your XML represents a list of items. This is something that you can't do with the default xmlkv command. Pretty cool.
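As an illustration of the multi-value idea, this Python sketch collects every value seen for a repeated element name into a list, which is roughly the effect MV_ADD has on a field. It is a simulation under that assumption, not Splunk's implementation; `extract_multivalue` and the sample XML are hypothetical.

```python
import re
from collections import defaultdict

# Same pattern as the REGEX line in the xmlkv-alternative stanza.
XMLKV_PATTERN = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")

def extract_multivalue(event):
    """Group extracted values by element name, keeping every occurrence."""
    fields = defaultdict(list)
    for key, value in XMLKV_PATTERN.findall(event):
        fields[key].append(value)
    return dict(fields)

# Hypothetical event containing a repeated element.
sample = "<items><item>apple</item><item>pear</item><owner>luke</owner></items>"

print(extract_multivalue(sample))
```

Here `item` ends up with both values, while `owner` keeps a single one. Without MV_ADD, only the first value found for a repeated name is kept.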
