
Transform XML log data before indexing - How can I get this to work?

sc0tt
Builder

I have a semi-formatted XML log that Splunk currently indexes without problems. However, it indexes a lot of additional data that is not needed. To save space, I would like to transform this log before indexing and index only what is needed. I'm using the "Anonymize data" approach as an example. Below are a sample of the log data, the desired indexed format, and my current configuration.

log data

2013-10-23 12:22:17,286 INFO  ==== Outgoing ==== <?xml version="1.0" encoding="UTF-8"?> <INTERFACE_API><UserId>55555555555</UserId><MsgType>Response</MsgType><Key>001</Key><SessionID>1000</SessionID></INTERFACE_API><EOM>
2013-10-23 12:22:17,274 INFO  ==== Incoming ==== <INTERFACE_API><UserId>55555555555</UserId><MsgType>Request</MsgType><Item>5</Item><Internal>INTERNAL_VALUE</Internal><SessionID>999999999999</SessionID></INTERFACE_API>

desired indexed data

2013-10-23 12:22:17,286 UserId=55555555555 Key=001
2013-10-23 12:22:17,274 UserId=55555555555 Item=5 SessionID=999999999999

props.conf

[test_io]
NO_BINARY_CHECK = 1
pulldown_type = 1
TRANSFORMS-xml = test-io-xml

transforms.conf

[test-io-xml]
REGEX = <(.*?)(?:\s[^>]*)?>([^<]*)</\\1>
FORMAT = $1=$2
DEST_KEY = _raw

I see no difference in my indexed data. I used the regex from the xmlkv.py script because it works at search time, so I assumed it would extract the XML values correctly here as well.

How can I get this to work?
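One possible reason the transform has no effect: the pattern was copied from a Python string literal in xmlkv.py, where `\\1` is an escaped backreference. Assuming transforms.conf hands the pattern to a PCRE-style engine unchanged, `\\1` would then mean a literal backslash followed by `1`, so the regex could never match this log. The sketch below uses Python's `re` module as a stand-in (an assumption; it is not Splunk's engine) to compare the two spellings on the sample Incoming event.

```python
import re

# Sample "Incoming" event from the question.
LINE = (
    "2013-10-23 12:22:17,274 INFO  ==== Incoming ==== "
    "<INTERFACE_API><UserId>55555555555</UserId><MsgType>Request</MsgType>"
    "<Item>5</Item><Internal>INTERNAL_VALUE</Internal>"
    "<SessionID>999999999999</SessionID></INTERFACE_API>"
)

# Pattern exactly as written in transforms.conf: "\\1" is a literal
# backslash followed by the character "1", not a backreference.
AS_WRITTEN = r"<(.*?)(?:\s[^>]*)?>([^<]*)</\\1>"

# Same pattern with a single-backslash backreference to group 1.
BACKREF = r"<(.*?)(?:\s[^>]*)?>([^<]*)</\1>"

as_written_pairs = re.findall(AS_WRITTEN, LINE)
backref_pairs = re.findall(BACKREF, LINE)

print(as_written_pairs)  # no tag/value pairs found
print(backref_pairs)     # one (tag, value) tuple per leaf element
```

Even with the backreference corrected, a transform with DEST_KEY = _raw and FORMAT = $1=$2 would, as far as I understand it, rebuild the event from a single match rather than from every tag, so a rewrite of the whole line may be a better fit for the desired output.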

update:
I've also tried modifying my props.conf to use SEDCMD. I added a sed script that removes all tags, to see whether I can alter the indexed data at all, but it still has no effect.

[test_io]
NO_BINARY_CHECK = 1
pulldown_type = 1
SEDCMD-testio = s/<[^<>]\{1,\}>//g

update 2:
The sed script above doesn't appear to work, but I was able to get another test script to work. Now that I know props.conf is being applied correctly, is there a way to remove the XML tags and create key=value pairs with sed?
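A likely reason the tag-stripping script did nothing: `\{1,\}` is POSIX basic-regex syntax for "one or more", whereas SEDCMD is described as taking Perl-style regular expressions, in which an escaped brace is just a literal `{`. The sketch below uses Python's `re` (a stand-in, not Splunk's engine) on a shortened copy of the Incoming event to show the difference; the `+` quantifier is a suggested replacement, not something tested against Splunk here.

```python
import re

# Shortened copy of the "Incoming" sample event (some elements omitted).
LINE = (
    "2013-10-23 12:22:17,274 INFO  ==== Incoming ==== "
    "<INTERFACE_API><UserId>55555555555</UserId><Item>5</Item></INTERFACE_API>"
)

# Quantifier as written in the SEDCMD: in Perl-style syntax the escaped
# braces are literal characters, so this looks for "<x{1,}>".
POSIX_STYLE = r"<[^<>]\{1,\}>"

# Perl-style equivalent of "one or more characters that are not < or >".
PERL_STYLE = r"<[^<>]+>"

unchanged = re.sub(POSIX_STYLE, "", LINE)
stripped = re.sub(PERL_STYLE, "", LINE)

print(unchanged == LINE)  # the POSIX-style pattern matches nothing
print(stripped)           # tags removed, element values run together
```

Note that stripping tags alone glues adjacent values together, which is why the key=value output needs capture groups rather than a plain delete.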


sc0tt
Builder

I was able to use a sed script like the one below to extract and format the fields I wanted.

SEDCMD-testio = s/(.*) INFO.*<UserId>(.*)<\/UserId>.*<Item>(.*)<\/Item>.*<SessionID>(.*)<\/SessionID>.*/\1 UserId=\2 Item=\3 SessionID=\4/

The indexed data is then formatted as

2013-10-23 12:22:17,274 UserId=55555555555 Item=5 SessionID=999999999999
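To sanity-check an expression like this before deploying it, the same substitution can be replayed offline. The sketch below applies it with Python's `re.sub` to both sample events from the question; Python's engine only approximates Splunk's, and the helper name `apply_sedcmd` is made up for illustration.

```python
import re

# The regex and replacement from the SEDCMD above ("\/" is written as "/").
PATTERN = re.compile(
    r"(.*) INFO.*<UserId>(.*)</UserId>.*<Item>(.*)</Item>"
    r".*<SessionID>(.*)</SessionID>.*"
)
REPLACEMENT = r"\1 UserId=\2 Item=\3 SessionID=\4"


def apply_sedcmd(event: str) -> str:
    """Apply the substitution once, like sed's s/// without the g flag."""
    return PATTERN.sub(REPLACEMENT, event, count=1)


INCOMING = (
    "2013-10-23 12:22:17,274 INFO  ==== Incoming ==== "
    "<INTERFACE_API><UserId>55555555555</UserId><MsgType>Request</MsgType>"
    "<Item>5</Item><Internal>INTERNAL_VALUE</Internal>"
    "<SessionID>999999999999</SessionID></INTERFACE_API>"
)
OUTGOING = (
    '2013-10-23 12:22:17,286 INFO  ==== Outgoing ==== '
    '<?xml version="1.0" encoding="UTF-8"?> '
    '<INTERFACE_API><UserId>55555555555</UserId><MsgType>Response</MsgType>'
    '<Key>001</Key><SessionID>1000</SessionID></INTERFACE_API><EOM>'
)

print(apply_sedcmd(INCOMING))
print(apply_sedcmd(OUTGOING))
```

The Incoming event is rewritten as shown in the post. The Outgoing event has no `<Item>` element, so the pattern does not match and the event passes through unchanged; covering it would presumably need a second SEDCMD built around `<Key>`.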
