Do we have any best practices for field extraction command usage for XML inputs?

Path Finder

The reason for this question is to understand the relative performance of each extraction command: rex, xmlkv, spath, and multikv. In my experience, xmlkv takes quite a long time to fetch the relevant fields, and I am also finding it challenging to write the rex commands. How can I improve Splunk's search performance while still using xmlkv?


Re: Do we have any best practices for field extraction command usage for XML inputs?

Legend

Try using xpath or spath, which are designed specifically for reading XML/JSON data in a tree-like structure. Alternatively, you can use rex, provided your XML schema is known to you so that you can write a pattern for that specific structure.

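Example XML node, held in a field named XMLData: <NodeName>Sample Data</NodeName>

1) Extracting the value:
Using the rex command -> rex field=XMLData "<NodeName>(?<MyNodeName>[^<]+)</NodeName>"
Using the spath command -> spath input=XMLData output=MyNodeName path=NodeName
(Use a dotted path such as path=RequestType.NodeName when NodeName is nested inside a RequestType element. The rex pattern uses [^<]+ rather than \w+ because \w+ would not match a value that contains a space, such as "Sample Data".)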

The spath command extracts every value found at the given path, so repeated elements come back as multiple values. The rex command stops after the first match by default (max_match=1); to capture repeated elements, set max_match to the number of matches you want, or to 0 for all matches.
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Rex
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath
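
As a rough sketch of the two extractions side by side, the search below builds one synthetic event with makeresults so it can be tried without any indexed data. The wrapping Request element, the second NodeName value, and the output field names MyNodeNameRex and MyNodeNameSpath are made up for illustration and are not from the original question.

```spl
| makeresults
| eval XMLData="<Request><NodeName>Sample Data</NodeName><NodeName>More Data</NodeName></Request>"
| rex field=XMLData max_match=0 "<NodeName>(?<MyNodeNameRex>[^<]+)</NodeName>"
| spath input=XMLData output=MyNodeNameSpath path=Request.NodeName
| table MyNodeNameRex MyNodeNameSpath
```

With max_match=0 the rex field should hold both values, matching what spath returns for the repeated element. Dropping max_match=0 should leave only the first value in MyNodeNameRex.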

2) If you are searching for the element NodeName in your events, always make sure it is present in your base search filter, so that events without it are discarded before any extraction runs, i.e.
index=YourIndexName sourcetype=YourSourceType AND "<NodeName>" AND "</NodeName>"

3) It is also preferable to include in the base search any XML header elements whose values stay the same across the events you plan to search, for example "<RequestType>AddProductToCart</RequestType>". This way you only look at AddProductToCart XMLs and ignore all the others that your search does not need.

4) Finally, if XML events are written at high volume, extract the fields once, generate stats on them, and push the aggregated results to a summary index as key-value pairs so that later searches are faster. See the si<stats> family of commands, where si stands for summary index and <stats> can be stats, chart, timechart, etc. (sistats, sichart, sitimechart). Also see the collect command, which gives you more control when used from scheduled searches.
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect
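
Points 2 to 4 can be combined into one scheduled search, sketched below under a few assumptions: the summary index name my_summary is hypothetical and would have to exist already, the one-hour span is arbitrary, and request_count is an illustrative field name. YourIndexName and YourSourceType are the placeholders used above.

```spl
index=YourIndexName sourcetype=YourSourceType "<RequestType>AddProductToCart</RequestType>" "<NodeName>"
| rex "<RequestType>(?<RequestType>[^<]+)</RequestType>"
| rex "<NodeName>(?<MyNodeName>[^<]+)</NodeName>"
| bin _time span=1h
| stats count AS request_count BY _time RequestType MyNodeName
| collect index=my_summary
```

The quoted strings in the base search narrow the event set before either rex runs, and only the hourly aggregates are written by collect. A later report could then read the much smaller summary, for example: index=my_summary | stats sum(request_count) BY RequestType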




| eval message="Happy Splunking!!!"

