Our application is using CXF interceptors to log XML SOAP requests and responses. The format of the log entries is:
2014-06-24 07:35:03,597 INFO com.foo.bar.Test WebContainer : 5 - Inbound Message
---------------------------
ID: 7232
Response-Code: 200
Encoding: UTF-8
Content-Type: text/xml
Headers: {$WSCS=[RC4-MD5], $WSIS=[true], ...
Payload: <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><MyXmlMessage>....</MyXmlMessage></soap:Body></soap:Envelope>
--------------------------------------
Is there any way to have Splunk (whether through configuration, a search query, etc.) extract the XML payload part of the log entry? We'd like to be able to run queries against the XML to look for specific element values.
In case it makes a difference, the SOAP payload log entries are intermixed with other application-specific log entries.
Once you've extracted the XML string into the Payload field, you could do this in a search:
... | spath input=Payload
That'll look at the content of Payload and extract all fields it can find. If you're looking for a specific value only, you can add an xpath-style selector as well; see http://docs.splunk.com/Documentation/Splunk/6.1.1/SearchReference/spath for reference.
Works, thanks!
That'll create a field called soap:Envelope.soap:Body.MyXmlMessage with the value ...., just as you'd expect.
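To make the field naming concrete, here is a rough Python sketch that flattens the sample payload into dotted element paths while keeping the namespace prefixes. It only illustrates the naming scheme described above; it is not how spath is implemented, and the `flatten` helper is a hypothetical name.

```python
from xml.dom import minidom


def flatten(xml_text):
    """Map dotted element paths to text values, keeping namespace prefixes.

    Illustration only: mimics the field name shown in this thread,
    not Splunk's actual spath logic.
    """
    fields = {}

    def walk(node, path):
        children = [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]
        if not children:
            # Leaf element: collect its text nodes as the field value.
            text = "".join(
                c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE
            )
            fields[path] = text
        for child in children:
            # tagName is the qualified name, e.g. "soap:Body".
            walk(child, path + "." + child.tagName)

    root = minidom.parseString(xml_text).documentElement
    walk(root, root.tagName)
    return fields


# Payload copied from the log entry in the question.
PAYLOAD = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body><MyXmlMessage>....</MyXmlMessage></soap:Body></soap:Envelope>"
)

fields = flatten(PAYLOAD)
print(fields)
```

The single key in the result is the element path joined with dots, which is why the namespace prefixes end up inside the field name.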
Splunk will eat that. Look at this:
| stats count | eval _raw = "2014-06-24 07:35:03,597 INFO com.foo.bar.Test WebContainer : 5 - Inbound Message
---------------------------
ID: 7232
Response-Code: 200
Encoding: UTF-8
Content-Type: text/xml
Headers: {$WSCS=[RC4-MD5], $WSIS=[true], ...
Payload: <soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"><soap:Body><MyXmlMessage>....</MyXmlMessage></soap:Body></soap:Envelope>
--------------------------------------"
| rex "(?s)Payload: (?<payload>.*)\s+-{30,}" | spath input=payload
The format for a single log entry is as you see above (starts with date and ends with a dashed line) and contains line breaks, even between XML elements.
In addition, there are XML namespaces and prefixes in some of the XML elements such as soap:Body -- when I tried substituting the SOAP body with
That shouldn't be a problem. Add a field extraction with this expression:
Payload: (?<Payload>[^\n\r]+)
That's assuming your XML has no line breaks. To test, you can use an inline rex like this:
... | rex "Payload: (?<Payload>[^\n\r]+)" | spath input=Payload
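If you want to check the two expressions from this thread outside Splunk, a small Python sketch like the one below shows the difference. Some assumptions: Python's `re` needs `(?P<name>...)` instead of `(?<name>...)` for named groups, its regex engine is not identical to the PCRE one Splunk uses, and the sample entry (including the `abc` value) is made up to resemble the log format in the question with line breaks inside the XML.

```python
import re

# Hypothetical multi-line entry modeled on the log format in the question.
SAMPLE = "\n".join([
    "2014-06-24 07:35:03,597 INFO com.foo.bar.Test WebContainer : 5 - Inbound Message",
    "-" * 27,
    "ID: 7232",
    'Payload: <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">',
    "<soap:Body><MyXmlMessage>abc</MyXmlMessage></soap:Body>",
    "</soap:Envelope>",
    "-" * 38,
])

# Stops at the first line break, so it only suits single-line payloads.
SINGLE_LINE = re.compile(r"Payload: (?P<Payload>[^\n\r]+)")

# (?s) lets "." match newlines; the match ends before the closing dashed line.
MULTI_LINE = re.compile(r"(?s)Payload: (?P<payload>.*)\s+-{30,}")


def extract(pattern, text, group):
    """Return the named group from the first match, or None if no match."""
    match = pattern.search(text)
    return match.group(group) if match else None


single = extract(SINGLE_LINE, SAMPLE, "Payload")
multi = extract(MULTI_LINE, SAMPLE, "payload")

print(single)
print(multi)
```

With line breaks between XML elements, the single-line pattern captures only the opening `soap:Envelope` tag, while the `(?s)` variant captures the whole envelope up to the trailing dashed line.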
Thanks...just need to find out how to get the Payload field extracted first.