Dashboards & Visualizations

Extract fields from log4net files with XML layout

Communicator

Hi,

We are using log4net to generate application logs. Events are logged to files as XML elements.
A simple, single element looks like this:

Logfile:

<log4net:event logger="MyLogger" timestamp="2015-02-15T06:07:08.1234567+01:00" level="INFO" thread="1" domain="MyProgram.exe" username="MYDOMAIN\User1">
  <log4net:message>Message 1.</log4net:message>
  <log4net:properties>
    <!-- Note: The number of log4net:data elements varies from event to event.
               The value and name attributes may contain escaped XML characters (like '&lt;' instead of '<') -->
    <log4net:data name="MyPropery1" value="MyValue1" />
    <log4net:data name="MyPropery2" value="A&lt;B" />
    <log4net:data name="MyPropery3" value="MyValue3" />
  </log4net:properties>
  <log4net:exception>System.InvalidOperationException: Operation is not valid due to the current state of the object.
   at krdo.Tools.LogGenerator.LogGeneratorView.GetException() in C:\MyFolder\MyFile.cs:line 456</log4net:exception>
  <log4net:locationInfo class="MyNamespace.MyClass" method="MyMethod" file="C:\MyFolder\MyFile.cs" line="123" />
</log4net:event>

Now I use the following stanzas in props.conf and transforms.conf to split the file into events and extract fields:

props.conf:

[log4net XML Layout]
category = MyCategory
description = MyDescription
pulldown_type = 1

BREAK_ONLY_BEFORE_DATE = false
BREAK_ONLY_BEFORE = <log4net:event([\r\n]|\s)
MAX_EVENTS = 1000
TIME_PREFIX = ([\r\n]|\s)timestamp="
MAX_TIMESTAMP_LOOKAHEAD = 64

KV_MODE = xml

# This is a workaround.
REPORT-log4net_properties    = log4net_properties

FIELDALIAS-log4net_logger    = log4net:event{@logger}                      AS logger
FIELDALIAS-log4net_timestamp = log4net:event{@timestamp}                   AS timestamp
FIELDALIAS-log4net_level     = log4net:event{@level}                       AS level
FIELDALIAS-log4net_thread    = log4net:event{@thread}                      AS thread
FIELDALIAS-log4net_domain    = log4net:event{@domain}                      AS domain
FIELDALIAS-log4net_username  = log4net:event{@username}                    AS username
FIELDALIAS-log4net_exception = log4net:event.log4net:exception             AS exception
FIELDALIAS-log4net_message   = log4net:event.log4net:message               AS message
FIELDALIAS-log4net_class     = log4net:event.log4net:locationInfo{@class}  AS class
FIELDALIAS-log4net_method    = log4net:event.log4net:locationInfo{@method} AS method
FIELDALIAS-log4net_file      = log4net:event.log4net:locationInfo{@file}   AS file
FIELDALIAS-log4net_cline     = log4net:event.log4net:locationInfo{@line}   AS line

transforms.conf:

[log4net_properties]
# The value (and field name) are not unescaped.
REGEX = <log4net:data(?:[\r\n]|\s)+name="(?<_KEY_1>[^"]*)"(?:[\r\n]|\s)+value="(?<_VAL_1>[^"]*)"(?:[\r\n]|\s)+/>
MV_ADD = true

Events and timestamps are detected correctly and the value of fields for which a FIELDALIAS- exists are correctly unescaped.

The Problem
Each event contains an element with a varying number of child elements, each representing a key value pair the name, order, number and value of those pairs is not known and varies from event to event. In the example above, three value pairs are specified:

MyPropery1 = "Value1"
MyPropery2 = "A<B" # Note: &lt; represents the '<' character in XML.
MyPropery3 = "Value3"

Those KVPs are not automatically extracted when using KV_MODE = xml, but are extracted as two multi-value fields log4net:event.log4net:properties.log4net:data{@name} and log4net:event.log4net:properties.log4net:data{@value}.
Is there a way to extract those KVPs? I would need to take the key form log4net:event.log4net:properties.log4net:data{@name} and assign it the value from log4net:event.log4net:properties.log4net:data{@value} with the corresponding index.

The current workaround is to extract those KVPs using the [log4net_properties] stanza in transforms.conf. The drawback of this approach is that I can't get the XML unescaped values - "A&lt;B" is extracted as "A&lt;B" and not as "A<B".

Also, there may be a lot of other escaped characters, not only "<".

Communicator

I've created a custom search command xpath2 based on the xpath command which supports XML namespaces and also allows me to extract multiple fields. I'm able to extract all the required fields, properly unescaped.

But now I'm facing another problem: How to invoke my custom search command from props.conf or transforms.conf?

0 Karma

Splunk Employee
Splunk Employee

I've gotten around similar issues by using the eval spath function. Effectively, play around with spath (as an eval function) and then you can save it as a calculated field. You can even do a field extraction for a subset of your XML if you want to (with max_matches if you've got multiple stanzas with the same name!), and then specify that field into your spath.

As an aside, you could also do an eval with urldecode as a calculated field to get rid of the XML formatting even with your current approach. I would add in to your props.conf:

EVAL-MyProperty1 = urldecode(MyProperty1)
EVAL-MyProperty2 = urldecode(MyProperty2)
EVAL-MyProperty3 = urldecode(MyProperty3)
0 Karma

Communicator

I tried using spath and was able to extract the names and values (each as multivalued fields). But I don't know how to tell Splunk to use the names as field names and assign the values (the names vary from event to event, as does the number of KVPs).

The encoding for URLs and XML is different so urldecode doesn't work here either (just try * | eval x=urldecode("&lt;") | table x).

0 Karma

Splunk Employee
Splunk Employee

So you can always tell spath to traverse an entire tree. If you're dealing with a small set, you could run spath on every different possibility -- the ones that are null just won't show up. At that point though, a field extraction might just be easier (using a bunch of eval replaces to hit each html encoded character). A field extraction also allows you the flexibility to extract both the field name and the field value, which can be very useful for something like this.

I think that if that's not enough to go on, it would probably be best to post an example or two of what you're trying to extract -- the visual helps a lot in trying to work past these types of issues.

0 Karma

Communicator

Seems like spath supports only access via index and not via predicates; I need a xpath expression like this: //log4net:event//log4net:properties/log4net:data[@name='MyPropery2']/@value to get the correct value. But the xpath command does not work with namespaces according to http://answers.splunk.com/answers/100530/search-time-xpath-command-namespace-handling.html
. Even if it would work I'd need to specify each field extraction by hand but the value of the name attribute (and therefore the name of the field which should be extracted) varies from event to event.

0 Karma

Splunk Employee
Splunk Employee

Ah, good call. You can instead use rex mode=sed to manually change them -- not quite as fun and easy but it works well. rex mode=sed field=xyz "s/&gt;/\>/g" etc.

Can can do the same thing in eval with replace, which should allow you flexibility enough when putting that in your conf files (evals are run after field extractions and transforms, I believe).

0 Karma

Communicator

Unescaping XML is a little more complicated than a simple replace; CDATA sections and multiple possible ways of encoding special characters result in a pretty long and complicated regex; I'd definitely prefer a reliable way provided by splunk (which AFAIK does not exist at the moment).

0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!