How to extract fields from XML data at search-time?

New Member

I have a .log file that I need to analyse using Splunk. The structure of the log data is as below

<root>
    <ns0:LogMessage xmlns:ns0="http://some_namespace.com/schemas/logmessage/3.1.2">
        <ns0:Fields>
            <ns0:Field>
                <ns0:name>Action</ns0:name>
                <ns0:value>Start</ns0:value>
            </ns0:Field>
            <ns0:Field>
                <ns0:name>MessageTypeName</ns0:name>
                <ns0:value>Logging request for ABC service</ns0:value>
            </ns0:Field>
            <ns0:Field>
                <ns0:name>CreatedBy</ns0:name>
                <ns0:value>domain/user.name</ns0:value>
            </ns0:Field>
            .
            .
            .
            .
            <ns0:Field>
                <ns0:name>MessageID</ns0:name>
                <ns0:value>1234</ns0:value>
            </ns0:Field>
        </ns0:Fields>
    </ns0:LogMessage>
</root>

I want to extract fields such as Action and use them in my searches. Is there a way to search for something like Action=Start MessageID=1234?

If yes, how can I achieve that? I went through other XML-related questions posted here, but I couldn't find anything similar to my scenario.

Splunk Employee

You could use the Interactive Field Extractor to do this

  1. Go to the event
  2. Click "Event Actions"
  3. Click "Extract Fields"
  4. Copy examples of the fields you want from your data into the examples box, like multiple actions or message ids.
  5. Test generated regex, edit as needed.
  6. Save as field extraction

See this guide (for 6.1).

Note this process becomes significantly easier in 6.2.

Motivator

I like this solution using transforms.conf

[views_std]
MV_ADD = 1
REGEX = \<(\w+[^\n\/\>]+)\/?\>([^\<\n][^\<]*)\<
FORMAT = $1::$2
CLEAN_KEYS = true

[views_param]
MV_ADD = 1
REGEX = \<(\w+ [^\n\/\>]+)\/?\>
FORMAT = param::$1
CLEAN_KEYS = true

[views_option]
MV_ADD = 1
SOURCE_KEY = param
REGEX = (\w+(?: \w+)*)="(?!host|source|sourcetype|index|splunk_server)(\w+)"
FORMAT = $1::$2
CLEAN_KEYS = true
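
To see what the REGEX in the first stanza captures on the sample event from the question, the pattern can be run outside Splunk. The sketch below is only an illustration under stated assumptions: it uses Python's re module rather than Splunk's PCRE engine (the pattern uses only syntax common to both), and EVENT and captures are hypothetical names that are not part of the original post.

```python
import re

# Trimmed copy of the sample event from the question (assumption: two fields only).
EVENT = """<root>
    <ns0:LogMessage xmlns:ns0="http://some_namespace.com/schemas/logmessage/3.1.2">
        <ns0:Fields>
            <ns0:Field>
                <ns0:name>Action</ns0:name>
                <ns0:value>Start</ns0:value>
            </ns0:Field>
            <ns0:Field>
                <ns0:name>MessageID</ns0:name>
                <ns0:value>1234</ns0:value>
            </ns0:Field>
        </ns0:Fields>
    </ns0:LogMessage>
</root>"""

# The REGEX from the [views_std] stanza, copied verbatim.
VIEWS_STD = re.compile(r"\<(\w+[^\n\/\>]+)\/?\>([^\<\n][^\<]*)\<")


def captures(event):
    """Return the (group 1, group 2) pairs the pattern finds in the event."""
    return VIEWS_STD.findall(event)


for key, value in captures(EVENT):
    print(key, "->", value)
```

On this sample the pattern pairs each tag name with its text (ns0:name with Action, ns0:value with Start, and so on), so the resulting fields are keyed by tag name rather than by Action or MessageID. A pattern tailored to the name/value layout may still be needed to search for Action=Start directly.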

New Member

I did try the field extractor (as in 6.2), but the result is not accurate. The regexes I write don't work on the data I have.

Splunk Employee

What you need to do, then, is edit the generated regex so that it works on the data you have.

You can also use "require" as well as "extract" in the 6.2 field extractor.

If neither of these works, look into using the configuration files for search-time field extractions; see "Create and maintain search-time field extractions through configuration files".

New Member

Hi aljohnson_splunk,

Further to this, I also tried extraction using props.conf and transforms.conf as below:

In props.conf, I added the following statement under [default] (as I want this extraction for all the sources and sourcetypes):

REPORT-Action=Action

And, in transforms.conf I added the following statements:

[Action]
REGEX = ?<root>.?<ns0:LogMessage\s.>.?<ns0:Fields>.+<ns0:Field>.+<ns0:name>(Action)</ns0:name>.+<ns0:value>([^<]+)</ns0:value>.+
FORMAT = Action::$1

Am I doing something wrong here? I have doubts about my regex. Please refer to the example of the logs above.
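
One thing that stands out in the stanza above is that FORMAT = Action::$1 refers to the first capture group, which holds the literal text Action; the value sits in the second group. A simpler alternative is a pattern that captures each name and its value as a pair. The sketch below checks such a pattern outside Splunk. It is an illustration only: it uses Python's re module rather than Splunk's PCRE engine, and PAIR, EVENT and extract_pairs are hypothetical names. In transforms.conf the equivalent would presumably be this pattern as REGEX with FORMAT = $1::$2, which should be tested against real events first.

```python
import re

# Trimmed copy of the sample event from the question (assumption: two fields only).
EVENT = """<root>
    <ns0:LogMessage xmlns:ns0="http://some_namespace.com/schemas/logmessage/3.1.2">
        <ns0:Fields>
            <ns0:Field>
                <ns0:name>Action</ns0:name>
                <ns0:value>Start</ns0:value>
            </ns0:Field>
            <ns0:Field>
                <ns0:name>MessageID</ns0:name>
                <ns0:value>1234</ns0:value>
            </ns0:Field>
        </ns0:Fields>
    </ns0:LogMessage>
</root>"""

# Group 1 is the field name, group 2 is the field value.
# \s* absorbs the newline and indentation between the two elements.
PAIR = re.compile(r"<ns0:name>([^<]+)</ns0:name>\s*<ns0:value>([^<]*)</ns0:value>")


def extract_pairs(event):
    """Return {name: value} for every name/value pair found in the event."""
    return {name: value for name, value in PAIR.findall(event)}


fields = extract_pairs(EVENT)
print(fields["Action"], fields["MessageID"])
```

Because the pattern anchors on the name and value elements only, it does not depend on the surrounding root, LogMessage or Fields elements, and it does not need .+ to span several lines.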

New Member

I have a similar requirement: there are a lot of XML web service responses in my logs. For example, say the XML above appears in a log. First I want to find all the log statements that contain "Logging request for ABC service", and then I want the key-value pairs from the fields of each matching XML, so that I can run further searches based on them.

Should I use spath, xmlkv, or xpath?

Splunk Employee

patelmiral, you should ask a separate question, post more data, desired output, attempted searches, etc. and it will be easier for someone to help you with your question.

Splunk Employee

Have you tried using the xpath command?

New Member

Yes, I did. I tried both xpath and spath, but I am not getting the desired results.

For example, I want to find events where the element ns0:name is Action and ns0:value is Start.

I tried the following:

sourcetype=AppLogs.log | xpath outfield=Action "//root/ns0:LogMessage/ns0:Fields/ns0:Field[ns0:name=\"Action\" AND ns0:value=\"Start\"]"

This just returns all the events.

Is what I am doing correct? Please correct me if not.
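
Two details of the XPath expression above are worth checking. In XPath the boolean operator is lowercase and, not AND, and the expression selects the ns0:Field element itself rather than its ns0:value child. The sketch below shows a namespace-aware lookup of the value for a given name. It is an illustration only: it uses Python's xml.etree.ElementTree rather than Splunk's xpath command, whose namespace handling may differ, and NS, EVENT and field_value are hypothetical names. As far as I know, xpath extracts a value into outfield without filtering events, which would explain why every event comes back; a filter such as | search Action=Start would then still be needed.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping taken from the xmlns:ns0 declaration in the sample event.
NS = {"ns0": "http://some_namespace.com/schemas/logmessage/3.1.2"}

# Trimmed copy of the sample event from the question (assumption: two fields only).
EVENT = """<root>
    <ns0:LogMessage xmlns:ns0="http://some_namespace.com/schemas/logmessage/3.1.2">
        <ns0:Fields>
            <ns0:Field>
                <ns0:name>Action</ns0:name>
                <ns0:value>Start</ns0:value>
            </ns0:Field>
            <ns0:Field>
                <ns0:name>MessageID</ns0:name>
                <ns0:value>1234</ns0:value>
            </ns0:Field>
        </ns0:Fields>
    </ns0:LogMessage>
</root>"""


def field_value(event_xml, name):
    """Return the ns0:value text of the ns0:Field whose ns0:name equals name.

    Returns None when no such field exists. The name is interpolated into
    the path as-is, so it must not contain a single quote.
    """
    root = ET.fromstring(event_xml)
    path = f".//ns0:Field[ns0:name='{name}']/ns0:value"
    return root.findtext(path, namespaces=NS)


print(field_value(EVENT, "Action"))
print(field_value(EVENT, "MessageID"))
```

The predicate [ns0:name='Action'] picks the right Field, and the trailing /ns0:value step returns the value instead of the whole element.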

Splunk Employee

Have you tried using the xmlkv command?
