How to extract fields from XML data at search-time?

yostwal_synechr
New Member

I have a .log file that I need to analyse using Splunk. The structure of the log data is as below

<root>
    <ns0:LogMessage xmlns:ns0="http://some_namespace.com/schemas/logmessage/3.1.2">
        <ns0:Fields>
            <ns0:Field>
                <ns0:name>Action</ns0:name>
                <ns0:value>Start</ns0:value>
            </ns0:Field>
            <ns0:Field>
                <ns0:name>MessageTypeName</ns0:name>
                <ns0:value>Logging request for ABC service</ns0:value>
            </ns0:Field>
            <ns0:Field>
                <ns0:name>CreatedBy</ns0:name>
                <ns0:value>domain/user.name</ns0:value>
            </ns0:Field>
            .
            .
            .
            .
            <ns0:Field>
                <ns0:name>MessageID</ns0:name>
                <ns0:value>1234</ns0:value>
            </ns0:Field>
        </ns0:Fields>
    </ns0:LogMessage>
</root>

I want to extract fields such as Action and use them in my searches. Is there a way to search for something like Action=Start MessageID=1234?

If yes, how can I achieve that? I went through other XML-related questions posted here, but I couldn't find anything that matches my scenario.

aljohnson_splun
Splunk Employee

You could use the Interactive Field Extractor to do this:

  1. Go to the event
  2. Click "Event Actions"
  3. Click "Extract Fields"
  4. Copy examples of the fields you want from your data into the examples box, like multiple actions or message ids.
  5. Test generated regex, edit as needed.
  6. Save as field extraction

See this guide (for 6.1).

Note this process becomes significantly easier in 6.2.

landen99
Motivator

I like this solution using transforms.conf:

[views_std]
MV_ADD = 1
REGEX = \<(\w+[^\n\/\>]+)\/?\>([^\<\n][^\<]*)\<
FORMAT = $1::$2
CLEAN_KEYS = true

[views_param]
MV_ADD = 1
REGEX = \<(\w+ [^\n\/\>]+)\/?\>
FORMAT = param::$1
CLEAN_KEYS = true

[views_option]
MV_ADD = 1
SOURCE_KEY = param
REGEX = (\w+(?: \w+)*)="(?!host|source|sourcetype|index|splunk_server)(\w+)"
FORMAT = $1::$2
CLEAN_KEYS = true

yostwal_synechr
New Member

I did try field extraction (as in 6.2), but the results are not accurate. The regexes I write don't work on the data I have.

aljohnson_splun
Splunk Employee

What you need to do, then, is edit the generated regex so that it does work on the data you have.

You can also use "require" as well as "extract" in the 6.2 field extraction.

If neither of these works, look into using the configuration files for search-time field extractions: Create and maintain search-time field extractions through configuration files

yostwal_synechr
New Member

Hi aljohnson_splunk,

Further to this, I also tried extraction using props.conf and transforms.conf as below:

In props.conf, I added the following statement under [default] (as I want this extraction for all the sources and sourcetypes):

REPORT-Action=Action

And, in transforms.conf I added the following statements:

[Action]
REGEX = ?<root>.?<ns0:LogMessage\s.>.?<ns0:Fields>.+<ns0:Field>.+<ns0:name>(Action)</ns0:name>.+<ns0:value>([^<]+)</ns0:value>.+
FORMAT = Action::$1

Am I doing something wrong here? I have doubts about my regex. Please refer to the example log above.
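
One thing that stands out in the stanza above: FORMAT = Action::$1 points at the first capture group, which holds the literal text Action; the value sits in the second group, so $2 looks like the intended reference. A smaller pattern that pairs each ns0:name with the ns0:value that follows it may also be easier to get right than one anchored at <root>. The Python sketch below is only an offline check of such a pattern against the sample event. It is not Splunk configuration, the names PAIR_RE and extract_fields are made up for illustration, and it assumes Python's re module treats this simple pattern the same way Splunk's regex engine would.

```python
import re

# Sample event trimmed from the question (the elided Field elements are left out).
EVENT = """<root>
    <ns0:LogMessage xmlns:ns0="http://some_namespace.com/schemas/logmessage/3.1.2">
        <ns0:Fields>
            <ns0:Field>
                <ns0:name>Action</ns0:name>
                <ns0:value>Start</ns0:value>
            </ns0:Field>
            <ns0:Field>
                <ns0:name>MessageID</ns0:name>
                <ns0:value>1234</ns0:value>
            </ns0:Field>
        </ns0:Fields>
    </ns0:LogMessage>
</root>"""

# Hypothetical pattern: group 1 captures the field name, group 2 the field value.
# \s* absorbs the newline and indentation between the two elements.
PAIR_RE = re.compile(
    r"<ns0:name>([^<]+)</ns0:name>\s*<ns0:value>([^<]*)</ns0:value>"
)


def extract_fields(event: str) -> dict:
    """Return a {name: value} mapping for every name/value pair in the event."""
    return {name: value for name, value in PAIR_RE.findall(event)}


if __name__ == "__main__":
    print(extract_fields(EVENT))
```

If the pattern behaves as expected, the same two capture groups could be tried in transforms.conf with FORMAT = $1::$2, as in landen99's stanzas.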

patelmiral
New Member

I have a similar requirement: my logs contain a lot of XML web service responses. For example, given the XML above, I first want to find all log statements that contain "Logging request for ABC service", and then get the key/value pairs from the fields of each matching XML document so that I can run further searches based on them.

Should I use spath, xmlkv, or xpath?

aljohnson_splun
Splunk Employee

patelmiral, you should ask a separate question, post more data, desired output, attempted searches, etc. and it will be easier for someone to help you with your question.

aljohnson_splun
Splunk Employee

Have you tried using the xpath command?

yostwal_synechr
New Member

Yes, I did. I tried both xpath and spath, but I am not getting the desired results.

To give you an example of what I want, I want to look at events where the element ns0:name is Action and ns0:value is Start

I tried the following:

sourcetype=AppLogs.log | xpath outfield=Action "//root/ns0:LogMessage/ns0:Fields/ns0:Field[ns0:name=\"Action\" AND ns0:value=\"Start\"]"

This just returns all the events.

Is what I am doing correct? Please correct me if not.
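
Two things may explain why every event comes back. As far as I know, xpath only extracts a value into outfield and does not filter events, so a separate filtering step would still be needed afterwards. Also, XPath 1.0 spells its boolean operator as lowercase and, not AND. The logic being asked for (read the name/value pairs, then keep only events where Action is Start) can be sketched outside Splunk with Python's xml.etree.ElementTree. This is an illustration of the extract-then-filter idea, not a Splunk search; field_map and matches are hypothetical helper names, and the sample event is trimmed from the question.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map; the URI is the one declared in the sample event.
NS = {"ns0": "http://some_namespace.com/schemas/logmessage/3.1.2"}

EVENT = (
    '<root><ns0:LogMessage xmlns:ns0="http://some_namespace.com/schemas/logmessage/3.1.2">'
    "<ns0:Fields>"
    "<ns0:Field><ns0:name>Action</ns0:name><ns0:value>Start</ns0:value></ns0:Field>"
    "<ns0:Field><ns0:name>MessageID</ns0:name><ns0:value>1234</ns0:value></ns0:Field>"
    "</ns0:Fields></ns0:LogMessage></root>"
)


def field_map(event_xml: str) -> dict:
    """Step 1 (extract): collect every ns0:Field as a name -> value entry."""
    root = ET.fromstring(event_xml)
    pairs = {}
    for field in root.iterfind(".//ns0:Field", NS):
        name = field.findtext("ns0:name", default=None, namespaces=NS)
        value = field.findtext("ns0:value", default="", namespaces=NS)
        if name:
            pairs[name] = value
    return pairs


def matches(event_xml: str, **wanted: str) -> bool:
    """Step 2 (filter): True only if every wanted field has the wanted value."""
    fields = field_map(event_xml)
    return all(fields.get(key) == value for key, value in wanted.items())


if __name__ == "__main__":
    print(field_map(EVENT))
    print(matches(EVENT, Action="Start", MessageID="1234"))
```

Keeping extraction and filtering as two separate steps mirrors how the problem splits in a search: first get Action and MessageID as fields, then filter on them.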

aljohnson_splun
Splunk Employee

Have you tried using the xmlkv command?
