
XML data - multi value field extraction without using xpath

meno
Path Finder

I got stuck extracting a multivalue field from XML data:

<Results>
    <Result>
        <Grade>Error</Grade>
        <MachinesFound>0</MachinesFound>
        <Machines>
        </Machines>
    </Result>
    <Result>
        <Grade>Critical</Grade>
        <MachinesFound>3</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TOTO</Machine>
            <Machine path="some data">BIZ\TATA</Machine>
            <Machine path="some data">BIZ\TUTU</Machine>
        </Machines>
    </Result>
    <Result>
        <Grade>Warning</Grade>
        <MachinesFound>2</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TCTC</Machine>
            <Machine path="some data">BIZ\TZTZ</Machine>
        </Machines>
    </Result>
    <Result>
        <Grade>Passed</Grade>
        <MachinesFound>1</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TETE</Machine>
        </Machines>
    </Result>
    <Result>
        <Grade>NotPerformed</Grade>
        <MachinesFound>0</MachinesFound>
        <Machines>
        </Machines>
    </Result>
</Results>

I am already extracting Grade and the corresponding number of MachinesFound:

[gradeextr]
REGEX = \<Grade\>([^\<]+)\</Grade\>.*?\<MachinesFound\>(\d+)\</MachinesFound\>
FORMAT= $1::$2
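To illustrate what FORMAT = $1::$2 produces, here is a minimal Python sketch that emulates the [gradeextr] transform on an abbreviated copy of the sample event: group 1 becomes the field name and group 2 its value. It assumes Python's re behaves like Splunk's PCRE for this pattern, adds re.DOTALL (an assumption, needed because in the sample as shown </Grade> and <MachinesFound> sit on different lines), and extract_grades is a hypothetical helper name.

```python
import re

# Abbreviated copy of the sample event (Error, Critical and Passed results).
EVENT = r"""<Results>
    <Result>
        <Grade>Error</Grade>
        <MachinesFound>0</MachinesFound>
        <Machines>
        </Machines>
    </Result>
    <Result>
        <Grade>Critical</Grade>
        <MachinesFound>3</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TOTO</Machine>
            <Machine path="some data">BIZ\TATA</Machine>
            <Machine path="some data">BIZ\TUTU</Machine>
        </Machines>
    </Result>
    <Result>
        <Grade>Passed</Grade>
        <MachinesFound>1</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TETE</Machine>
        </Machines>
    </Result>
</Results>"""

# Same pattern as [gradeextr]; re.DOTALL is an assumption so that .*? can
# cross the line break between </Grade> and <MachinesFound>.
GRADE_RE = re.compile(
    r"\<Grade\>([^\<]+)\</Grade\>.*?\<MachinesFound\>(\d+)\</MachinesFound\>",
    re.DOTALL,
)


def extract_grades(event: str) -> dict[str, str]:
    """Hypothetical helper emulating FORMAT = $1::$2 (name from $1, value from $2)."""
    return {m.group(1): m.group(2) for m in GRADE_RE.finditer(event)}


print(extract_grades(EVENT))  # {'Error': '0', 'Critical': '3', 'Passed': '1'}
```

Because each Result produces a differently named field, every grade ends up with its own count without needing MV_ADD.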

What is still missing are the corresponding Machine names.

xpath is easy...

 *| xpath "//Results/Result[Grade=\"Critical\"]/Machines/Machine" outfield=critbox

...but unfortunately not an option.
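For reference, this is what the xpath expression selects, sketched outside Splunk with Python's xml.etree.ElementTree as an assumed stand-in (its limited XPath supports the [tag='text'] predicate). The sample is abbreviated and critical_machines is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample event (Critical and Passed results only).
EVENT = r"""<Results>
    <Result>
        <Grade>Critical</Grade>
        <MachinesFound>3</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TOTO</Machine>
            <Machine path="some data">BIZ\TATA</Machine>
            <Machine path="some data">BIZ\TUTU</Machine>
        </Machines>
    </Result>
    <Result>
        <Grade>Passed</Grade>
        <MachinesFound>1</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TETE</Machine>
        </Machines>
    </Result>
</Results>"""


def critical_machines(event: str) -> list[str]:
    """Hypothetical helper: the machine names the xpath expression selects."""
    root = ET.fromstring(event)  # root is <Results>, so paths start below it
    # [Grade='Critical'] keeps only <Result> elements whose <Grade> child
    # has exactly the text "Critical".
    path = "./Result[Grade='Critical']/Machines/Machine"
    return [machine.text for machine in root.findall(path)]


print(critical_machines(EVENT))  # ['BIZ\\TOTO', 'BIZ\\TATA', 'BIZ\\TUTU']
```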

So I want to do it via props.conf/transforms.conf:

REGEX = \<Grade\>Critical\</Grade\>.+?\<Machines\>.+\<Machine path="[^\>]*"\>([^\<]+)\</Machine\>.+
FORMAT = critbox::$1

When I try to extract multiple values with...

MV_ADD = true

...the critbox field has two similar(!) values.
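Running the question's regex outside Splunk helps show why one pattern struggles here. The sketch below uses Python's re as a stand-in for Splunk's PCRE (an assumption, although greedy backtracking works the same way for this pattern) and adds re.DOTALL so that .+ can cross line breaks. It does not reproduce the duplicate values seen in Splunk, but in this emulation the pattern matches only once per event, so there is only one value for MV_ADD to add, and the greedy .+ runs past the Critical block to the last <Machine> in the abbreviated event.

```python
import re

# Abbreviated copy of the sample event (Critical and Warning results only).
EVENT = r"""<Results>
    <Result>
        <Grade>Critical</Grade>
        <MachinesFound>3</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TOTO</Machine>
            <Machine path="some data">BIZ\TATA</Machine>
            <Machine path="some data">BIZ\TUTU</Machine>
        </Machines>
    </Result>
    <Result>
        <Grade>Warning</Grade>
        <MachinesFound>2</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TCTC</Machine>
            <Machine path="some data">BIZ\TZTZ</Machine>
        </Machines>
    </Result>
</Results>"""

# The REGEX from the question; re.DOTALL is an assumption so that .+ can
# span the line breaks inside the event.
QUESTION_RE = re.compile(
    r"\<Grade\>Critical\</Grade\>.+?\<Machines\>.+"
    r'\<Machine path="[^\>]*"\>([^\<]+)\</Machine\>.+',
    re.DOTALL,
)

# One match only: the trailing .+ swallows the rest of the event, and the
# greedy .+ before <Machine backtracks only as far as the LAST machine,
# which here belongs to the Warning result.
print(QUESTION_RE.findall(EVENT))  # ['BIZ\\TZTZ']
```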

Is it possible to extract a multivalue field without xpath, using props.conf/transforms.conf?

I also tried using fields.conf and TOKENIZER=, without any success.

Thanks for any new ideas...

2 Solutions

araitz
Splunk Employee

From your explanation, I can't see why you must extract everything with that one regex. Separate extractions work better:

[regex1]
REGEX = \<Grade\>([^\<]+)\</Grade\>
FORMAT= grade::$1

[regex2]
REGEX = <MachinesFound\>(\d+)\</MachinesFound\>
FORMAT= machine_count::$1

[regex3]
REGEX = \<Machine path="[^\>]*"\>([^\<]+)
FORMAT= machine_name::$1
MV_ADD= true

Then:

... | stats sum(machine_count) as total_count values(machine_name) as machine_names by grade
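The three transforms can be emulated in Python to see which values end up in each field of a single event. This is a sketch under stated assumptions: re stands in for PCRE, each transform is applied to every match in the event, and MV_ADD is assumed to work as described in transforms.conf.spec (append later matches when true, otherwise keep only the first value). apply_transforms and TRANSFORMS are hypothetical names, and the sample is abbreviated.

```python
import re

# Abbreviated copy of the sample event (Error, Critical and Passed results).
EVENT = r"""<Results>
    <Result>
        <Grade>Error</Grade>
        <MachinesFound>0</MachinesFound>
        <Machines>
        </Machines>
    </Result>
    <Result>
        <Grade>Critical</Grade>
        <MachinesFound>3</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TOTO</Machine>
            <Machine path="some data">BIZ\TATA</Machine>
            <Machine path="some data">BIZ\TUTU</Machine>
        </Machines>
    </Result>
    <Result>
        <Grade>Passed</Grade>
        <MachinesFound>1</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TETE</Machine>
        </Machines>
    </Result>
</Results>"""

# (field, REGEX, MV_ADD) mirroring [regex1], [regex2] and [regex3].
TRANSFORMS = [
    ("grade", re.compile(r"\<Grade\>([^\<]+)\</Grade\>"), False),
    ("machine_count", re.compile(r"<MachinesFound\>(\d+)\</MachinesFound\>"), False),
    ("machine_name", re.compile(r'\<Machine path="[^\>]*"\>([^\<]+)'), True),
]


def apply_transforms(event: str) -> dict[str, list[str]]:
    """Hypothetical emulation of search-time REPORT extractions on one event."""
    fields = {}
    for name, regex, mv_add in TRANSFORMS:
        values = regex.findall(event)
        # Assumed MV_ADD semantics: keep every match when true,
        # otherwise keep only the first value found.
        fields[name] = values if mv_add else values[:1]
    return fields


for name, values in apply_transforms(EVENT).items():
    print(name, values)
```

Setting MV_ADD = true on all three transforms would make every list complete, but the lists stay independent: nothing in them records which machine belongs to which grade, which is the link the asker needs.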


ziegfried
Influencer

Here you go:

props.conf

[<your_sourcetype>]
REPORT-extract-crit=extract_crit

transforms.conf

[extract_crit]
REGEX = (?ms)\<Grade\>Critical\</Grade\>.+?\<Machines\>\s*(.+?)\s*\</Machines\>
FORMAT = critbox::$1

fields.conf

[critbox]
TOKENIZER=\<Machine.+?\>([^\<]+)
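The two-step idea (one transform captures the whole Critical <Machines> block into critbox, then the TOKENIZER splits that string into values) can be sketched in Python. Treating the tokenizer as "every match of the TOKENIZER regex becomes one value" is an assumption about fields.conf, re stands in for PCRE, critbox_values is a hypothetical name, and the sample is abbreviated.

```python
import re

# Abbreviated copy of the sample event (Critical and Warning results only).
EVENT = r"""<Results>
    <Result>
        <Grade>Critical</Grade>
        <MachinesFound>3</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TOTO</Machine>
            <Machine path="some data">BIZ\TATA</Machine>
            <Machine path="some data">BIZ\TUTU</Machine>
        </Machines>
    </Result>
    <Result>
        <Grade>Warning</Grade>
        <MachinesFound>2</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TCTC</Machine>
            <Machine path="some data">BIZ\TZTZ</Machine>
        </Machines>
    </Result>
</Results>"""

# [extract_crit] REGEX: capture only the body of the Critical <Machines> block.
EXTRACT_CRIT = re.compile(
    r"(?ms)\<Grade\>Critical\</Grade\>.+?\<Machines\>\s*(.+?)\s*\</Machines\>"
)
# [critbox] TOKENIZER from fields.conf: every match becomes one value.
TOKENIZER = re.compile(r"\<Machine.+?\>([^\<]+)")


def critbox_values(event: str) -> list[str]:
    """Hypothetical emulation of REPORT + TOKENIZER for the critbox field."""
    match = EXTRACT_CRIT.search(event)
    if match is None:
        return []
    critbox = match.group(1)  # one string holding the three <Machine> lines
    return TOKENIZER.findall(critbox)


print(critbox_values(EVENT))  # ['BIZ\\TOTO', 'BIZ\\TATA', 'BIZ\\TUTU']
```

The lazy (.+?) stops at the first </Machines>, so the Warning machines never reach the tokenizer.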

It would be possible without the TOKENIZER if the PCRE implementation supported variable-length lookbehinds, with an expression like this:

[test_crit]
REGEX = (?ms)(?<=\<Grade\>Critical\</Grade\>.+?\<Machines\>.+?(?:\<Machine path="[^\>]*"\>[^\<]+\</Machine\>\s+)?)\<Machine path="[^\>]*"\>([^\<]+)\</Machine\>
FORMAT = critbox::$1
MV_ADD=true

I think support for this has been added in Python 2.7 (Splunk is using 2.6.4).
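For comparison, Python's standard re module in current Python 3 versions also rejects a variable-width lookbehind and raises re.error, while a fixed-width one compiles. The sketch below checks both; compiles is a hypothetical helper, and the patterns are shortened versions of the ones above. (The third-party regex package is one engine that does accept variable-width lookbehinds.)

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's standard re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Fixed-width lookbehind: the engine steps back exactly len("<Machines>")
# characters, so the pattern compiles.
print(compiles(r"(?<=\<Machines\>)\s*\<Machine"))  # True

# Variable-width lookbehind (.+? inside it): re raises re.error.
print(compiles(r"(?s)(?<=\<Grade\>Critical\</Grade\>.+?)\<Machine"))  # False
```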


ziegfried
Influencer

Glad I could help 😉 Will we see each other in San Francisco in January? Best regards from Vienna.


meno
Path Finder

Hi Siegfried, you are my RegEx Hero! It's working now...
Bounty is for you 😉

I really struggled with this one.
The solution stays in the Alpine region 😉 Best regards from Switzerland...


meno
Path Finder

I opened a bounty because this question has become a showstopper. Could anybody help us find the error, or confirm that this is not possible with props/transforms? Thanks a lot...


meno
Path Finder

I added more data to show the idea behind it.


araitz
Splunk Employee

Ah, I see. I will add another extraction above.


meno
Path Finder

I do not need the path field. I want to create reports based on the number of machines and their machine names per grade (Critical, Error, Warning, ...). So each grade must be linked to its corresponding number of machines.
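To make that requirement concrete, here is a sketch of the target shape outside Splunk, in Python: each <Result> block is processed on its own so that the grade, MachinesFound and machine names stay together. It only illustrates the desired output, not a props/transforms configuration; report_by_grade is a hypothetical name, re stands in for PCRE, and the sample is abbreviated.

```python
import re

# Abbreviated copy of the sample event (Error, Critical and Warning results).
EVENT = r"""<Results>
    <Result>
        <Grade>Error</Grade>
        <MachinesFound>0</MachinesFound>
        <Machines>
        </Machines>
    </Result>
    <Result>
        <Grade>Critical</Grade>
        <MachinesFound>3</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TOTO</Machine>
            <Machine path="some data">BIZ\TATA</Machine>
            <Machine path="some data">BIZ\TUTU</Machine>
        </Machines>
    </Result>
    <Result>
        <Grade>Warning</Grade>
        <MachinesFound>2</MachinesFound>
        <Machines>
            <Machine path="some data">BIZ\TCTC</Machine>
            <Machine path="some data">BIZ\TZTZ</Machine>
        </Machines>
    </Result>
</Results>"""

RESULT_RE = re.compile(r"\<Result\>(.*?)\</Result\>", re.DOTALL)
GRADE_RE = re.compile(r"\<Grade\>([^\<]+)\</Grade\>")
COUNT_RE = re.compile(r"\<MachinesFound\>(\d+)\</MachinesFound\>")
NAME_RE = re.compile(r'\<Machine path="[^\>]*"\>([^\<]+)\</Machine\>')


def report_by_grade(event: str) -> dict[str, tuple[int, list[str]]]:
    """Hypothetical helper: grade -> (machine count, machine names)."""
    report = {}
    # Work on one <Result> block at a time so that the grade, the count and
    # the names found in that block stay linked to each other.
    for block in RESULT_RE.findall(event):
        grade = GRADE_RE.search(block).group(1)
        count = int(COUNT_RE.search(block).group(1))
        report[grade] = (count, NAME_RE.findall(block))
    return report


for grade, (count, names) in report_by_grade(EVENT).items():
    print(grade, count, names)
```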
