
Regex to extract numbers from a non-unique string

ahogbin
Communicator

Hello,

I am hoping that someone with far more knowledge than myself can help with a bit of a puzzling problem I have with extracting some numbers from a non-unique string.

The numbers I am trying to extract are in <_0:EndOfTermAmount>999.30</_0:EndOfTermAmount>, but the issue I have is that this is not a unique identifier (i.e. the same string appears in several other places within the XML).

The only unique thing is that the numbers I am after are always contained within:

<_0:AALNet>
    <_0:AnnualAmount>0.00</_0:AnnualAmount>
    <_0:MonthlyAmount>0.00</_0:MonthlyAmount>
    <_0:FortnightlyAmount>0.00</_0:FortnightlyAmount>
    <_0:EndOfTermAmount>999.30</_0:EndOfTermAmount>
    <_0:ComparisonAmount>0.00</_0:ComparisonAmount>
    <_0:InstallmentGapAmount>0.00</_0:InstallmentGapAmount>
</_0:AALNet>

Is there a way of using _0:AALNet as a key to then extract the required number (in this case 999.30) using regex (or some other means)? The length of the numbers to be extracted can vary.

I am totally stumped on this one so any help or pointers will be greatly appreciated.

Cheers.

Alastair


acharlieh
Influencer

If you're doing this in a search, could you use either the xpath command or the spath command to help?

Based on your fragment, and not an entire sample, maybe something like this:

... | xpath outfield=EndOfTermAmount "//*[local-name()='AALNet']/*[local-name()='EndOfTermAmount']"

(Depending on the overall structure of the XML, there could be nuances to xpath and spath, but hopefully this points you in a good direction... stupid xpath and namespace issues... you may be interested in this Stack Overflow answer and also this other Stack Overflow answer.)
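For anyone who wants to try the same idea outside Splunk, here is a minimal Python sketch; it is not Splunk's xpath implementation. It assumes Python 3.8+, whose xml.etree.ElementTree accepts {*} as an "any (or no) namespace" wildcard in paths, which plays roughly the role that local-name() plays in the query above. The namespace URI and the surrounding Quote/Summary elements are made up for illustration, since the original fragment does not show them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the "urn:example:hypothetical" URI and the Quote/Summary
# wrapper elements are invented; only the AALNet block comes from the question.
SAMPLE = """<_0:Quote xmlns:_0="urn:example:hypothetical">
  <_0:AALNet>
    <_0:AnnualAmount>0.00</_0:AnnualAmount>
    <_0:EndOfTermAmount>999.30</_0:EndOfTermAmount>
  </_0:AALNet>
  <_0:Summary>
    <_0:EndOfTermAmount>123.45</_0:EndOfTermAmount>
  </_0:Summary>
</_0:Quote>"""


def end_of_term_amounts(xml_text):
    """Return the text of every EndOfTermAmount that is a direct child of AALNet.

    '{*}' matches a tag in any namespace (or none), similar in spirit to
    *[local-name()='...'] in full XPath.
    """
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//{*}AALNet/{*}EndOfTermAmount")]


print(end_of_term_amounts(SAMPLE))  # ['999.30']
```

The EndOfTermAmount under Summary is ignored because the path only accepts EndOfTermAmount elements whose parent is AALNet.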

EDIT TO ADD: Out of frustration with the xpath command not supporting namespaces, except in a really ugly manner using namespace-uri()= at each level, I logged P4 case 389100 asking for namespace binding to be added to the xpath command.
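For comparison, this is roughly what namespace binding looks like in Python's xml.etree.ElementTree: findall() takes a dict mapping a prefix to a URI, so the path can use _0: directly instead of testing local-name() or namespace-uri() at every step. It only illustrates the concept and says nothing about what Splunk's xpath command supports. The URI is a made-up placeholder and would have to match whatever xmlns:_0 declares in the real events.

```python
import xml.etree.ElementTree as ET

# Placeholder URI: replace with the value of xmlns:_0 in the real XML.
NS = {"_0": "urn:example:hypothetical"}

SAMPLE = """<_0:Quote xmlns:_0="urn:example:hypothetical">
  <_0:AALNet>
    <_0:EndOfTermAmount>999.30</_0:EndOfTermAmount>
  </_0:AALNet>
</_0:Quote>"""

root = ET.fromstring(SAMPLE)
# The prefix used in the path is looked up in NS, so it could be any name
# as long as it maps to the same URI the document declares.
amounts = [el.text for el in root.findall(".//_0:AALNet/_0:EndOfTermAmount", NS)]
print(amounts)  # ['999.30']
```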

ahogbin
Communicator

Perfect... both options worked; it's just that the xpath command lets me extract all of the occurrences (I had not realised that what I thought was unique was repeated).


ahogbin
Communicator

One further question... I end up with 3 values but I only need one. The XML block containing the value I am after also contains <_0:AnnualAmount>. Could I somehow use this in the xpath query to limit the results returned?

Cheers


acharlieh
Influencer

XPath itself is a rather powerful language with lots of functions available and all kinds of ways to add additional qualifiers. (I'm not quite sure which XPath version Splunk uses under the hood, so I couldn't tell you what would work and what wouldn't without trying some things.)

But taking a whack at your request, an expression that selects an EndOfTermAmount whose parent AALNet element also has a child AnnualAmount element would be something like:

... | xpath outfield=test "//*[local-name()='AALNet' and *[local-name()='AnnualAmount']]/*[local-name()='EndOfTermAmount']"

I figured that out from another Stack Overflow post.
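The same "parent AALNet must also have an AnnualAmount child" filter can be tried in Python's xml.etree.ElementTree, which supports a [tag] predicate meaning "has a child with this tag". This is a rough equivalent written for illustration (Python 3.8+ for the {*} wildcard), not the XPath engine Splunk uses. The three-block sample is invented to mirror the "I end up with 3 values" situation, and the namespace URI is a placeholder.

```python
import xml.etree.ElementTree as ET

# Invented sample with three AALNet blocks; only the first has an AnnualAmount.
SAMPLE = """<_0:Quote xmlns:_0="urn:example:hypothetical">
  <_0:AALNet>
    <_0:AnnualAmount>0.00</_0:AnnualAmount>
    <_0:EndOfTermAmount>999.30</_0:EndOfTermAmount>
  </_0:AALNet>
  <_0:AALNet>
    <_0:EndOfTermAmount>10.00</_0:EndOfTermAmount>
  </_0:AALNet>
  <_0:AALNet>
    <_0:EndOfTermAmount>20.00</_0:EndOfTermAmount>
  </_0:AALNet>
</_0:Quote>"""

root = ET.fromstring(SAMPLE)

# Without a predicate: every EndOfTermAmount under any AALNet.
all_amounts = [el.text for el in root.findall(".//{*}AALNet/{*}EndOfTermAmount")]

# With [{*}AnnualAmount]: only AALNet elements that have an AnnualAmount child.
filtered = [
    el.text
    for el in root.findall(".//{*}AALNet[{*}AnnualAmount]/{*}EndOfTermAmount")
]

print(all_amounts)  # ['999.30', '10.00', '20.00']
print(filtered)     # ['999.30']
```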


ahogbin
Communicator

Works a treat... thank you so much.

Alastair


acharlieh
Influencer

One other note: I see you're using the "award points" link. For content (questions, answers, comments) that you believe is particularly good (helpful, useful, amusing, etc.), you can use the vote up link (^) instead. Not only does this not cost you any karma points, but it marks that the content was particularly worthwhile to someone (and still awards karma points to the creator of said worthwhile content).

As a moderator, I gave you the 101 points that you have given away in this manner so far. You can still use award points for something so mind-blowing that it deserves both a vote up and extra points, but save your karma for those special occasions and vote first 🙂

ahogbin
Communicator

Thank you so much 🙂


svenwendler
Path Finder

Try this:

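(?s)<_0:AALNet>.*?<_0:EndOfTermAmount>(?P<mynumber>[^<]+)

(?s) turns on single-line (dotall) mode, so that . also matches newline characters. The lazy .*? stops at the first <_0:EndOfTermAmount> after <_0:AALNet> rather than running on to the last one in the event, which matters because the same tag appears elsewhere in the XML. "mynumber" will then be 999.30.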
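To see why the quantifier matters, the pattern can be checked locally with Python's re module, which accepts the same (?s) flag and (?P<name>...) group syntax as the PCRE-style engine behind Splunk's rex. The event below is invented: it adds a second EndOfTermAmount after the AALNet block, as the question says happens in the real XML. A greedy .* runs to the last match in the event, while a lazy .*? stops at the first one after <_0:AALNet>.

```python
import re

# Invented event: EndOfTermAmount appears inside AALNet and again after it.
EVENT = """<_0:AALNet>
  <_0:AnnualAmount>0.00</_0:AnnualAmount>
  <_0:EndOfTermAmount>999.30</_0:EndOfTermAmount>
</_0:AALNet>
<_0:Other>
  <_0:EndOfTermAmount>123.45</_0:EndOfTermAmount>
</_0:Other>"""

# (?s) lets '.' match newlines; '.*' is greedy, '.*?' is lazy.
GREEDY = r"(?s)<_0:AALNet>.*<_0:EndOfTermAmount>(?P<mynumber>[^<]+)"
LAZY = r"(?s)<_0:AALNet>.*?<_0:EndOfTermAmount>(?P<mynumber>[^<]+)"

greedy = re.search(GREEDY, EVENT).group("mynumber")
lazy = re.search(LAZY, EVENT).group("mynumber")

print(greedy)  # 123.45 (ran past </_0:AALNet> to the last match)
print(lazy)    # 999.30 (first EndOfTermAmount after <_0:AALNet>)
```

Even the lazy form can cross </_0:AALNet> if an AALNet block has no EndOfTermAmount of its own, which is one reason the xpath approach is more robust for XML.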
