Splunk Search

Rex to extract text that ends with either one of multiple words

Cbr1sg
Path Finder

Hello all,
I have data like this

reason="abc";appName=....
reason="xyz";ERServer=...
reason="dfg",ClientBob=...

How can I extract only abc, xyz and dfg?

Note that abc, xyz and dfg might contain ";", ",", "=", double quotes or single quotes.
Basically the text is really dynamic and can contain any kind of character. The only consistent pattern is the ending words mentioned above.

Thanks

0 Karma
1 Solution

DMohn
Motivator

Okay, given the examples you provided for @cpetterborg above, and your statement that only the 3 keywords mentioned above could mark the end of your event, a RegEx that would match looks like this:

 rex field=data "reason=\"(?<reason>.*)\".(?:appName|ERServer|ClientBob)"

You can add more delimiting keywords in the second (non-capturing) group, separated by pipes. Keep in mind, however, that this is quite an "expensive" regex, which could significantly impact your search performance. But in your case it might be the only way to achieve what you need!
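
One way to sanity-check the alternation outside Splunk is a small Python sketch. It is only an illustration, not a Splunk search: the sample events are made up along the lines of the question, `extract_reason` is a hypothetical helper, and Python's `re` module writes the named group as `(?P<reason>...)` where rex accepts `(?<reason>...)`.

```python
import re

# Same idea as the rex above: greedy capture, then a closing quote,
# one separator character, and one of the known ending keywords.
PATTERN = re.compile(r'reason="(?P<reason>.*)".(?:appName|ERServer|ClientBob)')

# Hypothetical sample events modelled on the question.
events = [
    'reason="abc";appName=....',
    'reason="x;y=z";ERServer=...',
    'reason="d"f,g",ClientBob=...',
]


def extract_reason(event):
    """Return the captured reason text, or None if the event does not match."""
    match = PATTERN.search(event)
    return match.group("reason") if match else None


for event in events:
    print(extract_reason(event))
```

Because `.*` is greedy, the capture runs to the last place where a quote, a separator and one of the keywords line up, so quotes, commas, semicolons and equals signs inside the value do not end it early.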

0 Karma

Cbr1sg
Path Finder

This is exactly what I want, at least logically. However, it doesn't work. Splunk only extracts the text between "reason=" and "appName",
but it ignores ERServer and ClientBob. It seems the "OR" alternation is not recognized properly. Is this a bug?

0 Karma

Cbr1sg
Path Finder

It works perfectly now after changing to this rex below, thanks a lot mate!

| rex field=_raw "reason=\"(?<reason>.*)(appName|ERServer|ClientBob)"
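
A quick Python sketch of what this looser pattern captures, assuming the capture group is named `reason` as in the accepted answer. The `strict` variant is a hypothetical tightening that also consumes the closing quote, the separator and the trailing `=`; it is not something posted in this thread.

```python
import re

# One of the sample lines posted later in this thread.
line = 'reason="AAABBB";Client="112233",source="aassdd";server="IIHHSS";appName="ooiiuu"'

# Pattern from the post above, with Python's (?P<name>...) group syntax.
loose = re.compile(r'reason="(?P<reason>.*)(appName|ERServer|ClientBob)')

# Hypothetical tighter variant: closing quote, separator, keyword, equals sign.
strict = re.compile(r'reason="(?P<reason>.*)"[;,](?:appName|ERServer|ClientBob)=')

loose_value = loose.search(line).group("reason")
strict_value = strict.search(line).group("reason")

print(loose_value)   # keeps the trailing quote and separator before the keyword
print(strict_value)  # stops before the closing quote
```

With the looser pattern, the closing quote and separator in front of the keyword end up inside the extracted field, which may or may not matter for reporting.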

vinod94
Contributor

You can try this:

your index | rex field=_raw "reason\=.(?P<field_name>[^\.]\w+)"
0 Karma

Cbr1sg
Path Finder

Thanks for the help mate, but it doesn't work for me. It only extracts a single word after "reason=".

0 Karma

DMohn
Motivator

If the first and last character of the reason field will always be a double quote and the field contains no equals sign, you could try a greedy match like this:

rex field=data "reason=\"(?<reason>.*)\"\S+="

This regex will try to match as many characters as possible up to the last double quote that is followed by non-whitespace characters and an equals sign.

For my set of test data this worked perfectly, even if the reason contained one or more commas, semicolons or double quotes.
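
The greedy behaviour can be illustrated with a short Python sketch. The two events are hypothetical, `extract` is an illustrative helper, and the group is written `(?P<reason>...)` because that is the syntax Python's `re` module expects.

```python
import re

# Greedy capture up to the LAST double quote that is followed by
# non-whitespace characters and an equals sign.
GREEDY = re.compile(r'reason="(?P<reason>.*)"\S+=')


def extract(event):
    """Return the captured reason text, or None if nothing matches."""
    match = GREEDY.search(event)
    return match.group("reason") if match else None


# Commas, semicolons and quotes inside the value are captured.
inside = extract('reason="a,b;c"d";appName=foo')

# If more quoted key=value pairs follow, the capture runs past the first one.
overshoot = extract('reason="abc";appName="x";other="y"')

print(inside)
print(overshoot)
```

The second event shows the limitation: the pattern is anchored to the last quote-then-key= position in the event, not to a specific ending keyword.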

0 Karma

Cbr1sg
Path Finder

I'm really sorry, the text I want to extract also contains equals signs, so this won't work either. My bad, my description of the issue was not clear enough. I have updated the original post with the full range of characters that might be included in the text I want to extract.

0 Karma

cpetterborg
SplunkTrust

Here is a run-anywhere search that shows a rex command that will pick out the field from the data as provided in your question:

| makeresults | 
eval data="reason=\"abc\";appName=....
 reason=\"xyz\";ERServer=...
 reason=\"dfg\",ClientBob=..." | 
 makemv delim="
 " data | 
 rex field=data "reason=\"(?<reason>[^\"]*)"

I'm not sure if this meets your requirements, but it can be run in any Splunk search bar and produce the results you have requested. The last line is the only one that is really doing any of the work for that purpose. The other lines are only setting up the data that simulates the events as portrayed above.
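
For comparison, here is a minimal Python sketch of how the `[^\"]*` character class behaves. The events are hypothetical and `extract` is an illustrative helper; the group uses Python's `(?P<reason>...)` spelling.

```python
import re

# Capture everything after reason=" up to (not including) the next double quote.
FIRST_QUOTE = re.compile(r'reason="(?P<reason>[^"]*)')


def extract(event):
    """Return the captured reason text, or None if nothing matches."""
    match = FIRST_QUOTE.search(event)
    return match.group("reason") if match else None


plain = extract('reason="abc";appName=....')
with_inner_quote = extract('reason="a"b";appName=....')

print(plain)
print(with_inner_quote)
```

The negated class stops at the first double quote it meets, so it suits values that never contain a quote and cuts the value short otherwise.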

0 Karma

Cbr1sg
Path Finder

Sorry, this will not work, as the contents I want to extract will also contain double quotes. Sorry, I forgot to mention that in the original post (I have updated it now).

0 Karma

cpetterborg
SplunkTrust

As long as there is something like ;wordchars= after the quoted data you want to extract, the following will probably work for you:

| makeresults | 
 eval data="reason=\"a\\\"b\"c\";appName=....
  reason=\"x;yz\";ERServer=...
  reason=\"df,g\",ClientBob=..." | 
  makemv delim="
  " data | 
  rex field=data "reason=\"(?<reason>.*?)\"[;,]\w+="
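
A rough Python equivalent of the lazy match, as a sketch only. The first event mirrors the escaped test data above, the second is one of the sample lines posted further down in this thread, and `extract` is a hypothetical helper.

```python
import re

# Lazy capture: stop at the FIRST double quote that is followed by
# ';' or ',', some word characters, and an equals sign.
LAZY = re.compile(r'reason="(?P<reason>.*?)"[;,]\w+=')


def extract(event):
    """Return the captured reason text, or None if nothing matches."""
    match = LAZY.search(event)
    return match.group("reason") if match else None


# Quotes inside the value are kept as long as no ";word=" or ",word=" follows them.
escaped = extract(r'reason="a\"b"c";appName=....')

# With several quoted key=value pairs, the match ends at the first pair boundary.
multi = extract('reason="AAABBB";Client="112233",source="aassdd";server="IIHHSS";appName="ooiiuu"')

print(escaped)
print(multi)
```

On the multi-pair line the lazy match yields only the first value, which is the expected result if `reason` is meant to be a single key-value pair rather than everything up to the ending keyword.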
0 Karma

Cbr1sg
Path Finder

I'm really sorry, the text I want to extract also contains equals signs, so this won't work either. My bad, my description of the issue was not clear enough. I have updated the original post with the full range of characters that might be included in the text I want to extract.

0 Karma

cpetterborg
SplunkTrust

Unless the double quotes inside the field are escaped (for example with a backslash) you are pretty much screwed because there doesn't seem to be enough regularity to the string to make extracting it properly an option. If you give an exact example (doesn't have to contain real data, just valid data with all the possibilities, so clean it up for public consumption), it might be possible to help you on this.

0 Karma

Cbr1sg
Path Finder

Yes, you're right, that makes sense. Three examples of the data are below:

reason="AAABBB";Client="112233",source="aassdd";server="IIHHSS";appName="ooiiuu"
reason="NNCCSA";Network="asdasasd";NextHop="asda",data="asdasasd";Subnet="10.12.12.12,24";RemoteIP="12.12.12.12,mask=255.255.255.0";ClientBob="aabbcc"
reason="dgfsdd";External="asdasas";Policy="asdasasda";Domain="asdasdas";ClientVersion=12312312321";Path="hop1=1213,hop2=23432,hop3=23432,hop4=2343";ERServer=asdadasda"

0 Karma

cpetterborg
SplunkTrust

You still need the examples to have the additional characters, the way they would appear IRL, or at least the = and " characters, because the rex I presented above works on this data.

0 Karma

Cbr1sg
Path Finder

No, it doesn't. From the example above, let's take this:
reason="AAABBB";Client="112233",source="aassdd";server="IIHHSS";appName="ooiiuu"

The text I want to extract is everything between reason= and appName=, which is
AAABBB";Client="112233",source="aassdd";server="IIHHSS

The reason I want all of this together is:
1. There are duplicate fields. For example, Splunk already has its own field "source" and I don't want to create another.
2. Yes, it's possible to separate everything into different columns, so we would have multiple fields like reason, Client, source, server. But as you can see, the texts are really dynamic and the columns are not always the same. The above examples are only 3 among many other scenarios. I would need thousands of eval statements to join the fields together, which takes too much effort. Those texts are error messages and they are only meaningful when joined together.

0 Karma

adonio
Ultra Champion

Do they always end in a double quote?
Also, it looks like you have key-value pairs using "=" and separated with ";" or ",".
Take a look at the extract command.

0 Karma

Cbr1sg
Path Finder

This will not work, as the contents I want to extract will also contain ";" OR "," (as I mentioned in original post)

0 Karma