How to combine multiple rex expressions and rename the field for an eval expression?

Motivator

Hi, I wonder if someone could help me please.

I'm currently using the following to extract certain fields contained within the events' raw data.

| rex "Address Line 1=(?<address1>[^,]*)" | rex "Address Line 2=(?<address2>[^,]*)" | rex "Address Line 3=(?<address3>[^,]*)" | rex "Address Line 4=(?<address4>[^,]*)" | rex "Postcode=(?<postcode>[^,]*)"
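For anyone who wants to check these five patterns outside Splunk, here is a minimal Python sketch that applies each one separately to the sample event Chris posts later in the thread. Python's `re` module spells named groups `(?P<name>...)`, whereas rex uses the PCRE form `(?<name>...)`; the names `FIELD_PATTERNS` and `extract_separately` are hypothetical.

```python
import re

# Sample event posted later in the thread (dummy values).
RAW = ("Address Line 1=1The Street, Address Line 2=The Town, "
       "Address Line 3=, Address Line 4=The City, Postcode=AB12CD")

# One pattern per field, mirroring the five separate rex commands.
FIELD_PATTERNS = {
    "address1": r"Address Line 1=(?P<address1>[^,]*)",
    "address2": r"Address Line 2=(?P<address2>[^,]*)",
    "address3": r"Address Line 3=(?P<address3>[^,]*)",
    "address4": r"Address Line 4=(?P<address4>[^,]*)",
    "postcode": r"Postcode=(?P<postcode>[^,]*)",
}


def extract_separately(raw):
    """Run each pattern on its own; fields whose pattern does not match are skipped."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, raw)
        if match:
            fields[name] = match.group(name)
    return fields


if __name__ == "__main__":
    print(extract_separately(RAW))
```

Each pattern scans the event independently, which is the repetition Chris wants to fold into a single expression.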

But to cut down on the number of searches, I'm trying to join the rex expressions together, so using the inbuilt field extractor I've come up with the following:

| rex "^(?:[^=\n]*=){6}(?P<Address_Line_1>[^=]+)[^ \n]* (?P<Address_Line_2>[^=]+)=,\s+(?P<Address_Line_3>\w+\s+\w+\s+\d+)=,\s+(?P<Address_Line_4>[^=]+)=,\s+\w+\s+(?P<Postcode>[^=]+)"

But I'm having a little difficulty replicating this part of the original rex expressions:

(?<address1>

where I'm naming the field, with the aim of then using an eval expression to create a combined Address field.

Could someone perhaps have a look at this, please, and offer some guidance on how I might go about achieving this?
Many thanks and kind regards

Chris

Esteemed Legend

Assuming that the OP has given sample data THAT DOES NOT MATCH his real data and that there is "something" before the text that HE SAID was sample data, this should work:

 ... | rex ".*?Address Line 1=(?<address1>[^\,]*)[^=]*=(?<address2>[^,]*)[^=]*=(?<address3>[^\,]*)[^=]*=(?<address4>[^\,]*)[^=]*=(?<Postcode>[\w]*)"
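A quick way to sanity-check the accepted pattern is to run it in Python against both the sample event Chris posted and the longer event (with `Time=` and `Name=` in front) quoted later in the thread. This sketch assumes Python's `re` treats this pattern the same way Splunk's PCRE does; the only change is `(?<name>` rewritten as `(?P<name>`, and `extract` is a hypothetical helper.

```python
import re

# Accepted answer's pattern, with (?<name>...) rewritten as (?P<name>...) for Python.
COMBINED = re.compile(
    r".*?Address Line 1=(?P<address1>[^\,]*)[^=]*=(?P<address2>[^,]*)"
    r"[^=]*=(?P<address3>[^\,]*)[^=]*=(?P<address4>[^\,]*)[^=]*=(?P<Postcode>[\w]*)"
)

SAMPLE = ("Address Line 1=1The Street, Address Line 2=The Town, "
          "Address Line 3=, Address Line 4=The City, Postcode=AB12CD")
PREFIXED = "Time=1 Jan 1970 01:00:00.000 GMT, Name=John Doe, " + SAMPLE


def extract(raw):
    """Return the named groups of the first match, or an empty dict if nothing matches."""
    match = COMBINED.search(raw)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    for event in (SAMPLE, PREFIXED):
        print(extract(event))
```

Both events yield the same five values, because the pattern anchors itself on the literal text `Address Line 1=` rather than on the first `=` in the event.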

Motivator

I wear caps-lock earplugs.

Motivator

I know it is frustrating when the actual data does not match what we are given. Your regex almost matches my answer above, but it does not need ".*?", which adds a lot of steps (time for field extraction). regex101.com says that for the event:

Time=1 Jan 1970 01:00:00.000 GMT, Name=John Doe, Address Line 1=1The Street, Address Line 2=The Town, Address Line 3=, Address Line 4=The City, Postcode=AB12CD

".*?" adds 50 steps, increasing the matching from 39 steps without it to 89 steps with it.

Also, the author, Chris, mentioned that he wants the last field to be called "postcode". That said, he rejected my answer, so I am waiting for him to give us the real data and results before proceeding any further.
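The leading ".*?" is unnecessary because rex, like Python's `re.search`, looks for the pattern anywhere in the event rather than only at its start. The Python sketch below (with the hypothetical name `BODY` for the shared part of the pattern) checks that the captures are identical with and without the prefix, and that the prefix only matters for an anchored match such as `re.match`. It does not reproduce regex101.com's step counts.

```python
import re

SAMPLE = ("Address Line 1=1The Street, Address Line 2=The Town, "
          "Address Line 3=, Address Line 4=The City, Postcode=AB12CD")
PREFIXED = "Time=1 Jan 1970 01:00:00.000 GMT, Name=John Doe, " + SAMPLE

# Shared body of the pattern (hypothetical name), without any leading .*?
BODY = (r"Address Line 1=(?P<address1>[^,]*)[^=]*=(?P<address2>[^,]*)"
        r"[^=]*=(?P<address3>[^,]*)[^=]*=(?P<address4>[^,]*)[^=]*=(?P<postcode>\w*)")

WITH_PREFIX = re.compile(r".*?" + BODY)
WITHOUT_PREFIX = re.compile(BODY)


def same_captures(raw):
    """True if an unanchored search gives the same groups with and without .*?"""
    return (WITH_PREFIX.search(raw).groupdict()
            == WITHOUT_PREFIX.search(raw).groupdict())


if __name__ == "__main__":
    print(same_captures(SAMPLE), same_captures(PREFIXED))  # True True
    # An anchored match starts at position 0, so only there does .*? make a difference.
    print(re.match(BODY, PREFIXED) is None)                # True
    print(re.match(r".*?" + BODY, PREFIXED) is not None)   # True
```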

Motivator

Hi, I had a look at my raw data today and found that there was an error with the capitalisation of the 'Postcode' field: it should have been 'PostCode'. I've now been able to get the expression to work.

I just want to thank everyone who contributed for their help, despite their frustrations.

Many thanks and kind regards

Chris
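Rex patterns are case-sensitive unless told otherwise, which fits the 'Postcode'/'PostCode' mismatch described above. The Python sketch below uses a hypothetical event spelled with `PostCode` to show a literal `Postcode=` failing to match, and the `(?i)` inline flag (the one used in @landen99's answer) making it match.

```python
import re

# Hypothetical event using the 'PostCode' spelling Chris found in his raw data.
RAW = ("Address Line 1=1The Street, Address Line 2=The Town, "
       "Address Line 3=, Address Line 4=The City, PostCode=AB12CD")

# Case-sensitive: the literal 'Postcode=' does not occur in RAW, so there is no match.
strict = re.search(r"Postcode=(?P<postcode>[^,]*)", RAW)

# (?i) makes the whole pattern case-insensitive, so 'PostCode=' matches.
relaxed = re.search(r"(?i)Postcode=(?P<postcode>[^,]*)", RAW)

if __name__ == "__main__":
    print(strict)                     # None
    print(relaxed.group("postcode"))  # AB12CD
```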

Motivator

Woodcock, if you look at the original regex (^(?:[^=\n]*=){6}) from the field extractor at the start of IRHM73's original post (which, by the way, seems to extract field names rather than field values), it becomes obvious that there is data before "Address Line 1", contrary to what his sample data implies: that prefix skips past six "=" signs, but the sample data contains only five. So the start of your regex will not capture correctly, because it does not specify the correct starting point. The correct regex is:

(?i)Address Line 1=(?<address1>[^\,]*)[^=]*=(?<address2>[^\,]*)[^=]*=(?<address3>[^\,]*)[^=]*=(?<address4>[^\,]*)[^=]*=(?<postcode>[\w]*)

Putting it in a rex command in the search and adding an eval to combine the fields, we get:

| rex "(?i)Address Line 1=(?<address1>[^\,]*)[^=]*=(?<address2>[^,]*)[^=]*=(?<address3>[^\,]*)[^=]*=(?<address4>[^\,]*)[^=]*=(?<postcode>[\w]*)" | eval address=address1." ".address2." ".address3." ".address4." ".postcode
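The rex-plus-eval pipeline above can be mirrored in Python to see what the combined address looks like for the event quoted earlier in the thread. This is a sketch, not Splunk's implementation: `combined_address` and `compact_address` are hypothetical helpers, and the string join stands in for eval's `.` concatenation. Here the empty Address Line 3 leaves two spaces between "The Town" and "The City"; the compact variant skips empty parts instead.

```python
import re

# @landen99's pattern, with (?<name>...) rewritten as (?P<name>...) for Python.
PATTERN = re.compile(
    r"(?i)Address Line 1=(?P<address1>[^\,]*)[^=]*=(?P<address2>[^,]*)"
    r"[^=]*=(?P<address3>[^\,]*)[^=]*=(?P<address4>[^\,]*)[^=]*=(?P<postcode>[\w]*)"
)
PARTS = ("address1", "address2", "address3", "address4", "postcode")

EVENT = ("Time=1 Jan 1970 01:00:00.000 GMT, Name=John Doe, Address Line 1=1The Street, "
         "Address Line 2=The Town, Address Line 3=, Address Line 4=The City, Postcode=AB12CD")


def combined_address(raw):
    """Join the five captures with single spaces; None if the pattern does not match."""
    match = PATTERN.search(raw)
    if match is None:
        return None
    return " ".join(match.group(name) for name in PARTS)


def compact_address(raw):
    """Variant that drops empty parts (e.g. a blank Address Line 3) before joining."""
    match = PATTERN.search(raw)
    if match is None:
        return None
    return " ".join(match.group(name) for name in PARTS if match.group(name))


if __name__ == "__main__":
    print(repr(combined_address(EVENT)))  # '1The Street The Town  The City AB12CD'
    print(repr(compact_address(EVENT)))   # '1The Street The Town The City AB12CD'
```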

Note: I highly recommend https://regex101.com/ for all things regex.

Motivator

My "error" may have been that one of the commas in the brackets was not escaped. I think regex101.com was fine with that, but I guess Splunk was not. I have "corrected" my answer.

Champion

Just to give you certainty on that: yes, Splunk wants almost everything escaped (unlike regex101.com). I ran into problems because of that a few times; now I escape every character that is not a letter or a digit, and I have had no problems since.

Splunk Employee

@jeffland I'm guessing you only see that inside a character class?

E.g.:

| windbag
| rex field=fancy_constant_field "(?<test>`~!@#\$%\^&\*\(\)-_=\+\{}\|;:<>,\.\/\? \[brackets])"

Note that, apart from the harmless \/, the only escaped characters are .^$*+?()[{\| (the PCRE metacharacters).
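To illustrate which characters actually need escaping: in PCRE-style engines, a backslash before a character that is not a letter or digit always means that literal character, so `\=` and `=`, or `[^\,]` and `[^,]`, behave the same, while leaving a metacharacter such as `.` unescaped changes the meaning. The check below uses Python's `re` as a stand-in; it does not test Splunk itself.

```python
import re

RAW = "Address Line 2=The Town, Address Line 3=, Postcode=AB12CD"

# Escaped and unescaped forms of '=' and ',' are equivalent.
escaped = re.search(r"Address Line 2\=(?P<value>[^\,]*)", RAW)
unescaped = re.search(r"Address Line 2=(?P<value>[^,]*)", RAW)

# '.' is a metacharacter: unescaped it matches any character, escaped only a literal dot.
any_char = re.search(r"Line.2", RAW)
literal_dot = re.search(r"Line\.2", RAW)

if __name__ == "__main__":
    print(escaped.group("value"), unescaped.group("value"))  # The Town The Town
    print(any_char is not None, literal_dot is None)         # True True
```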

Champion

OK, you're right: compared to the overall number of special characters, only a few of them have to be escaped. But regex101.com can still fool you, because it accepts things that Splunk (or any other "strict" PCRE implementation) doesn't, e.g. curly braces.

Motivator

Hi @landen99, thank you very much.

Kind Regards

Chris

Motivator

Hi @landen99, thank you for taking the time to reply to my post and come back to me with this.

Unfortunately, this doesn't extract the data, and the 'address1' variable is blank.

Many thanks and kind regards

Chris

Motivator

Okay, Chris, you really need to provide the raw logs, because we are flying blind here.

index=blah your search | head 9 | table _raw

Then go and edit the data to obscure any sensitive parts.

Then provide us with the exact search query that you are using to determine whether the extraction is successful.

If the extraction is not successful, please provide the exact results (obscuring data as needed):

index=blah your search | rex "Address Line 1\=(?<address1>[^\,]*)[^\=]*\=(?<address2>[^,]*)[^\=]*\=(?<address3>[^\,]*)[^\=]*\=(?<address4>[^\,]*)[^\=]*\=(?<postcode>[\w]*)" | eval address=address1." ".address2." ".address3." ".address4." ".postcode | head 9 | table address1 address2 address3 address4 postcode address

Esteemed Legend

Just swap out the names as you see fit, like this:

Before:

[^\=]*\=(?<Address_Line_1>[^\,]*)[^\=]*\=(?<Address_Line_2>[^,]*)[^\=]*\=(?<Address_Line_3>[^\,]*)[^\=]*\=(?P<Address_Line_4>[^\,]*)[^\=]*\=(?<Postcode>[\w]*)

After:

[^\=]*\=(?<address1>[^\,]*)[^\=]*\=(?<address2>[^,]*)[^\=]*\=(?<address3>[^\,]*)[^\=]*\=(?<address4>[^\,]*)[^\=]*\=(?<Postcode>[\w]*)
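Renaming only changes the labels inside `(?<...>)`; the parts of the pattern that match text are untouched, so the captured values stay the same. Here is a Python check on Chris's sample event, with `(?P<...>)` in place of `(?<...>)` and the hypothetical names `BEFORE` and `AFTER` for the two patterns.

```python
import re

SAMPLE = ("Address Line 1=1The Street, Address Line 2=The Town, "
          "Address Line 3=, Address Line 4=The City, Postcode=AB12CD")

BEFORE = (r"[^=]*=(?P<Address_Line_1>[^,]*)[^=]*=(?P<Address_Line_2>[^,]*)"
          r"[^=]*=(?P<Address_Line_3>[^,]*)[^=]*=(?P<Address_Line_4>[^,]*)[^=]*=(?P<Postcode>\w*)")
AFTER = (r"[^=]*=(?P<address1>[^,]*)[^=]*=(?P<address2>[^,]*)"
         r"[^=]*=(?P<address3>[^,]*)[^=]*=(?P<address4>[^,]*)[^=]*=(?P<Postcode>\w*)")

before = re.search(BEFORE, SAMPLE)
after = re.search(AFTER, SAMPLE)

if __name__ == "__main__":
    # Same captured values, in the same order...
    print(before.groups() == after.groups())  # True
    # ...but exposed under the new field names.
    print(after.groupdict())
```

Both patterns start matching at the first `=` in the event, so they rely on "Address Line 1" being the first key in it.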

Motivator

See my answer below: there must be data before "Address Line 1=".

Motivator

Hi @woodcock, thank you for this.

Yes, I did try the expression, but as I said, it isn't extracting the data.

Many thanks and kind regards

Chris

Esteemed Legend

How can that be? I copied the "Before" RegEx directly from the answer by @jeffland under which you said this:

The expression works, but it doesn't change for example 'Address Line 1' to 'address1' as per my original post.

So I modified the "working" solution to do that last part. You have got to get your stories straight or provide sample data or nobody is going to be able to help you.

Motivator

Hi @woodcock, thank you for coming back to me with this.

Firstly, my apologies if you feel my stories aren't straight; personally, I don't feel this is the case, as this has changed since my initial post. I did also provide sample data to @jeffland, who found it more than adequate; see below:

Address Line 1=1The Street, Address Line 2=The Town, Address Line 3=, Address Line 4=The City, Postcode=AB12CD

I am very new to Splunk, so I appreciate that rex may not be the correct command to use, but it seems to have worked fine so far.

I'll try to explain in simpler terms what I'm trying to achieve, and hopefully this helps.

My original expression was | rex "Address Line 1=(?<address1>[^,]*)"........... The first part of the expression searches my 'Raw Data' for the field 'Address Line 1'. It then assigns the variable 'address1' to it. This variable is used later on in my full search.

As mentioned in my original post, rather than having multiple searches, i.e. '| rex "Address....', I wanted to bring all the elements of the address into one rex expression.

Forgive me, but from the testing I've done, the expressions you kindly provided don't search for the 'Address Line 1' field to assign the variable 'address1' to it.

I hope this helps.

Many thanks and kind regards

Chris

Esteemed Legend

You may think you are being clear, but I have never seen a question that took so much wasted effort to answer. I will prove to you that you are talking nonsense. I used YOUR DATA that you JUST POSTED above:

Address Line 1=1The Street, Address Line 2=The Town, Address Line 3=, Address Line 4=The City, Postcode=AB12CD

Then I used MY SOLUTION that I say works:

[^\=]*\=(?<address1>[^\,]*)[^\=]*\=(?<address2>[^,]*)[^\=]*\=(?<address3>[^\,]*)[^\=]*\=(?<address4>[^\,]*)[^\=]*\=(?<Postcode>[\w]*)

I used the tool Expresso to double-check and it DOES WORK exactly as I said. I will not bother to post a screenshot.

Motivator

Your rex will not work if his data is really:

Time=1 Jan 1970 01:00:00.000 GMT, Name=John Doe, Address Line 1=1The Street, Address Line 2=The Town, Address Line 3=, Address Line 4=The City, Postcode=AB12CD

What we really need is for him to post the raw data and the search results with obfuscation as necessary for privacy. See my answer below.
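To make that concrete, here is the `[^\=]*\=...` pattern run in Python on the longer event above, next to @landen99's version that starts at the literal text `Address Line 1=`. Because the first pattern starts at the first `=` in the event, each capture lands two key=value pairs too early. This is a sketch only; Python's `(?P<...>)` syntax replaces `(?<...>)`, and the pattern names are hypothetical.

```python
import re

EVENT = ("Time=1 Jan 1970 01:00:00.000 GMT, Name=John Doe, Address Line 1=1The Street, "
         "Address Line 2=The Town, Address Line 3=, Address Line 4=The City, Postcode=AB12CD")

# Starts at the first '=' in the event, whatever key precedes it.
FROM_FIRST_EQUALS = re.compile(
    r"[^=]*=(?P<address1>[^,]*)[^=]*=(?P<address2>[^,]*)"
    r"[^=]*=(?P<address3>[^,]*)[^=]*=(?P<address4>[^,]*)[^=]*=(?P<Postcode>\w*)"
)

# Starts at the literal key 'Address Line 1='.
FROM_ADDRESS_KEY = re.compile(
    r"(?i)Address Line 1=(?P<address1>[^,]*)[^=]*=(?P<address2>[^,]*)"
    r"[^=]*=(?P<address3>[^,]*)[^=]*=(?P<address4>[^,]*)[^=]*=(?P<postcode>\w*)"
)

if __name__ == "__main__":
    print(FROM_FIRST_EQUALS.search(EVENT).groupdict())
    print(FROM_ADDRESS_KEY.search(EVENT).groupdict())
```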
