Rex Not Extracting All Data

IRHM73
Motivator

Hi,

I wonder whether someone could help me please.

I'm trying to extract the first name from the data as shown below:

 [{"name":{"current":{"firstName":"M","lastName":"SMITH"}},"ids":{"nino":"AA111111A"},"dateOfBirth":"26121973"}] 

So I've put together the following rex:

rex field="detail.output-cid-response" "\"firstName\":\"(?<cidFName>[^\"]+)" 

The problem I have is that although there is data there, it is not extracting the "cidFName" for all the records and to be honest I'm at a loss why.

Could someone perhaps shed some light on where I'm going wrong please.

Many thanks and kind regards

Chris


wpreston
Motivator

Can you try this one?

rex field="detail.output-cid-response" "firstName\\\":\\\"(?<cidFName>[^\\]+)\\"

0 Karma

wpreston
Motivator

I indexed your sample data and was able to use the following regex to extract "JOHN" as the "firstName" field. One rex extracted from the _raw field as a source, and the other extracted from the detail.output-cid-response field as a source. Please see if either fits your needs:

| rex field=_raw "firstName[\\\]\":[\\\]\"(?<firstNameRaw>[^\\\]+)[\\\]"

| rex field="detail.output-cid-response" "firstName\":\"(?<firstName>[^\"]+)\""
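To make the difference between the two sources concrete, here is a minimal sketch in Python rather than SPL. It is only an analogy: Python's `re` module stands in for rex, the sample strings are shortened copies of the event posted in this thread, and the variable names are my own. The point is that in _raw the quotes are preceded by backslashes, while in the extracted field they are not, so each pattern only matches one of the two forms.

```python
import re

# Shortened samples modelled on the event in this thread (assumed shapes).
# In _raw the inner quotes are backslash-escaped; in the extracted field they are not.
raw_event = r'{"detail":{"output-cid-response":"[{\"name\":{\"current\":{\"firstName\":\"JOHN\",\"lastName\":\"SMITH\"}}}]"}}'
field_value = '[{"name":{"current":{"firstName":"JOHN","lastName":"SMITH"}}}]'

# Pattern for plain quotes, as seen in the extracted field.
plain = re.compile(r'"firstName":"(?P<firstName>[^"]+)"')
# Pattern for backslash-escaped quotes, as seen in _raw.
escaped = re.compile(r'firstName[\\]":[\\]"(?P<firstNameRaw>[^\\]+)[\\]')

plain_on_raw = plain.search(raw_event)        # no match: a backslash follows firstName
plain_on_field = plain.search(field_value)    # matches
escaped_on_raw = escaped.search(raw_event)    # matches
escaped_on_field = escaped.search(field_value)  # no match: no backslashes present

print(plain_on_raw, plain_on_field.group("firstName"))
print(escaped_on_raw.group("firstNameRaw"), escaped_on_field)
```

If a pattern works for some events but not others, checking which of these two forms each event actually contains is a reasonable first step.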
0 Karma

IRHM73
Motivator

Eureka!!!!

@wpreston, thank you for coming back to me with this, it is greatly appreciated. I've tried the queries you kindly provided and this one worked: | rex field=_raw "firstName[\\\]\":[\\\]\"(?<firstNameRaw>[^\\\]+)[\\\]"

So that I can learn from this, could I ask please what the '[' and ']' do?

Many thanks and kind regards

Chris

wpreston
Motivator

The "[" and "]" characters are used to make a regular expression character class. They are typically used when you want to match one of several characters. Consider a commonly misspelled word like "separate". If you were looking for all instances of this word, you might want to make allowances for people who spelled it "seperate" as well. Using a character class, your regex would be sep[ae]rate.
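As a rough illustration of the same idea outside Splunk, here is a small Python sketch. It uses Python's `re` module rather than rex, so treat it as an analogy, and the word list and sample string are made up. It also shows that a class containing only a backslash matches one literal backslash, which is the trick the _raw extraction in this thread relies on.

```python
import re

# [ae] matches exactly one character that is either "a" or "e".
spelling = re.compile(r"sep[ae]rate")
words = ["separate", "seperate", "seprate", "sepaerate"]
matched = [w for w in words if spelling.fullmatch(w)]
print(matched)  # ['separate', 'seperate']

# A class holding just a backslash matches one literal backslash.
backslash_quote = re.compile(r'firstName[\\]"')
sample = r'{\"firstName\":\"JOHN\"}'
found = bool(backslash_quote.search(sample))
print(found)  # True
```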

Regular-expressions.info has a good write up on character classes and could explain them much better than I can. Glad that this worked for you!

0 Karma

IRHM73
Motivator

Hi @wpreston, thank you very much for the explanation and for the link, which is a really great article.

Kind Regards

Chris

0 Karma

IRHM73
Motivator

Hi @wpreston, thank you for this.

Unfortunately, when I run this I receive this error:

Error in 'rex' command: Encountered
the following error while compiling
the regex
'firstName\":\"(?[^]+)\':
Regex: \ at end of pattern

Many thanks and kind regards

Chris

0 Karma

wpreston
Motivator

Hmm, ok how about this one?

rex field=detail.output-cid-response "firstName.\":.\"(?<NewField>.+)[\\\]\","
0 Karma

muebel
SplunkTrust

Hi IRHM73, That either means that the regex isn't valid for all values of the "detail.output-cid-response" field, or that the "detail.output-cid-response" field doesn't exist for all events.

I would run the regex over _raw, which is the default value for the rex command.

So, in that way, try running

rex "\"firstName\":\"(?<cidFName>[^\"]+)" 

If that doesn't pull all the cidFName fields as you would expect, post the _raw for the events where the field isn't extracting properly.

Please let me know how this works! 😄

0 Karma

IRHM73
Motivator

Hi @muebel, thank you for taking the time to reply to my post.

I tried the query you kindly sent but found I had to put 'rex field....' in front.

But unfortunately the details on some of the records are missing, including the one shown in the raw data log below:

{"auditSource":"matching","auditType":"TxSucceeded","eventId":"cc642788","tags":{"X-Request-ID":"uke83d","transactionName":"Search"},"detail":{"output-cid-response":"[{\"name\":{\"current\":{\"firstName\":\"JOHN\",\"lastName\":\"SMITH\"}},\"ids\":{\"nino\":\"AA111111A\"},\"dateOfBirth\":\"26121973\"}]","output-cycle":"CYCLE3","output-matching-time-in-millis":"120","input-searchRequest":"IncomingSearchRequest(Some(AA111111A),Some(John),Some(Smith),Some(1973-12-26))","output-errors":"[]","output-result":"match found","input-nino":"AA111111A"},"generatedAt":"2015-10-20T20:04:14.728Z"}

I can confirm that detail.output-cid-response is present in all records and, as far as I can see, they are exactly the same apart from differing usernames, NINOs, etc.

Many thanks and kind regards

Chris

0 Karma

muebel
SplunkTrust

Are the double quotes escaped within the _raw of all the events? In that case, try escaping the backslashes as well:

 rex field=_raw "\\\"firstName\\\":\\\"(?<cidFName>[^\"]+)" 
0 Karma

IRHM73
Motivator

Hi I really appreciate you coming back to me with this.

In answer to your question, the double quotes are escaped in all the raw events.

I tried the query you provided, but unfortunately I receive the following error:

Error in 'SearchParser': Missing a
search command before '^'. Error at
position '470' of search query 'search
index=main auditSource="matching"
auditType...{snipped} {errorcontext =
Name":"(?[^"]+)" | e}'.

Many thanks and kind regards

Chris

0 Karma

renatobamorim
Explorer

If any event has two names in this field, it's better to use this:

firstName\":\"(?P<cidFname>.*?)\"

or

firstName\":\"(?P<cidFname>[\w\s]+)\"
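For anyone comparing the two patterns, here is a minimal Python sketch of how they behave. It uses Python's `re` module instead of rex, the sample values are hypothetical, and the helper `first_name` is my own. Both patterns handle a value with two names, but the `[\w\s]+` version only accepts word characters and whitespace, so a hyphenated name would not be extracted by it.

```python
import re

# Hypothetical field values with unescaped quotes.
two_names = '{"firstName":"MARY ANN","lastName":"SMITH"}'
hyphenated = '{"firstName":"ANNE-MARIE","lastName":"SMITH"}'

# Lazy match: take as little as possible up to the next double quote.
lazy = re.compile(r'firstName":"(?P<cidFname>.*?)"')
# Class match: only word characters and whitespace are allowed in the name.
word_space = re.compile(r'firstName":"(?P<cidFname>[\w\s]+)"')

def first_name(pattern, text):
    """Return the captured name, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group("cidFname") if m else None

print(first_name(lazy, two_names))         # MARY ANN
print(first_name(word_space, two_names))   # MARY ANN
print(first_name(lazy, hyphenated))        # ANNE-MARIE
print(first_name(word_space, hyphenated))  # None
```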
0 Karma

IRHM73
Motivator

Hi @renatobamorim, thank you for taking the time to come back to me with this.

I must admit I wasn't quite sure what to do with the query you kindly sent but using the snippet as the following:

rex field="detail.output-cid-response" ""firstName":"(?.*?)"" 

I receive

Error in 'rex' command: Encountered
the following error while compiling
the regex 'firstName:(?.*?)': Regex:
unrecognized character after (? or (?-

I've had to include the double " otherwise I receive an unbalanced quotes error message.

Many thanks and kind regards

Chris

0 Karma

renatobamorim
Explorer

You'll need to escape the double quote, like this:

rex field="detail.output-cid-response" "\"firstName\":\"(?P<field_name>.*?)\"" 

or

rex field="detail.output-cid-response" "\"firstName\":\"(?P<field_name>[^\"]+)"
0 Karma

IRHM73
Motivator

Hi, thank you for clarifying how to use the queries.

Unfortunately there was no change: firstName\":\"(?P.*?)\" and firstName\":\"(?P[\w\s]+)\"" didn't extract any information.

I then tried:

firstName\":\"(?P<cidFname>.*?)\"
firstName\":\"(?P<cidFname>[\w\s]+)\"
\"firstName\":\"(?<cidFName>[^\"]+)" 

All were run with rex field=_raw, and unfortunately none of them extracted any of the information.

Many thanks and kind regards

Chris

0 Karma

krish3
Contributor

try this....

\"firstName\":\"(?<cidFName>[\w]+)"
0 Karma

IRHM73
Motivator

Hi @krish3, thank you for taking the time to reply to my post,

I've tried the query you kindly provided, but unfortunately this hasn't made any difference.

Many thanks and kind regards

Chris

0 Karma

krish3
Contributor

Can you please share the value of the field detail.output-cid-response?

0 Karma

IRHM73
Motivator

Hi @krish3, my apologies for not making this clear, but detail.output-cid-response is the raw data shown in my initial post, i.e. [{"name":{"current":{"firstName":"CHRIS","lastName":"SMITH"}},"ids":{"nino":"AA111111A"},"dateOfBirth":"26121973"}]

Many thanks and kind regards

Chris

0 Karma

krish3
Contributor

Can you post a few more lines of your logs? I do not see any issues with the regex pattern.

Check your regex here

0 Karma