Splunk Search

Rex question to extract a field

lehrfeld
Path Finder

I have been playing with the rex command for awhile now and I am stuck. I have a csv source that I need to extract a 1-3 digit number from. The trouble is that depending upon the line in the csv, it is either in the 8th or 9th column in the csv. Sample csv line:

2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,ABC,"CENTRAL, MAINE",x123456,204,PC,Windows,more,text

OR

8/28/2013 15:04,100.200.250.27,,username@email.com,ABC,DEF,x987654,23,,,IE 8,Mozilla,more

I would like to extract the 204 from the first line and 23 from the second. The trouble is the extra comma in the location in some of the lines (,"CENTRAL, MAINE",).

My base rex is | rex field=_raw ",{8,9}.{1,3} (?<AAA>.*)"

What I am thinking I am doing is searching for 8 or 9 commas ,{8,9} then when it finds it, select the next 1 to 3 characters .{1,3} then put all that in an extraction called AAA (?<AAA>.*)

I think I am just off, but any tips would be great!

Thanks, Mike

Tags (2)
0 Karma
1 Solution

curtisb1024
Path Finder

To make this work correctly in all situations, you need to have the regex handle when CSV fields are enclosed in quotes:

| rex "^(?:(?:\"(?:\"\"|.)*?\",)|(?:[^,]*),){7}(?P<AAA>[^,]+)"

If a field starts with a double quote, everything up to the next ", will be captured as part of that field.

This regex works for all the following examples:

2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,ABC,"CENTRAL, M"AINE" ",x123456,204,PC,"Windows",more,text
2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,ABC,"CENTRAL, MAINE",x123456,204,PC,"Windows",more,text
2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,"a"",",asdf,x123456,204,PC,"Windows",more,text
8/28/2013 15:04,100.200.250.27,,username@email.com,"",DEF,x987654,23,PC,,IE 8,Mozilla,more
8/28/2013 15:04,100.200.250.27,,username@email.com,ABC,DEF,x987654,23,PC,,IE 8,Mozilla,more
8/28/2013 15:04,100.200.250.27,,username@email.com,ABC,DEF,x987654,23,,,IE 8,Mozilla,more

Edit: Updated to handle double quotes inside quotes followed by a comma.

View solution in original post

curtisb1024
Path Finder

To make this work correctly in all situations, you need to have the regex handle when CSV fields are enclosed in quotes:

| rex "^(?:(?:\"(?:\"\"|.)*?\",)|(?:[^,]*),){7}(?P<AAA>[^,]+)"

If a field starts with a double quote, everything up to the next ", will be captured as part of that field.

This regex works for all the following examples:

2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,ABC,"CENTRAL, M"AINE" ",x123456,204,PC,"Windows",more,text
2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,ABC,"CENTRAL, MAINE",x123456,204,PC,"Windows",more,text
2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,"a"",",asdf,x123456,204,PC,"Windows",more,text
8/28/2013 15:04,100.200.250.27,,username@email.com,"",DEF,x987654,23,PC,,IE 8,Mozilla,more
8/28/2013 15:04,100.200.250.27,,username@email.com,ABC,DEF,x987654,23,PC,,IE 8,Mozilla,more
8/28/2013 15:04,100.200.250.27,,username@email.com,ABC,DEF,x987654,23,,,IE 8,Mozilla,more

Edit: Updated to handle double quotes inside quotes followed by a comma.

lehrfeld
Path Finder

Great call with the double quote - thanks!

0 Karma

somesoni2
Revered Legend

This works fine with your example logs.

... | rex "(?i)^(?:[^,]*,){7,8}(?P<AAA>[^,]+)"

curtisb1024
Path Finder

This won't work for OPs second example if the field after the 23 contains a value.

0 Karma
Get Updates on the Splunk Community!

Earn a $35 Gift Card for Answering our Splunk Admins & App Developer Survey

Survey for Splunk Admins and App Developers is open now! | Earn a $35 gift card!      Hello there,  Splunk ...

Continuing Innovation & New Integrations Unlock Full Stack Observability For Your ...

You’ve probably heard the latest about AppDynamics joining the Splunk Observability portfolio, deepening our ...

Monitoring Amazon Elastic Kubernetes Service (EKS)

As we’ve seen, integrating Kubernetes environments with Splunk Observability Cloud is a quick and easy way to ...