Splunk Search

Rex question to extract a field

Path Finder

I have been playing with the rex command for awhile now and I am stuck. I have a csv source that I need to extract a 1-3 digit number from. The trouble is that depending upon the line in the csv, it is either in the 8th or 9th column in the csv. Sample csv line:

2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,ABC,"CENTRAL, MAINE",x123456,204,PC,Windows,more,text

OR

8/28/2013 15:04,100.200.250.27,,username@email.com,ABC,DEF,x987654,23,,,IE 8,Mozilla,more

I would like to extract the 204 from the first line and 23 from the second. The trouble is the extra comma in the location in some of the lines (,"CENTRAL, MAINE",).

My base rex is | rex field=_raw ",{8,9}.{1,3} (?<AAA>.*)"

What I am thinking I am doing is searching for 8 or 9 commas ,{8,9} then when it finds it, select the next 1 to 3 characters .{1,3} then put all that in an extraction called AAA (?<AAA>.*)

I think I am just off, but any tips would be great!

Thanks, Mike

Tags (2)
0 Karma
1 Solution

Path Finder

To make this work correctly in all situations, you need to have the regex handle when CSV fields are enclosed in quotes:

| rex "^(?:(?:\"(?:\"\"|.)*?\",)|(?:[^,]*),){7}(?P<AAA>[^,]+)"

If a field starts with a double quote, everything up to the next ", will be captured as part of that field.

This regex works for all the following examples:

2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,ABC,"CENTRAL, M"AINE" ",x123456,204,PC,"Windows",more,text
2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,ABC,"CENTRAL, MAINE",x123456,204,PC,"Windows",more,text
2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,"a"",",asdf,x123456,204,PC,"Windows",more,text
8/28/2013 15:04,100.200.250.27,,username@email.com,"",DEF,x987654,23,PC,,IE 8,Mozilla,more
8/28/2013 15:04,100.200.250.27,,username@email.com,ABC,DEF,x987654,23,PC,,IE 8,Mozilla,more
8/28/2013 15:04,100.200.250.27,,username@email.com,ABC,DEF,x987654,23,,,IE 8,Mozilla,more

Edit: Updated to handle double quotes inside quotes followed by a comma.

View solution in original post

Path Finder

To make this work correctly in all situations, you need to have the regex handle when CSV fields are enclosed in quotes:

| rex "^(?:(?:\"(?:\"\"|.)*?\",)|(?:[^,]*),){7}(?P<AAA>[^,]+)"

If a field starts with a double quote, everything up to the next ", will be captured as part of that field.

This regex works for all the following examples:

2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,ABC,"CENTRAL, M"AINE" ",x123456,204,PC,"Windows",more,text
2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,ABC,"CENTRAL, MAINE",x123456,204,PC,"Windows",more,text
2014/04/28 07:23:41 AM EDT,100.200.250.27,email link click,username@email.com,"a"",",asdf,x123456,204,PC,"Windows",more,text
8/28/2013 15:04,100.200.250.27,,username@email.com,"",DEF,x987654,23,PC,,IE 8,Mozilla,more
8/28/2013 15:04,100.200.250.27,,username@email.com,ABC,DEF,x987654,23,PC,,IE 8,Mozilla,more
8/28/2013 15:04,100.200.250.27,,username@email.com,ABC,DEF,x987654,23,,,IE 8,Mozilla,more

Edit: Updated to handle double quotes inside quotes followed by a comma.

View solution in original post

Path Finder

Great call with the double quote - thanks!

0 Karma

Revered Legend

This works fine with your example logs.

... | rex "(?i)^(?:[^,]*,){7,8}(?P<AAA>[^,]+)"

Path Finder

This won't work for OPs second example if the field after the 23 contains a value.

0 Karma
Don’t Miss Global Splunk
User Groups Week!

Free LIVE events worldwide 2/8-2/12
Connect, learn, and collect rad prizes
and swag!