Help creating log field extraction regex

ptur · ‎07-31-2017

Hello,
We have several comma-delimited logs with a static set of fields. I want to extract the fields using regex.

Here's my current regex, where the result is the content between the 11th and 12th commas (in a log with 40 fields). However, it does not account for commas within the fields (LDAP results, for example); such fields are enclosed in quotation marks.

| rex field=_raw "(?:[^\,]*\,){11}(?<Result>.*)(?:[^\,]*\,){28}"|

I need help modifying my regex to ignore the contents between quotation marks. I'd appreciate it if someone who has had the same issue could help.

Thanks!

DalJeanis · ‎07-31-2017

Depending on where you are trying to extract it...

| rex   "^(?:(\"[^\"]*\"|[^,]*),){10}(?<myelevens>\"[^\"]*\"|[^,]*),"

For an index-time extraction, I'd have to verify the escaping and capture-marking. I believe that, without the outside quotes and the capture-name, that regex will work and the capturing group you want will be \2, but I'd have to test it to be sure.

EXPLANATION IN ENGLISH OF THE REGEX

Here are the working parts of that regex:

This part, if it encounters a quote, will take everything to the next quote. Since we are throwing the match away, I didn't worry about whether or not we kept the quotes themselves... we kept them. We will want to make sure that that part goes first anyplace there could be a quoted string, both at the beginning and after every comma.

 \"[^\"]*\"

This alternate part grabs anything that isn't a comma. We will use it wherever the first part fails.

[^,]*

Now we group those as alternates (___|___) , and we'll use them between the commas.

 (\"[^\"]*\"|[^,]*)

Anchor the start of the string ^, grab the comma afterward ,, group it in a noncapturing group (?:) and repeat the whole group 10 times {10} --- ^(?:___,){10}

 ^(?:(\"[^\"]*\"|[^,]*),){10}

That whole mess gets rid of everything through the first ten commas. Now we'll just repeat the same unit once more, but replace the ?: with the capture name ?<myelevens> so that it becomes a named capturing group:

(?<myelevens>\"[^\"]*\"|[^,]*),
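For anyone who wants to check the pattern outside Splunk, here is a minimal Python sketch of the same idea. It is an illustration, not something from the thread: the helper name nth_field and the sample line are made up, Python needs (?P<name>...) instead of the (?<name>...) syntax that rex accepts, and the trailing (?:,|$) is an added convenience so the last field on a line can also be picked.

```python
import re

# One field: either a quoted string (may contain commas) or a run of non-commas.
FIELD = r'(?:"[^"]*"|[^,]*)'

def nth_field(line, n):
    """Return the n-th (1-based) comma-separated field of line, or None.

    Hypothetical helper: skips n-1 "field + comma" units from the start of
    the line, then captures the next field.
    """
    pattern = r'^(?:%s,){%d}(?P<value>%s)(?:,|$)' % (FIELD, n - 1, FIELD)
    m = re.match(pattern, line)
    return m.group("value") if m else None

# Made-up sample: fields 3 and 11 are quoted and contain commas.
sample = 'a,b,"cn=x,ou=y,dc=z",d,e,f,g,h,i,j,"k1,k2",l,m'

print(nth_field(sample, 11))  # the quoted 11th field, quotes included
print(nth_field(sample, 12))  # the content between the 11th and 12th field separators
```

Note that the rex above skips ten fields and captures the eleventh. If you want the content between the 11th and 12th commas, as in the original question, the repeat count would need to be {11} (n=12 in the sketch).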

skoelpin · ‎07-31-2017

A better approach would be to use KV_MODE

KV_MODE = [none|auto|auto_escaped|multi|json|xml]

https://docs.splunk.com/Documentation/Splunk/6.6.2/Admin/Propsconf

gcusello · ‎07-31-2017

Hi ptur,
to insert code in the text editor, use the Code Sample option (the button with 101010).
Could you share an example of your logs?
Have you already tried the web interface field extractor? It's very useful for extractions in situations like yours.

Bye.
Giuseppe

knielsen · ‎07-31-2017

Hi,

Not sure if I got all your cases, can you try this?

rex field=line "([^,\?]*(?:\?.[^,\?]*)*,){11}(?<field1>[^,\?]*(?:\?.[^,\?]*)*),"

I tried it like this to test it out:

| makeresults | eval line="1,2,3?,? ,4,5,6,7,8,9,10,11,this is ?,interesting,x,y,z" | rex field=line "([^,\?]*(?:\?.[^,\?]*)*,){11}(?<field1>[^,\?]*(?:\?.[^,\?]*)*),"

Don't ask me how I got there. I once had a similar problem with escaped \" in logs that had fields delimited by actual ".
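As written, the pattern above treats ? as an escape character, so ?, counts as part of a field rather than as a separator. A rough Python port of the makeresults test, assuming only that the named group is rewritten as (?P<field1>...) for Python's re module:

```python
import re

# Each field: non-comma, non-"?" characters, with any "?x" escape pair
# (a "?" followed by any single character) allowed in between.
UNIT = r'[^,?]*(?:\?.[^,?]*)*'
PATTERN = re.compile(r'(' + UNIT + r',){11}(?P<field1>' + UNIT + r'),')

# Same test line as in the makeresults example.
line = "1,2,3?,? ,4,5,6,7,8,9,10,11,this is ?,interesting,x,y,z"

m = PATTERN.search(line)
field1 = m.group("field1") if m else None
print(field1)
```

Here "3?,? " is consumed as a single field, so eleven fields are skipped and field1 ends up holding the twelfth one, escaped comma included.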
