Deployment Architecture

Help creating log field extraction regex

ptur
Path Finder

Hello,
We have several comma-delimited logs with a static set of fields. I want to extract the fields using a regex.

Here's my current regex. It returns the content between the 11th and 12th commas (in a log with 40 fields). However, it does not account for commas inside fields (LDAP results, for example), which are enclosed in quotation marks.

| rex field=_raw "(?:[^\,]*\,){11}(?<Result>.*)(?:[^\,]*\,){28}"
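To see the problem concretely, here is a minimal Python sketch of the same "skip N comma-terminated fields" idea, with Python's `re` standing in for Splunk's PCRE-based rex. It is simplified for brevity (an assumption): it skips 2 fields instead of 11, drops the trailing `{28}` part, and runs on a hypothetical four-field line whose second field is quoted and contains a comma.

```python
import re

# Skip two comma-terminated fields, then capture the next one.
# Python's re spells named groups (?P<name>...) instead of (?<name>...).
NAIVE = re.compile(r'^(?:[^,]*,){2}(?P<Result>[^,]*)')

# Hypothetical line: field 2 is quoted and contains a comma.
line = 'a,"b,c",d,e'

result = NAIVE.match(line).group("Result")
print(result)  # c"   (the real third field is d)
```

The comma inside `"b,c"` is counted as a separator, so every field after it is shifted by one.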

I need help modifying my regex so that it ignores commas between quotation marks. I'd appreciate help from anyone who may have had the same issue.

Thanks!

DalJeanis
Legend

Depending on where you are trying to extract it...

| rex "^(?:(\"[^\"]*\"|[^,]*),){10}(?<myelevens>\"[^\"]*\"|[^,]*),"

For an index-time extraction, I'd have to verify the escaping and capture-marking. I believe that, without the outside quotes and the capture-name, that regex will work and the capturing group you want will be \2, but I'd have to test it to be sure.
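The group numbering can be checked outside Splunk with a small Python sketch. Assumptions: Python's `re` treats these constructs (anchor, alternation, character classes, counted repetition) the way Splunk's PCRE-based engine does, named groups are spelled `(?P<name>...)` in Python, the quoted alternative uses `[^"]*` (any length), and the sample line is hypothetical.

```python
import re

# Python translation of the rex pattern. The \" escapes are only needed
# inside an SPL string, so a raw Python string uses plain ".
NAMED = re.compile(r'^(?:("[^"]*"|[^,]*),){10}(?P<myelevens>"[^"]*"|[^,]*),')

# Same pattern with the capture name removed, as suggested for index time.
UNNAMED = re.compile(r'^(?:("[^"]*"|[^,]*),){10}("[^"]*"|[^,]*),')

# Hypothetical line: field 3 is quoted and contains two commas.
line = 'f1,f2,"cn=x,ou=y,dc=z",f4,f5,f6,f7,f8,f9,f10,f11,f12'

named_value = NAMED.search(line).group("myelevens")
unnamed = UNNAMED.search(line)
print(named_value)       # f11
print(unnamed.group(2))  # f11
print(unnamed.group(1))  # f10 (last repetition of the skipped unit)
```

Group 1 only holds the last repetition of the skipped unit, which is why the wanted field lands in group 2 once the name is removed.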


EXPLANATION IN ENGLISH OF THE REGEX

Here are the working parts of that regex:

This part, if it encounters a quote, takes everything up to the next quote. Since we are throwing the match away, I didn't worry about whether we kept the quotes themselves... we kept them. We want this part to come first anywhere a quoted string could appear, both at the beginning and after every comma.

 \"[^\"]*\"
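A quick Python check of this unit, with Python's `re` standing in for Splunk's engine. The `\"` escapes are only needed inside an SPL string, so the raw Python string uses plain `"`. The sample values are hypothetical.

```python
import re

# Opening quote, any run of non-quote characters, closing quote.
QUOTED = re.compile(r'"[^"]*"')

# re.match anchors at the start, like the unit does after ^ or a comma.
quoted = QUOTED.match('"cn=x,ou=y",next')
print(quoted.group(0))             # "cn=x,ou=y"  (quotes are kept)
print(QUOTED.match('plain,next'))  # None: no opening quote, so this alternative fails
```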

This alternative grabs anything that isn't a comma; we use it wherever the first part fails.

[^,]*
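The same kind of Python sketch for the non-comma alternative shows why it cannot be used on its own for quoted values (hypothetical sample values, Python's `re` standing in for Splunk's engine).

```python
import re

# Any run of non-comma characters (possibly empty).
PLAIN = re.compile(r'[^,]*')

plain_field = PLAIN.match('f4,f5').group(0)
truncated = PLAIN.match('"cn=x,ou=y",f5').group(0)
print(plain_field)  # f4
print(truncated)    # "cn=x  (stops at the comma inside the quotes)
```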

Now we group those as alternatives (___|___) and use them between the commas.

 (\"[^\"]*\"|[^,]*)
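Alternation in backtracking engines such as PCRE is ordered: the first alternative that lets the rest of the pattern match wins, which is why the quoted alternative has to come first. A Python sketch (Python's `re` standing in for Splunk's engine, hypothetical sample value) comparing both orders:

```python
import re

# Quoted alternative first, non-comma alternative second, then the comma.
FIELD = re.compile(r'("[^"]*"|[^,]*),')
# Same alternatives in the opposite order, for comparison.
REVERSED = re.compile(r'([^,]*|"[^"]*"),')

sample = '"cn=x,ou=y",f5'  # hypothetical quoted field with an embedded comma

quoted_first = FIELD.match(sample).group(1)
plain_first = REVERSED.match(sample).group(1)
print(quoted_first)  # "cn=x,ou=y"
print(plain_first)   # "cn=x  (the non-comma branch wins and stops at the inner comma)
```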

Anchor the start of the string with ^, match the comma that follows each field with ,, wrap that in a non-capturing group (?:___) and repeat the whole group 10 times with {10} --- ^(?:___,){10}

 ^(?:(\"[^\"]*\"|[^,]*),){10}
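A Python sketch of this prefix on its own, showing where the match ends (Python's `re` standing in for Splunk's engine; the sample line is hypothetical):

```python
import re

# ^ anchors at the start; the non-capturing group eats one field plus its
# comma, and {10} repeats it, so the match ends after the comma that closes field 10.
SKIP_TEN = re.compile(r'^(?:("[^"]*"|[^,]*),){10}')

# Hypothetical line: field 3 is quoted and contains two commas.
line = 'f1,f2,"cn=x,ou=y,dc=z",f4,f5,f6,f7,f8,f9,f10,f11,f12'

m = SKIP_TEN.match(line)
rest = line[m.end():]
print(rest)  # f11,f12
```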

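That whole mess consumes everything through the first ten commas, that is, the first ten fields. Now we just repeat the same unit, but drop the colon and add the capture name <myelevens> so the group captures. Because ten fields are skipped, this captures the eleventh field; to get the field between the 11th and 12th commas, as in the question, change {10} to {11}: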

(?<myelevens>\"[^\"]*\"|[^,]*),
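The `*` after `[^\"]` matters: written as `\"[^\"]\"`, the quoted alternative only matches a quoted value exactly one character long, and a longer quoted value falls through to the non-comma branch. A Python sketch comparing both forms (Python's `re` standing in for Splunk's engine, hypothetical sample value):

```python
import re

# Quoted alternative without *: matches exactly one character between quotes.
ONE_CHAR = re.compile(r'(?P<myelevens>"[^"]"|[^,]*),')
# Quoted alternative with *: matches any number of non-quote characters.
ANY_LEN = re.compile(r'(?P<myelevens>"[^"]*"|[^,]*),')

sample = '"cn=x,ou=y",f12'

broken = ONE_CHAR.match(sample).group("myelevens")
fixed = ANY_LEN.match(sample).group("myelevens")
print(broken)  # "cn=x
print(fixed)   # "cn=x,ou=y"
```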

skoelpin
SplunkTrust

A better approach would be to use the KV_MODE setting in props.conf:

KV_MODE = [none|auto|auto_escaped|multi|json|xml]

https://docs.splunk.com/Documentation/Splunk/6.6.2/Admin/Propsconf

gcusello
SplunkTrust

Hi ptur,
To insert code in the text editor, use the Code Sample option (the button labeled 101010).
Could you share an example of your logs?
Have you already tried the field extractor in the web interface? It's very useful for extractions in situations like yours.

Bye.
Giuseppe

knielsen
Contributor

Hi,

I'm not sure I've covered all your cases, but can you try this?

rex field=line "([^,\?]*(?:\?.[^,\?]*)*,){11}(?<field1>[^,\?]*(?:\?.[^,\?]*)*),"

I tried it like this to test it out:

| makeresults | eval line="1,2,3?,? ,4,5,6,7,8,9,10,11,this is ?,interesting,x,y,z" | rex field=line "([^,\?]*(?:\?.[^,\?]*)*,){11}(?<field1>[^,\?]*(?:\?.[^,\?]*)*),"
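To check that test without a Splunk instance, here is the same pattern and sample string in Python. Assumptions: Python's `re` treats these constructs as Splunk's PCRE-based engine does, and the named group is rewritten as `(?P<field1>...)`. In this pattern `?` acts as an escape character: `?` followed by any one character, including a comma, stays inside the current field.

```python
import re

# Each field: non-comma, non-? characters, optionally interleaved with
# "?<any char>" escape pairs; skip 11 such fields, then capture the 12th.
PATTERN = re.compile(
    r'([^,\?]*(?:\?.[^,\?]*)*,){11}(?P<field1>[^,\?]*(?:\?.[^,\?]*)*),'
)

line = "1,2,3?,? ,4,5,6,7,8,9,10,11,this is ?,interesting,x,y,z"

field1 = PATTERN.search(line).group("field1")
print(field1)  # this is ?,interesting
```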

Don't ask me how I got there. I once had a similar problem with escaped \" characters in logs whose fields were delimited by actual " characters.
