
Help creating log field extraction regex

ptur
Path Finder

Hello,
We have several comma-delimited logs with a static set of fields. I want to extract the fields using a regex.

Here's my current regex, where the result is the content between the 11th and 12th commas (in a log with 40 fields). However, it does not account for commas inside fields (LDAP results, for example); such fields are enclosed in quotation marks.

| rex field=_raw "(?:[^\,]*\,){11}(?<Result>.*)(?:[^\,]*\,){28}"|

I need help modifying my regex to ignore content between quotation marks. I'd appreciate it if someone who has had the same issue could help.
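
To make the failure concrete, here is a minimal Python sketch of the same comma-counting idea, scaled down to a hypothetical 4-field line (the sample data and the shortened pattern are assumptions for illustration, not the real log). Python's `re` module writes named groups as `(?P<name>...)` rather than Splunk's `(?<name>...)`.

```python
import re

# Hypothetical 4-field line; the second field contains a quoted comma.
line = 'a,"b,c",d,e'

# Scaled-down comma counter: skip 2 commas, then capture up to the next comma.
naive = re.compile(r'^(?:[^,]*,){2}(?P<Result>[^,]*),')

result = naive.search(line).group("Result")
print(result)  # the intended third field is d, but the quoted comma shifts the count
```

Because the comma inside `"b,c"` is counted as a delimiter, the capture lands on the tail of the quoted field instead of the third field.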

Thanks!


DalJeanis
Legend

Depending on where you are trying to extract it...

| rex "^(?:(\"[^\"]*\"|[^,]*),){10}(?<myelevens>\"[^\"]*\"|[^,]*),"

For an index-time extraction, I'd have to verify the escaping and capture-marking. I believe that, without the outside quotes and the capture-name, that regex will work and the capturing group you want will be \2, but I'd have to test it to be sure.


EXPLANATION IN ENGLISH OF THE REGEX

Here are the working parts of that regex:

This part, if it encounters a quote, will take everything to the next quote. Since we are throwing the match away, I didn't worry about whether or not we kept the quotes themselves... we kept them. We will want to make sure that that part goes first anyplace there could be a quoted string, both at the beginning and after every comma.

 \"[^\"]*\"

This alternate part grabs anything that isn't a comma; we will use it wherever the first part fails.

[^,]*

Now we group those as alternates (___|___) , and we'll use them between the commas.

 (\"[^\"]*\"|[^,]*)
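
A quick way to see why the quoted alternative has to come first is to try the grouped unit on its own. This is a rough Python sketch with made-up sample values (the LDAP-style string is hypothetical); the backslashes before the quotes are dropped because they are only needed inside the SPL string.

```python
import re

# One field unit: try a quoted string first, otherwise take everything up to the next comma.
field_unit = re.compile(r'("[^"]*"|[^,]*)')

# Same two alternatives in the opposite order, for comparison.
reversed_unit = re.compile(r'([^,]*|"[^"]*")')

sample = '"cn=jdoe,ou=users",next'  # hypothetical quoted field containing a comma

quoted = field_unit.match(sample).group(1)
plain = field_unit.match('plain value,next').group(1)
wrong = reversed_unit.match(sample).group(1)

print(quoted)  # whole quoted field, quotes included
print(plain)   # unquoted field up to the comma
print(wrong)   # stops at the comma inside the quotes
```

With the order reversed, the "anything that isn't a comma" branch succeeds first and cuts the quoted field at its inner comma.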

Anchor the start of the string ^, grab the comma afterward ,, group it in a noncapturing group (?:) and repeat the whole group 10 times {10} --- ^(?:___,){10}

 ^(?:(\"[^\"]*\"|[^,]*),){10}

That whole mess gets rid of everything through the first ten commas. Now we'll just repeat the same unit, but get rid of that ?: to make it capturing and add the capture name <myelevens> ---

(?<myelevens>\"[^\"]*\"|[^,]*),
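
Putting the pieces together, the assembled pattern can be checked outside Splunk. The following is a sketch only: the 13-field sample line is invented, and the named group uses Python's `(?P<name>...)` spelling instead of Splunk's `(?<name>...)`.

```python
import re

# Skip ten field units, then capture the eleventh (quoted or unquoted).
pattern = re.compile(r'^(?:("[^"]*"|[^,]*),){10}(?P<myelevens>"[^"]*"|[^,]*),')

# Hypothetical 13-field line: f1..f13, with two quoted fields that contain commas.
fields = [f"f{i}" for i in range(1, 14)]
fields[2] = '"x,y"'                 # third field
fields[10] = '"cn=jdoe,ou=users"'   # eleventh field
line = ",".join(fields)

m = pattern.search(line)
eleventh = m.group("myelevens")
by_number = m.group(2)  # group 1 is the inner repeated group, so the named group is number 2

print(eleventh)
```

The captured value keeps its surrounding quotes, since the quotes are part of the first alternative.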

skoelpin
SplunkTrust

A better approach would be to use KV_MODE

KV_MODE = [none|auto|auto_escaped|multi|json|xml]

https://docs.splunk.com/Documentation/Splunk/6.6.2/Admin/Propsconf


gcusello
SplunkTrust

Hi ptur,
to insert code in the text editor, use the Code Sample option (the button with 101010).
Could you share an example of your logs?
Have you already tried the web interface field extractor? It's very useful for extractions in situations like yours.

Bye.
Giuseppe


knielsen
Contributor

Hi,

Not sure if I got all your cases, can you try this?

rex field=line "([^,\?]*(?:\?.[^,\?]*)*,){11}(?<field1>[^,\?]*(?:\?.[^,\?]*)*),"

I tried it like this to test it out:

| makeresults | eval line="1,2,3?,? ,4,5,6,7,8,9,10,11,this is ?,interesting,x,y,z" | rex field=line "([^,\?]*(?:\?.[^,\?]*)*,){11}(?<field1>[^,\?]*(?:\?.[^,\?]*)*),"

Don't ask me how I got there. I once had a similar problem with escaped \" in logs that had fields delimited by actual ".
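
For anyone who wants to try the same test without a Splunk instance, here is a rough Python equivalent of the makeresults search above. It assumes, as in that sample line, that `?` acts as the escape character; only the named-group syntax is changed to Python's `(?P<name>...)`.

```python
import re

# Same sample line as in the makeresults test.
line = "1,2,3?,? ,4,5,6,7,8,9,10,11,this is ?,interesting,x,y,z"

# Each unit: non-comma, non-? characters, optionally followed by "? + any char" escapes.
pattern = re.compile(
    r'([^,\?]*(?:\?.[^,\?]*)*,){11}(?P<field1>[^,\?]*(?:\?.[^,\?]*)*),'
)

m = pattern.search(line)
field1 = m.group("field1")
print(field1)  # the escaped comma stays inside the captured field
```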
