How do I edit my regex to parse fields correctly if a field delimiter appears within a field?

Path Finder

Hi,

Another regex problem I'm afraid.....

I've got a very long event with 37 fields, where all the fields are quoted and separated by commas. There are no key=value pairs.
For the most part my regex works nicely with the event data, but there are occasions where a quote also appears in the actual field data thereby breaking my regex separator character.

Working example (extremely simplified regex and event):

^"(?P<dest_ip>[^"]+)","(?P<dest_port>[^"]+)","(?P<uri>[^"]+)","(?P<request>[^"]+)","(?P<response>[^\n]+)"$

Data:

"192.0.0.20","80","fl=city,name,code,group=true&group.field=city","GET /solr/lpbm/select?fl=city","Logging rate limit reached"

No problem with this, all the fields parse out OK. However, this next event fails - note the additional " in fourth field:-

"192.0.0.20","80","fl=city,name,code,group=true&group.field=city","GET /solr/"lpbm"/select?fl=city","Logging rate limit reached"

This now breaks the [^"]+)"," part of my regex and distorts the field extractions.
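
The failure can be reproduced outside Splunk. The following is a minimal sketch in Python, assuming that Python's re module treats these constructs the same way Splunk's regex engine does; the names PATTERN, good_event and bad_event are only illustrative.

```python
import re

# The simplified pattern from this post: every field except the last is [^"]+
PATTERN = re.compile(
    r'^"(?P<dest_ip>[^"]+)","(?P<dest_port>[^"]+)","(?P<uri>[^"]+)",'
    r'"(?P<request>[^"]+)","(?P<response>[^\n]+)"$'
)

good_event = (
    '"192.0.0.20","80","fl=city,name,code,group=true&group.field=city",'
    '"GET /solr/lpbm/select?fl=city","Logging rate limit reached"'
)
# Same event, but with a pair of quotes inside the fourth field
bad_event = (
    '"192.0.0.20","80","fl=city,name,code,group=true&group.field=city",'
    '"GET /solr/"lpbm"/select?fl=city","Logging rate limit reached"'
)

good = PATTERN.match(good_event)
bad = PATTERN.match(bad_event)

# [^"]+ cannot step over the embedded quote, so the request field stops early
print(good.groupdict() if good else "no match")
print(bad.groupdict() if bad else "no match")
```

In this anchored, simplified form the second event does not match at all in Python. The full 37-field expression running in Splunk may behave differently.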

Is there a way to do the equivalent of:-

......","(?P<request>[^","]+)",".......

I know that this is invalid, but I don't know what the alternative looks like 😞 !!

Thanks for any help,
Mark.

Re: How do I edit my regex to parse fields correctly if a field delimiter appears within a field?

Path Finder

Your problem should be solvable by using non-greedy (lazy) quantifiers instead of the [^"] syntax. The advantage is that you can use the whole pattern "," as the separator instead of just ". However, I'm not sure whether Splunk's regex engine behaves the way I expect, so try something like this:

^"(?P<dest_ip>.+?)","(?P<dest_port>.+?)","(?P<uri>.+?)","(?P<request>.+?)","(?P<response>[^\n]+)"$

What's the difference:

  • I'd say the [^"] syntax is "old school". The parser simply consumes everything until a " is found.
  • Lazy quantifiers, however, consume as little as they can. "As little" means: as few characters as possible while still letting the whole pattern match. In theory (I can't test this right now) a field should therefore absorb a single " but stop at the first "," from which the rest of the pattern can match. (It should also be a little slower, again in theory.)

/edit & just as info: a ? after a quantifier makes it lazy (here: .+? means "lazily consume at least one character").
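
To illustrate the difference, here is a small sketch in Python. It assumes Python's re module handles lazy quantifiers the same way Splunk does; LAZY and event are illustrative names, and the event is the failing one from the question.

```python
import re

# Greedy vs. lazy on a tiny input: greedy runs to the last quote, lazy stops at the first
greedy_demo = re.match(r'"(.+)"', '"a","b"').group(1)
lazy_demo = re.match(r'"(.+?)"', '"a","b"').group(1)
print(greedy_demo)  # a","b
print(lazy_demo)    # a

# The lazy pattern suggested above, with "," acting as the full separator
LAZY = re.compile(
    r'^"(?P<dest_ip>.+?)","(?P<dest_port>.+?)","(?P<uri>.+?)",'
    r'"(?P<request>.+?)","(?P<response>[^\n]+)"$'
)

event = (
    '"192.0.0.20","80","fl=city,name,code,group=true&group.field=city",'
    '"GET /solr/"lpbm"/select?fl=city","Logging rate limit reached"'
)

fields = LAZY.match(event).groupdict()
for name, value in fields.items():
    print(f"{name} = {value}")
```

Each lazy group grows one character at a time until the next "," is found, so the stray quotes around lpbm stay inside the request field. A field that itself contains the full "," sequence would still be split in the wrong place.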

Re: How do I edit my regex to parse fields correctly if a field delimiter appears within a field?

SplunkTrust

Try the following:

^"(?P<dest_ip>[^"]+)","(?P<dest_port>[^"]+)","(?P<uri>[^"]+)","(?P<request>[^"][^,]+)","(?P<response>[^\n]+)"$

You can test it here: https://regex101.com/r/nD3sL1/2
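
The expression can also be checked locally. This is a rough sketch in Python, assuming Python's re module is close enough to Splunk's engine for this pattern. The second event, comma_event, is hypothetical and not from the thread; it is there to show what happens when the request field contains a comma.

```python
import re

# Variant from this reply: request is [^"][^,]+, so it may hold quotes but no commas
VARIANT = re.compile(
    r'^"(?P<dest_ip>[^"]+)","(?P<dest_port>[^"]+)","(?P<uri>[^"]+)",'
    r'"(?P<request>[^"][^,]+)","(?P<response>[^\n]+)"$'
)

# Failing event from the question (quotes inside the fourth field)
quoted_event = (
    '"192.0.0.20","80","fl=city,name,code,group=true&group.field=city",'
    '"GET /solr/"lpbm"/select?fl=city","Logging rate limit reached"'
)
# Hypothetical event (an assumption for illustration): comma inside the fourth field
comma_event = (
    '"192.0.0.20","80","fl=city",'
    '"GET /solr/lpbm/select?fl=city,name","Logging rate limit reached"'
)

quoted = VARIANT.match(quoted_event)
comma = VARIANT.match(comma_event)

print(quoted.group("request") if quoted else "no match")
print(comma.group("request") if comma else "no match")
```

The embedded quotes are handled, but because [^,]+ excludes commas, a request value containing a comma would no longer match with this variant.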