Splunk Search

How do I edit my regex to parse fields correctly if a field delimiter appears within a field?

Path Finder

Hi,

Another regex problem I'm afraid.....

I've got a very long event with 37 fields where all the fields are quoted and separated by comma. Also there are no key=value pairs.
For the most part my regex works nicely with the event data, but there are occasions where a quote also appears in the actual field data thereby breaking my regex separator character.

Working example (extremely simplified regex and event):

^"(?P<dest_ip>[^"]+)","(?P<dest_port>[^"]+)","(?P<uri>[^"]+)","(?P<request>[^"]+)","(?P<response>[^\n]+)"$

Data:

"192.0.0.20","80","fl=city,name,code,group=true&group.field=city","GET /solr/lpbm/select?fl=city","Logging rate limit reached"

No problem with this, all the fields parse out OK. However, this next event fails - note the additional " in fourth field:-

"192.0.0.20","80","fl=city,name,code,group=true&group.field=city","GET /solr/"lpbm"/select?fl=city","Logging rate limit reached"

This now breaks the [^"]+)"," part of my regex and distorts the field extractions.

Is there a way to do the equivalent of:-

......","(?P<request>[^","]+)",".......

I know that this is invalid, but I don't know what the alternative looks like 😞 !!

Thanks for any help,
Mark.

0 Karma
1 Solution

Path Finder

Your problem should be solvable by using non greedy (or lazy) quantifiers instead of the [^"] syntax. The advantage is, that you can use the whole pattern "," as seperator instead of just [^"]. How ever, I'm not sure if the Splunk RegEx works as I expect to do, but try (something like) this:

^"(?P<dest_ip>.+?)","(?P<dest_port>.+?)","(?P<uri>.+?)","(?P<request>.+?)","(?P<response>[^\n]+)"$

What's the difference:

  • I'd say the [^"] syntax is "old school". The parser is consuming just everything until an " is found.
  • Lazy quantifiers, how ever, parse as much as they can. And "as much" means: As much as possible unless the whole pattern doesn't match. In theory this should (I can't test that right now) therefore consume a single " but no "," as the pattern would no longer match as a whole. (And it should be a little bit slower, again, in theory)

/edit & just as info: a ? makes an quantifier lazy (here: .+?: "Consume lazy at least one character").

View solution in original post

SplunkTrust
SplunkTrust

Try the following:

^"(?P<dest_ip>[^"]+)","(?P<dest_port>[^"]+)","(?P<uri>[^"]+)","(?P<request>[^"][^,]+)","(?P<response>[^\n]+)"$

You can test it here: https://regex101.com/r/nD3sL1/2

Path Finder

Your problem should be solvable by using non greedy (or lazy) quantifiers instead of the [^"] syntax. The advantage is, that you can use the whole pattern "," as seperator instead of just [^"]. How ever, I'm not sure if the Splunk RegEx works as I expect to do, but try (something like) this:

^"(?P<dest_ip>.+?)","(?P<dest_port>.+?)","(?P<uri>.+?)","(?P<request>.+?)","(?P<response>[^\n]+)"$

What's the difference:

  • I'd say the [^"] syntax is "old school". The parser is consuming just everything until an " is found.
  • Lazy quantifiers, how ever, parse as much as they can. And "as much" means: As much as possible unless the whole pattern doesn't match. In theory this should (I can't test that right now) therefore consume a single " but no "," as the pattern would no longer match as a whole. (And it should be a little bit slower, again, in theory)

/edit & just as info: a ? makes an quantifier lazy (here: .+?: "Consume lazy at least one character").

View solution in original post

State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!