I recently started trying to set up some field extracts for a few of our events. In this case, the logs are pipe delimited and contain only a few segments. What I've found is that most of these attempts fail with a rex error about limits in limits.conf.
For example, take this record:
2022-02-03 11:45:21,732 |xxxxxxxxxxxxxxx.xxxxxx.com~220130042312|<== conn[SSL/TLS]=274107 op=26810 MsgID=26810 SearchResult {resultCode=0, matchedDN=null, errorMessage=null} ### nEntries=1 ### etime=3 ###
When I attempt to use a pipe-delimited field extract (for testing), the result is a rex limits error (the error message includes the regex the UI generated).
When I toss that regex (from the error) into regex101 (https://regex101.com/r/IswlNh/1), it tells me it takes 2473 steps, which is well above the default of 1000 for depth_limit... How is it that an event with 4 segments delimited by pipes is so bad?
I realize there are 2 limits (depth_limit/match_limit) in play here and I can increase them, but nowhere can I find recommended values to use as a sanity check. I also realize I can optimize the regex, but since I'm setting this up via the UI using the delimited option, I don't have access to the regex at creation time. Not to mention, many of my users choose this option precisely because they are not regex gurus...
So my big challenge/question is... where do I go from here? My users are going to use this delimited option, which evidently generates some seriously inefficient regex under the covers. Do I increase my limit(s), and if so, what is a sane/safe value? Is there something I'm missing?
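For reference, both knobs live under the [rex] stanza in limits.conf. A minimal sketch of what raising them looks like (the values here are illustrative, not a vetted recommendation):

# limits.conf (values illustrative only)
[rex]
# Max recursion depth for the PCRE engine during field extraction (default: 1000)
depth_limit = 5000
# Max number of internal PCRE match attempts (default: 100000)
match_limit = 100000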
Thanks!
Can you use the fact that pipes are the delimiter character?
^(?P<field1>[^\|]+)\s\|(?P<field2>[^\|]+)\|(?P<field3>.*)
https://regex101.com/r/MLYmkL/1
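For instance, inline with rex (the fieldN names just carry over from the pattern above; in a real extraction you'd pick meaningful names):

| rex field=_raw "^(?P<field1>[^\|]+)\s\|(?P<field2>[^\|]+)\|(?P<field3>.*)"

The negated character classes stop the engine at each pipe instead of scanning to the end of the line, which is what keeps the step count low.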
No doubt the regex can be improved significantly, as you demonstrated. I guess my challenge is: how do I tell my users that the OOB delimited option just doesn't work, and that they now have to go learn regex to extract their fields?
At the end of the day, I see 3 possibilities... raise the limits (to some value I have no guidance on), tell my users to learn regex and hand-write their extractions, or set the delimited extractions up myself in the back-end config.
If you don't want to use rex, you could use makemv:
| makeresults
| eval _raw="2022-02-03 11:45:21,732 |xxxxxxxxxxxxxxx.xxxxxx.com~220130042312|<== conn[SSL/TLS]=274107 op=26810 MsgID=26810 SearchResult {resultCode=0, matchedDN=null, errorMessage=null} ### nEntries=1 ### etime=3 ###"
| makemv delim="|" _raw
| eval field1=mvindex(_raw,0)
| eval field2=mvindex(_raw,1)
| eval field3=mvindex(_raw,2)
I think the issue with your rex is that there are a few greedy matches, so the engine keeps backtracking and re-trying match positions, hence the high number of steps.
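A contrived illustration of the difference (not the actual UI-generated pattern, just the general shape of the problem):

.*\|.*          greedy: each .* first consumes to the end of the line, then backtracks until the \| can match
[^|]*\|[^|]*    negated class: stops at the first pipe, so no backtracking is needed

The generated regex apparently leans on the greedy style, which is why the step count balloons even on a short event.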
The point here being, that's not my regex. That was generated by the Splunk UI when I tried to create a field extract using 'delimiting with pipe'... The only reason I have that regex in hand is because the error message included it...
OK, I see. Sometimes Splunk is too clever for its own good! 😀
Yea, I feel like it must have something to do with how the UI handles these. It seems like it's using regex and being a bit overzealous about it. Configuring the same extraction as a delimited extraction from the back end works fine without issues...
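For anyone who lands here later, a minimal sketch of that back-end approach as a search-time DELIMS extraction (the stanza and field names here are illustrative; substitute your own sourcetype and names):

# props.conf
[my_sourcetype]
REPORT-pipe_fields = pipe_delimited_fields

# transforms.conf
[pipe_delimited_fields]
DELIMS = "|"
FIELDS = timestamp, host_info, message

Because DELIMS splits on a literal character instead of running a backtracking regex, it sidesteps the rex limits issue entirely.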