I recently started trying to set up some field extracts for a few of our events. In this case, the logs are pipe delimited and contain only a few segments. What I've found is that most of these attempts fail with a rex error about limits in limits.conf.
For example, take this record:
2022-02-03 11:45:21,732 |xxxxxxxxxxxxxxx.xxxxxx.com~220130042312|<== conn[SSL/TLS]=274107 op=26810 MsgID=26810 SearchResult {resultCode=0, matchedDN=null, errorMessage=null} ### nEntries=1 ### etime=3 ###
When I attempt to use a pipe-delimited field extract (for testing), the result is a rex limits error (the error message includes the regex the UI generated).
When I toss that regex (from the error) into regex101 (https://regex101.com/r/IswlNh/1), it tells me it takes 2473 steps, which is well above the default of 1000 for depth_limit... How is it that an event with 4 segments delimited by pipes is so bad?
I realize there are 2 limits (depth_limit/match_limit) in play here and I can increase them, but nowhere can I find recommended values to use as a sanity check. I also realize I can optimize the regex, but since I'm setting this up via the UI using the delimited option, I don't have access to the regex at creation time. Not to mention, many of my users choose this option precisely because they are not regex gurus...
So my big challenge/question is... where do I go from here? My users are going to use this delimited option, which evidently generates some seriously inefficient regex under the covers. Do I increase my limit(s), and if so, what is a sane/safe value? Is there something I'm missing?
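For reference, both knobs live under the [rex] stanza in limits.conf. A minimal sketch of what raising them looks like (the values here are illustrative, not a vetted recommendation):

# limits.conf (values illustrative only)
[rex]
# Max recursion depth for the PCRE engine during field extraction (default: 1000)
depth_limit = 5000
# Max number of internal PCRE match attempts (default: 100000)
match_limit = 100000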
Thanks!
Can you use the fact that pipes are the delimiter character?
^(?P<field1>[^\|]+)\s\|(?P<field2>[^\|]+)\|(?P<field3>.*)
https://regex101.com/r/MLYmkL/1
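For instance, inline with rex (the fieldN names just carry over from the pattern above; in a real extraction you'd pick meaningful names):

| rex field=_raw "^(?P<field1>[^\|]+)\s\|(?P<field2>[^\|]+)\|(?P<field3>.*)"

The negated character classes stop the engine at each pipe instead of scanning to the end of the line, which is what keeps the step count low.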
No doubt the regex can be improved significantly, as you demonstrated. I guess my challenge is: how do I tell my users that the OOB delimited option just doesn't work, and that they now have to go learn regex to extract their fields?
At the end of the day, I see 3 possibilities... raise the limits (to some value I have no guidance on), tell my users to learn regex and hand-write their extractions, or set the delimited extractions up myself in the back-end config.
If you don't want to use rex, you could use makemv:
| makeresults
| eval _raw="2022-02-03 11:45:21,732 |xxxxxxxxxxxxxxx.xxxxxx.com~220130042312|<== conn[SSL/TLS]=274107 op=26810 MsgID=26810 SearchResult {resultCode=0, matchedDN=null, errorMessage=null} ### nEntries=1 ### etime=3 ###"
| makemv delim="|" _raw
| eval field1=mvindex(_raw,0)
| eval field2=mvindex(_raw,1)
| eval field3=mvindex(_raw,2)
I think the issue with your rex is that there are a few greedy matches, so the engine keeps backtracking and re-trying match positions, hence the high number of steps.
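A contrived illustration of the difference (not the actual UI-generated pattern, just the general shape of the problem):

.*\|.*          greedy: each .* first consumes to the end of the line, then backtracks until the \| can match
[^|]*\|[^|]*    negated class: stops at the first pipe, so no backtracking is needed

The generated regex apparently leans on the greedy style, which is why the step count balloons even on a short event.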
The point here being, that's not my regex. That was generated by the Splunk UI when I tried to create a field extract using 'delimiting with pipe'... The only reason I have that regex in hand is because the error message included it...
OK, I see. Sometimes Splunk is too clever for its own good! 😀
Yea, I feel like it must have something to do with how the UI handles these. It seems like it's using regex and being a bit overzealous about it. Configuring the same extraction as a delimited extraction from the back end works fine without issues...
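For anyone who lands here later, a minimal sketch of that back-end approach as a search-time DELIMS extraction (the stanza and field names here are illustrative; substitute your own sourcetype and names):

# props.conf
[my_sourcetype]
REPORT-pipe_fields = pipe_delimited_fields

# transforms.conf
[pipe_delimited_fields]
DELIMS = "|"
FIELDS = timestamp, host_info, message

Because DELIMS splits on a literal character instead of running a backtracking regex, it sidesteps the rex limits issue entirely.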