Delimited field extracts always result in rex errors
I recently started trying to set up some field extractions for a few of our events. In this case, the logs are pipe-delimited and contain only a few segments. What I've found is that most of these attempts result in a rex error about the limits in limits.conf.
For example, take this record:
2022-02-03 11:45:21,732 |xxxxxxxxxxxxxxx.xxxxxx.com~220130042312|<== conn[SSL/TLS]=274107 op=26810 MsgID=26810 SearchResult {resultCode=0, matchedDN=null, errorMessage=null} ### nEntries=1 ### etime=3 ###
When I attempt to use a pipe-delimited field extraction (for testing), the result is an error (which includes the generated regex).
When I toss that regex into regex101 (https://regex101.com/r/IswlNh/1), it tells me the match requires 2473 steps, which is well above the default of 1000 for depth_limit... How is it that an event with only four pipe-delimited segments is so expensive to match?
I realize there are two limits (depth_limit/match_limit) in play here and that I can increase them, but nowhere can I find recommended values to use as a sanity check. I also realize I can optimize the regex, but since I am setting this up via the UI using the delimited option, I don't have access to the regex at creation time. Not to mention, many of my users choose this option precisely because they are not regex gurus...
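For reference, the two knobs appear to live in the [rex] stanza of limits.conf; a sketch of what raising them might look like (the values below are purely illustrative, not recommendations):

[rex]
# How deep the regex engine may recurse while matching; the default is 1000
depth_limit = 10000
# How many match attempts rex makes before giving up
match_limit = 200000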
So my big challenge/question is... where do I go from here? My users are going to use this delimited option, which evidently generates some seriously inefficient regex under the covers. Do I increase my limit(s), and if so, what is a sane/safe value? Is there something I'm missing?
Thanks!
---

Can you use the fact that the pipe is the delimiter character?
^(?P<field1>[^\|]+)\s\|(?P<field2>[^\|]+)\|(?P<field3>.*)
https://regex101.com/r/MLYmkL/1
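You can sanity-check it inline with something like this (reusing the sample event from your post):

| makeresults
| eval _raw="2022-02-03 11:45:21,732 |xxxxxxxxxxxxxxx.xxxxxx.com~220130042312|<== conn[SSL/TLS]=274107 op=26810 MsgID=26810 SearchResult {resultCode=0, matchedDN=null, errorMessage=null} ### nEntries=1 ### etime=3 ###"
| rex field=_raw "^(?P<field1>[^\|]+)\s\|(?P<field2>[^\|]+)\|(?P<field3>.*)"
| table field1 field2 field3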
---
No doubt the regex can be improved significantly, as you demonstrated. I guess my challenge is: how do I tell my users that the out-of-the-box delimited option just doesn't work and that they now have to go learn regex to extract their fields?
At the end of the day, I see 3 possibilities...
- I'm doing something wrong...
- The default limits are just too low and I should increase them (to what?)...
- Splunk's delimited parsing UI just generates really inefficient regex...
---

If you don't want to use rex, you could use makemv:
| makeresults
| eval _raw="2022-02-03 11:45:21,732 |xxxxxxxxxxxxxxx.xxxxxx.com~220130042312|<== conn[SSL/TLS]=274107 op=26810 MsgID=26810 SearchResult {resultCode=0, matchedDN=null, errorMessage=null} ### nEntries=1 ### etime=3 ###"
| makemv _raw delim="|"
| eval field1=mvindex(_raw,0)
| eval field2=mvindex(_raw,1)
| eval field3=mvindex(_raw,2)
I think the issue with your rex is that it contains a few greedy matches, so the engine keeps backtracking and restarting the match, hence the high step count.
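To illustrate on a toy event (made-up field names): the greedy .* in the first rex runs to the end of the line and has to back up to find each pipe, while the negated class [^|]+ in the second can only stop at the first pipe it reaches, so it matches in far fewer steps:

| makeresults
| eval _raw="seg1|seg2|seg3"
| rex field=_raw "^(?<g1>.*)\|(?<g2>.*)\|(?<g3>.*)$"
| rex field=_raw "^(?<n1>[^|]+)\|(?<n2>[^|]+)\|(?<n3>.*)$"
| table g1 g2 g3 n1 n2 n3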
---
The point here being: that's not my regex. It was generated by the Splunk UI when I tried to create a field extraction using 'delimiting with pipe'... The only reason I have the regex in hand at all is that the error message included it...
---

OK, I see. Sometimes Splunk is too clever for its own good! 😀
---
Yeah. I feel like it must have something to do with how the UI handles these. It seems the UI generates a regex under the hood, and that regex is a bit overzealous. Configuring the same extraction as delimited from the back end works fine, without issues...
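For anyone who finds this later, the back-end approach I'm referring to is a DELIMS-based search-time extraction, roughly like this (the sourcetype, stanza, and field names are made up):

props.conf:
[my:pipe:sourcetype]
REPORT-pipe_fields = pipe_delim_extract

transforms.conf:
[pipe_delim_extract]
DELIMS = "|"
FIELDS = event_info, host_info, message_detail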
