Splunk Search

Why does rex match fail when input length exceeds a certain threshold?

yuanliu
SplunkTrust

When the input length exceeds a certain threshold, some rex matches seem to fail while others do not. Consider the following emulation:

index=none |stats count
| eval count=mvappend("short","long") | mvexpand count
| eval emul="Begin:foo}} bar "
 + if(count="short",mvjoin(mvrange(10000,15000),"-"),
 mvjoin(mvrange(10000,20000),"-"))
| rex field=emul "Begin:(?<DATA>.*)"
| rex field=emul "Begin:.+}} (?<DATA>.+)"
| eval DATA=replace(DATA,"[\d-]+","/stuff")
| fields - emul

Note:

  1. The first two lines simply produce a two-event sample.
  2. Field emul is populated with hyphen-joined numbers starting at 10000 (up to 15000 for the first event, up to 20000 for the second). Each 5-digit number plus its hyphen takes about 6 bytes, so the first event contains a string roughly 5,000 x 6 bytes = 30,000 bytes long, whereas the second contains a string roughly 10,000 x 6 bytes = 60,000 bytes long.
  3. The intent of the two cascaded rex commands is to use the text after the double curly braces whenever it exists; otherwise DATA keeps everything after "Begin:".
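The byte counts above can be double-checked outside Splunk. The following Python sketch rebuilds the same strings. It assumes that mvrange() excludes its end value and that "+" in the eval concatenates the two strings; build_emul is a hypothetical helper, not anything in Splunk.

```python
def build_emul(stop: int, start: int = 10000) -> str:
    """Rebuild the emul string from the SPL emulation (hypothetical helper).

    Assumes mvrange(start, stop) yields start .. stop-1 and that
    mvjoin(..., "-") joins those numbers with hyphens.
    """
    numbers = "-".join(str(n) for n in range(start, stop))
    return "Begin:foo}} bar " + numbers


if __name__ == "__main__":
    # Second mvrange() argument for the short event, the largest long event
    # reported to work, and the failing long event.
    for stop in (15000, 19000, 20000):
        print(f"mvrange(10000,{stop}): {len(build_emul(stop)):,} bytes")
```

This prints 30,015, 54,015 and 60,015 bytes (the strings are pure ASCII, so characters and bytes coincide), in line with the rough estimates.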

Because the two events are structurally identical, I expect both to produce "bar /stuff". That is indeed the case as long as the second mvrange() goes no higher than mvrange(10000,19000), i.e., about 9,000 x 6 bytes = 54,000 bytes. Not much beyond that, the output becomes

count   DATA
short   bar /stuff
long    foo}} bar /stuff

In other words, the second rex fails to take effect on the long event. (The last two commands only shorten the output; they have no effect on whether the second rex fails.) The Job Inspector shows no error or warning. In my real-world search, complicated subsequent commands, including other rex commands, do not appear to be affected, even after the equivalent of "Begin:.+}} (?<DATA>.+)" fails.
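To tell a pattern problem apart from an engine problem, the sketch below runs the same two patterns with Python's re module as a stand-in for Splunk's PCRE-based rex. This is an assumption: the two engines differ in details, and build_emul and extract are hypothetical helpers. Python's re has no built-in limit on backtracking work, and with it both events end up as "bar /stuff", which suggests the pattern itself is sound for the long event.

```python
import re


def build_emul(stop: int) -> str:
    # Same construction as the SPL emulation (assumes mvrange excludes its end value).
    return "Begin:foo}} bar " + "-".join(str(n) for n in range(10000, stop))


# The two rex patterns; Python spells named groups (?P<name>...) instead of (?<name>...).
FALLBACK = re.compile(r"Begin:(?P<DATA>.*)")
PREFERRED = re.compile(r"Begin:.+}} (?P<DATA>.+)")


def extract(emul: str):
    """Emulate the cascaded rex commands: a later match overwrites DATA, a miss leaves it alone."""
    data = None
    for pattern in (FALLBACK, PREFERRED):
        m = pattern.search(emul)
        if m:
            data = m.group("DATA")
    return data


for label, stop in (("short", 15000), ("long", 20000)):
    data = extract(build_emul(stop))
    # Mirror the replace() step that collapses digit/hyphen runs to "/stuff".
    print(label, re.sub(r"[\d-]+", "/stuff", data))
```

Both lines print "bar /stuff" here.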

Is this a bug, or is there some parameter I need to tweak? What is special about "Begin:.+}} (?<DATA>.+)"?

Accepted solution

yuanliu
SplunkTrust

So it is unrelated to the curly braces; it is triggered by left-aggressiveness (a greedy ".+") before the field extraction. If I add left-aggressiveness to the first rex, the first rex fails on the long string, too. E.g.,

 | rex field=emul "Begi.+:(?<DATA>.*)"
 | rex field=emul "Begin:.+}} (?<DATA>.+)"

Output becomes

count   DATA
short   bar /stuff
long     
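The "left-aggressiveness" explanation can be made concrete with a toy model. The Python sketch below measures how far a greedy ".+" must retreat before the literal after it can match. The helper names are hypothetical, and the model counts character give-backs from a single start position, not PCRE's real internal steps. In both failing patterns that literal sits within the first dozen bytes of the event, so ".+" has to hand back almost the entire string.

```python
def build_emul(stop: int) -> str:
    # Same string as the SPL emulation (assumes mvrange excludes its end value).
    return "Begin:foo}} bar " + "-".join(str(n) for n in range(10000, stop))


def greedy_givebacks(subject: str, prefix: str, literal: str) -> int:
    """Toy model of prefix + greedy ".+" + literal (not PCRE's internal accounting).

    The greedy ".+" first takes everything after prefix, then gives back one
    character at a time until literal matches at the current position.
    Returns the number of characters given back, or -1 if literal never matches.
    """
    start = subject.index(prefix) + len(prefix)
    pos = len(subject)
    givebacks = 0
    while pos > start:  # ".+" must keep at least one character
        if subject.startswith(literal, pos):
            return givebacks
        pos -= 1
        givebacks += 1
    return -1


def lazy_advances(subject: str, prefix: str, literal: str) -> int:
    """Same toy model for a lazy ".+?": extend one character at a time from the left."""
    start = subject.index(prefix) + len(prefix)
    pos = start + 1  # ".+?" must take at least one character
    advances = 0
    while pos <= len(subject):
        if subject.startswith(literal, pos):
            return advances
        pos += 1
        advances += 1
    return -1


for label, stop in (("short", 15000), ("long", 20000)):
    s = build_emul(stop)
    print(
        label,
        greedy_givebacks(s, "Begin:", "}} "),  # second rex: "Begin:.+}} (?<DATA>.+)"
        greedy_givebacks(s, "Begi", ":"),      # modified first rex: "Begi.+:(?<DATA>.*)"
        lazy_advances(s, "Begin:", "}} "),     # lazy variant: "Begin:.+?}} (?<DATA>.+)"
    )
```

Under this model the work grows linearly with event length: roughly 30,000 give-backs for the short event and 60,000 for the long one, for both failing patterns, whereas the original first rex, "Begin:(?<DATA>.*)", has nothing to give back because ".*" ends the pattern. One plausible explanation, which this thread does not confirm, is that somewhere between 54,000 and 60,000 bytes this work exceeds a PCRE resource budget, after which rex leaves DATA untouched without reporting an error; Splunk's limits.conf appears to have a [rex] stanza with a match_limit setting that is worth checking in the spec for your version. A lazy "Begin:.+?}} (?<DATA>.+)" needs only 2 steps here, but it captures after the first "}} " rather than the last when an event contains several.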

yuanliu
SplunkTrust

If the first rex is removed, the second rex indeed extracts nothing from the long string.
