Splunk Search

Why does rex match fail when input length exceeds a certain threshold?

yuanliu
SplunkTrust

When input length exceeds a certain threshold, some rex matches seem to fail while others do not. Consider the following emulation:

index=none |stats count
| eval count=mvappend("short","long") | mvexpand count
| eval emul="Begin:foo}} bar "
 + if(count="short",mvjoin(mvrange(10000,15000),"-"),
 mvjoin(mvrange(10000,20000),"-"))
| rex field=emul "Begin:(?<DATA>.*)"
| rex field=emul "Begin:.+}} (?<DATA>.+)"
| eval DATA=replace(DATA,"[\d-]+","/stuff")
| fields - emul

Note:

  1. The first two lines simply produce a two-event sample.
  2. Field emul is populated using numerals between 10000 and 20000. The first event contains a string roughly 5,000 x 6 bytes = 30,000 bytes long, whereas the second event contains a string roughly 10,000 x 6 bytes = 60,000 bytes long.
  3. The intent of the two cascaded rex commands is to use the text after the double curly braces whenever possible.
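
To double-check the sizes in note 2, a length column can be added to the same emulation. This is only a sketch: emul_len is a field name made up for illustration, and the two lengths should come out at roughly 30,000 and 60,000 characters.

index=none |stats count
| eval count=mvappend("short","long") | mvexpand count
| eval emul="Begin:foo}} bar "
 + if(count="short",mvjoin(mvrange(10000,15000),"-"),
 mvjoin(mvrange(10000,20000),"-"))
| eval emul_len=len(emul)
| table count emul_len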

Because the two events are structurally identical, I expect both to produce bar /stuff. That is the case as long as the second mvrange() goes no higher than mvrange(10000,19000), or ~9,000 x 6 bytes = 54,000 bytes. Not much longer than that, the output becomes

count   DATA
short   bar /stuff
long    foo}} bar /stuff

In other words, the second rex fails to take effect. (The last two commands simply shorten the output and have no effect on whether the second rex fails.) I see no error in the Job Inspector or elsewhere. In my real-world search, complicated subsequent commands, including rex, do not appear to be affected, even after the equivalent of "Begin:.+}} (?<DATA>.+)" fails.

Is this a bug or is there some parameter I need to tweak? What is special about "Begin:.+}} (?<DATA>.+)"?

1 Solution

yuanliu
SplunkTrust

So it is unrelated to the curly brackets, but is triggered by left-aggressiveness (a greedy .+) before the field extraction. If I add left-aggressiveness to the first rex, the first rex will fail on the long string, too. E.g.,

 | rex field=emul "Begi.+:(?<DATA>.*)"
 | rex field=emul "Begin:.+}} (?<DATA>.+)"

Output becomes

count   DATA
short   bar /stuff
long     
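
If the cause is the regex engine giving up after too much backtracking on the greedy .+ (an assumption; nothing in this thread confirms it), one thing to try is replacing the .+ before the braces with a negated character class, so the engine never has to scan to the end of the string and walk back. An untested sketch:

 | rex field=emul "Begin:(?<DATA>.*)"
 | rex field=emul "Begin:[^}]*}} (?<DATA>.+)"

Note that this is not an exact equivalent: [^}]* stops at the first } after Begin:, whereas .+ would run to the last }} in the string.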


yuanliu
SplunkTrust

If the first rex is removed, the second rex indeed extracts no data with the long string.
