Why does rex match fail when input length exceeds a certain threshold?

yuanliu
SplunkTrust

When the input length exceeds a certain threshold, some rex matches seem to fail while others do not. Consider the following emulation:

index=none |stats count
| eval count=mvappend("short","long") | mvexpand count
| eval emul="Begin:foo}} bar "
 + if(count="short",mvjoin(mvrange(10000,15000),"-"),
 mvjoin(mvrange(10000,20000),"-"))
| rex field=emul "Begin:(?<DATA>.*)"
| rex field=emul "Begin:.+}} (?<DATA>.+)"
| eval DATA=replace(DATA,"[\d-]+","/stuff")
| fields - emul

Note:

  1. The first two lines simply produce a two-event sample.
  2. Field emul is populated with the numerals from 10000 up to 20000, joined by hyphens. The first event contains a string of roughly 5,000 x 6 bytes = 30,000 bytes, whereas the second event contains a string of roughly 10,000 x 6 bytes = 60,000 bytes.
  3. The intent of the two cascaded rex commands is to use the text after the double curly braces whenever possible, with the first rex as the fallback.
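
The size estimates in note 2 can be checked directly rather than estimated. The following is a minimal sketch that reuses the emulation above and adds one eval; the field name emul_len is introduced here for illustration and is not part of the original search.

 index=none | stats count
 | eval count=mvappend("short","long") | mvexpand count
 | eval emul="Begin:foo}} bar "
  + if(count="short",mvjoin(mvrange(10000,15000),"-"),
  mvjoin(mvrange(10000,20000),"-"))
 | eval emul_len=len(emul)
 | fields count emul_len

Changing the upper bound of the second mvrange() and rerunning shows the exact string length at which the behavior changes.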

Because the two events are structurally identical, I expect both to produce bar /stuff. That is the case when the second mvrange() goes up to mvrange(10000,19000), or about 9,000 x 6 bytes = 54,000 bytes. With a string not much longer than that, the output becomes

count   DATA
short   bar /stuff
long    foo}} bar /stuff

In other words, the second rex fails to take effect on the long event. (The last two commands simply shorten the output and have no effect on whether the second rex fails.) I see no error in the job inspector or anywhere else. In my real-world search, complicated subsequent commands, including other rex commands, do not appear to be affected, even after the equivalent of "Begin:.+}} (?<DATA>.+)" fails.

Is this a bug, or is there some parameter I need to tweak? What is special about "Begin:.+}} (?<DATA>.+)"?

1 Solution

yuanliu
SplunkTrust

So it is unrelated to the curly brackets, but is triggered by left-aggressiveness (a greedy ".+" to the left of the capture group) before field extraction. If I add left-aggressiveness to the first rex, the first rex will fail on the long string, too. E.g.,

 | rex field=emul "Begi.+:(?<DATA>.*)"
 | rex field=emul "Begin:.+}} (?<DATA>.+)"

Output becomes

count   DATA
short   bar /stuff
long     
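
One possible workaround, assuming the failure comes from the amount of backtracking that the greedy ".+" causes on a long string, is to stop the left-hand part of the pattern from running to the end of the field. The sketch below replaces ".+" with the negated character class "[^}]+". It is untested against the threshold described in this thread. It also matches up to the first "}} " rather than the last, which gives the same result for this sample but may not for other data.

 index=none | stats count
 | eval count=mvappend("short","long") | mvexpand count
 | eval emul="Begin:foo}} bar "
  + if(count="short",mvjoin(mvrange(10000,15000),"-"),
  mvjoin(mvrange(10000,20000),"-"))
 | rex field=emul "Begin:(?<DATA>.*)"
 | rex field=emul "Begin:[^}]+}} (?<DATA>.+)"
 | eval DATA=replace(DATA,"[\d-]+","/stuff")
 | fields - emul

A lazy quantifier, "Begin:.+?}} (?<DATA>.+)", is another variant along the same lines. As for a parameter to tweak, limits.conf has a [rex] stanza with match_limit and depth_limit settings; whether one of those limits is what is being hit here is an assumption that has not been confirmed in this thread.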

yuanliu
SplunkTrust

If the first rex is removed, the second rex indeed extracts no data with the long string.
