Splunk Search

Why does rex match fail when input length exceeds a certain threshold?

yuanliu
SplunkTrust

When the input length exceeds a certain threshold, some rex matches appear to fail while others succeed. Consider the following emulation:

index=none |stats count
| eval count=mvappend("short","long") | mvexpand count
| eval emul="Begin:foo}} bar "
 + if(count="short",mvjoin(mvrange(10000,15000),"-"),
 mvjoin(mvrange(10000,20000),"-"))
| rex field=emul "Begin:(?<DATA>.*)"
| rex field=emul "Begin:.+}} (?<DATA>.+)"
| eval DATA=replace(DATA,"[\d-]+","/stuff")
| fields - emul

Note:

  1. The first two lines simply produce a two-event sample.
  2. Field emul is populated with hyphen-joined numerals between 10000 and 20000. The first event contains a string roughly 5,000 x 6 bytes = 30,000 bytes long, whereas the second event contains a string roughly 10,000 x 6 bytes = 60,000 bytes long.
  3. The intent of the two cascaded rex commands is to use the text after the double curly braces whenever possible: the second rex overwrites DATA only when it matches.
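
To check the sizes quoted in note 2, here is a minimal sketch that reuses the emulation and reports the length of emul for each event. The field name emul_len is introduced here for illustration only. Each numeral is five digits plus one separator, so the two lengths should come out near 30,000 and 60,000 characters.

```spl
index=none | stats count
| eval count=mvappend("short","long") | mvexpand count
| eval emul="Begin:foo}} bar "
 + if(count="short",mvjoin(mvrange(10000,15000),"-"),
 mvjoin(mvrange(10000,20000),"-"))
| eval emul_len=len(emul)
| table count emul_len
```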

Because the two events are structurally identical, I expect both to produce bar /stuff. That is the case as long as the second mvrange() goes no higher than mvrange(10000,19000), or ~9,000 x 6 bytes = 54,000 bytes. Not much longer than that, the output becomes

count   DATA
short   bar /stuff
long    foo}} bar /stuff

In other words, the second rex fails to take effect. (The last two commands simply shorten the output and have no effect on whether the second rex fails.) I see no error in the job inspector or elsewhere. In my real-world search, complicated subsequent commands, including rex, do not appear to be affected, even after the equivalent of "Begin:.+}} (?<DATA>.+)" fails.

Is this a bug or is there some parameter I need to tweak? What is special about "Begin:.+}} (?<DATA>.+)"?

1 Solution

yuanliu
SplunkTrust

So it is unrelated to the curly brackets; it is triggered by a greedy quantifier (.+) to the left of the capture group. If I add a greedy quantifier to the first rex, the first rex fails on the long string, too. E.g.,

 | rex field=emul "Begi.+:(?<DATA>.*)"
 | rex field=emul "Begin:.+}} (?<DATA>.+)"

Output becomes

count   DATA
short   bar /stuff
long     
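
One plausible reading, which nothing in this thread confirms, is that the greedy .+ first consumes the whole string and then backtracks one character at a time, and the regex engine gives up once that work exceeds an internal limit. Under that assumption, a lazy quantifier should avoid the long backtrack. A minimal, untested sketch against the same emulation follows. Note that .+? stops at the first "}} " and not the last, which is equivalent here only because the sample contains a single such sequence.

```spl
index=none | stats count
| eval count=mvappend("short","long") | mvexpand count
| eval emul="Begin:foo}} bar "
 + if(count="short",mvjoin(mvrange(10000,15000),"-"),
 mvjoin(mvrange(10000,20000),"-"))
| rex field=emul "Begin:(?<DATA>.*)"
| rex field=emul "Begin:.+?}} (?<DATA>.+)"
| eval DATA=replace(DATA,"[\d-]+","/stuff")
| fields - emul
```

If the pattern cannot be rewritten, limits.conf documents match_limit and depth_limit settings under its [rex] stanza. Whether raising them helps in this case is an assumption worth testing, not something established here.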

yuanliu
SplunkTrust

If the first rex is removed, the second rex indeed extracts no data with the long string.
