Why does rex match fail when input length exceeds a certain threshold?

yuanliu — Wed, 08 Oct 2014 23:28:50 GMT

When the input length exceeds a certain threshold, some rex matches fail while others do not. Consider the following emulation:

index=none |stats count
| eval count=mvappend("short","long") | mvexpand count
| eval emul="Begin:foo}} bar "
 + if(count="short",mvjoin(mvrange(10000,15000),"-"),
 mvjoin(mvrange(10000,20000),"-"))
| rex field=emul "Begin:(?<DATA>.*)"
| rex field=emul "Begin:.+}} (?<DATA>.+)"
| eval DATA=replace(DATA,"[\d-]+","/stuff")
| fields - emul

Note:

The first two lines simply produce a two-event sample.
Field emul is populated using numerals between 10000 and 20000, joined by "-". The first event contains a string roughly 5,000 x 6 bytes = 30,000 bytes long, whereas the second event contains a string roughly 10,000 x 6 bytes = 60,000 bytes long.
The intent of the two cascaded rex commands is to use the text after the double curly braces whenever possible.
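
To confirm the size estimates above, the same emulation can report the actual string lengths. This is a minimal sketch; the field name emul_len is introduced here for illustration only. Since mvrange() excludes its end value, the short event joins 5,000 five-digit numbers with 4,999 separators (29,999 characters) after the 16-character prefix, so the lengths should come out close to 30,000 and 60,000.

```spl
index=none | stats count
| eval count=mvappend("short","long") | mvexpand count
| eval emul="Begin:foo}} bar "
 + if(count="short",mvjoin(mvrange(10000,15000),"-"),
 mvjoin(mvrange(10000,20000),"-"))
| eval emul_len=len(emul)
| fields count emul_len
```

Changing the end value of the second mvrange() and rerunning gives the exact length at which the behavior changes.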

Because the two events are structurally identical, I expect both to produce bar /stuff. That is the case as long as the second mvrange() goes no further than mvrange(10000,19000), or about 9,000 x 6 bytes = 54,000 bytes. With a string not much longer than that, the output becomes

count   DATA
short   bar /stuff
long    foo}} bar /stuff

In other words, the second rex fails to take effect. (The last two commands simply shorten the output and have no effect on whether the second rex fails.) I see no error in the job inspector or anywhere else. In my real-world search, complicated subsequent commands, including rex, do not appear to be affected, even after the equivalent of "Begin:.+}} (?<DATA>.+)" fails.

Is this a bug, or is there some parameter I need to tweak? What is special about "Begin:.+}} (?<DATA>.+)"?
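
Because both rex commands write to the same field and no error is reported, a failed second match is only visible indirectly. One way to make it explicit is to extract into two separate fields and combine them afterwards. This is a sketch only; the names DATA_fallback, DATA_preferred and second_rex_matched are hypothetical and do not appear in the search above.

```spl
index=none | stats count
| eval count=mvappend("short","long") | mvexpand count
| eval emul="Begin:foo}} bar "
 + if(count="short",mvjoin(mvrange(10000,15000),"-"),
 mvjoin(mvrange(10000,20000),"-"))
| rex field=emul "Begin:(?<DATA_fallback>.*)"
| rex field=emul "Begin:.+}} (?<DATA_preferred>.+)"
| eval second_rex_matched=if(isnull(DATA_preferred),"no","yes")
| eval DATA=coalesce(DATA_preferred,DATA_fallback)
| eval DATA=replace(DATA,"[\d-]+","/stuff")
| fields count second_rex_matched DATA
```

The coalesce() call keeps the original intent (prefer the text after the double curly braces, otherwise fall back to everything after Begin:), while second_rex_matched shows per event whether the second pattern extracted anything.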

Re: Why does rex match fail when input length exceeds a certain threshold?

yuanliu — Wed, 08 Oct 2014 23:39:20 GMT

If the first rex is removed, the second rex indeed extracts no data with the long string.

Re: Why does rex match fail when input length exceeds a certain threshold?

yuanliu — Fri, 10 Oct 2014 19:17:55 GMT

So it is unrelated to the curly brackets; it is triggered by left-aggressiveness (a greedy quantifier such as .+) before the field extraction. If I add the same left-aggressiveness to the first rex, the first rex fails on the long string, too. E.g.,

 | rex field=emul "Begi.+:(?<DATA>.*)"
 | rex field=emul "Begin:.+}} (?<DATA>.+)"

Output becomes

count   DATA
short   bar /stuff
long
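
If the greedy .+ before the capture group really is the trigger, a pattern that cannot run past the closing braces should behave differently on the long string. The variant below is a sketch for testing that observation, not a confirmed fix: it replaces the leading .+ with the negated character class [^}]+, and it assumes the text between Begin: and }} never contains a } itself. Whether it succeeds at the lengths that fail above has not been verified here.

```spl
index=none | stats count
| eval count=mvappend("short","long") | mvexpand count
| eval emul="Begin:foo}} bar "
 + if(count="short",mvjoin(mvrange(10000,15000),"-"),
 mvjoin(mvrange(10000,20000),"-"))
| rex field=emul "Begin:[^}]+}} (?<DATA>.+)"
| eval DATA=replace(DATA,"[\d-]+","/stuff")
| fields - emul
```

If both events then produce bar /stuff, that would support the idea that the failure depends on how far the regex engine has to back up from the end of the string, rather than on the string length alone.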
