Splunk Search

mvexpand Limit (Total Output Limit)

kulick
Path Finder

I like and need mvexpand to work with some of my data.

Sometimes, our input events contain information about multiple underlying events (especially rich JSON data sources). I understand that mvexpand can, in certain situations, lead to scaling challenges with SPL. I generally think of these problematic cases as ones where each individual input event expands into lots (hundreds, thousands or more) of new events. I can imagine this being especially tricky when the arity of the expansion varies greatly from input event to input event.

I want to believe that cases where mvexpand merely doubles the event count should be safe. It seems that these cases could be implemented to be fully streamable (at the indexers) and that the SPL should scale out easily, since the work is embarrassingly parallel. Here's an example query:

| makeresults count=10000 | streamstats count | eval count=1000*round((count-1)/1000-0.5,0)
| eval mcount=mvrange(0,99,10) | mvexpand mcount | fields count mcount | fields - _raw
| eval ucount=mvrange(0,49,10) | mvexpand ucount | fields count mcount ucount | fields - _raw
| eventstats count as total by count | eventstats count as mtotal by mcount | eventstats count as utotal by ucount
| stats count, values(eval(count." (".total.")")) as cvalues,
               values(eval(mcount." (".mtotal.")")) as mvalues,
               values(eval(ucount." (".utotal.")")) as uvalues

This SPL makes 10,000 events and then mvexpands twice, once by 10x and once by 5x. The result is 500,000 events as expected. By tweaking the makeresults and mvrange commands, we can test different limits of the mvexpand command.

Adjusting the ucount to mvrange(0,99,10) produces the expected 1,000,000 events. That, however, is the highest total that behaves as I expect. Once the number of events produced by any single mvexpand exceeds 1,000,000, some (undesirable) caps begin to be applied.

In my case, I need to use mvexpand where the base search itself produces many tens or hundreds of millions of events. The "expansion factor", if you will, is a small, constant number (less than 100, likely less than 10, and it can be constrained).

Here is an example where the final expansion merely doubles the event count (in a completely local way); I believe this should work...

| makeresults count=10000 | streamstats count | eval count=1000*round((count-1)/1000-0.5,0)
| eval mcount=mvrange(0,99,1) | mvexpand mcount | fields count mcount | fields - _raw
| eval ucount=mvrange(0,49,25) | mvexpand ucount | fields count mcount ucount | fields - _raw
| eventstats count as total by count | eventstats count as mtotal by mcount | eventstats count as utotal by ucount
| stats count, values(eval(count." (".total.")")) as cvalues,
               values(eval(mcount." (".mtotal.")")) as mvalues,
               values(eval(ucount." (".utotal.")")) as uvalues

Instead of 2,000,000 events, I only get 984,200 in my environment.

I am imagining building my own custom command, but I suspect that others have hit this limit. It certainly seems that mvexpand /could/ be smarter than this. Any advice?

(For the record, I have already tried the fields - _raw trick shared in other mvexpand answers.)

1 Solution

martin_mueller
SplunkTrust

Yes, mvexpand is very inefficient. You can trigger the default 500MB memory limit with | makeresults | eval foo = mvrange(0,10000) | mvexpand foo on some Splunk instances, for example - 10,000 simple values shouldn't need 5MB, let alone 500.

Since there's no actual question in your question, I'll provide advice instead of an answer: file an ER with support.
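For reference, the 500MB memory cap mentioned above appears to come from the general max_mem_usage_mb setting in limits.conf, which mvexpand honors; once it is exceeded, mvexpand truncates its output and warns rather than failing. A minimal sketch of raising it, assuming that setting is indeed the one in play here (and accepting the cost of more memory per search on the search head and indexers):

# limits.conf (sketch; verify the stanza against your Splunk version's spec file)
[default]
# mvexpand truncates its output once it has used this much memory;
# 500 is the default mentioned above
max_mem_usage_mb = 1000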



martin_mueller
SplunkTrust

Well, in many cases you can write searches in a way that doesn't need mvexpand. Whether that's possible in your case depends on your case.


kulick
Path Finder

And in fact, Martin taught me a great trick to avoid needing mvexpand. It covers cases where you would ultimately just use the field in question in the by clause of a subsequent stats command. In that case, you can simply leave the multi-valued field multi-valued and things will "just work". Cool trick! Thanks for showing me that one, Martin!
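A minimal sketch of that trick, with made-up field names, in case it helps the next reader. The by clause of stats treats each value of a multi-valued field as its own group, so the mvexpand step can often simply be dropped:

| makeresults count=3
| eval tag=split("a,b,c", ",")
| mvexpand tag
| stats count by tag

should, if I understand the behavior correctly, return the same three groups (a, b, c, each with count=3) as the mvexpand-free version:

| makeresults count=3
| eval tag=split("a,b,c", ",")
| stats count by tag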


kulick
Path Finder

This behavior is unfortunate, but if it is the current state of the art, then an ER seems like the best path forward. Thanks.
