I'm looking for a way to numerically sort a multivalue field without expanding the field, sorting and then recombining. Essentially something like the following with a dedup. Before I post an idea I wanted to see if anyone has any ideas on how to accomplish this.
Essentially I'm trying to find a way to run conversion functions from here against a multivalue field, ideally on the same eval.
| makeresults
| eval line_list=split("4,3,2,1,7,2,21,9,12,19,7",",")
| eval line_list=mvsort(mvdedup(tonumber(line_list)))
Instead I have to do this:
| makeresults
| eval line_list=split("4,3,2,1,7,2,21,9,12,19,7",",")
| mvexpand line_list
| eval line_list=tonumber(line_list)
| sort line_list
| dedup line_list
| mvcombine line_list
I think you already have a good solution.
Just optimized the same:
| makeresults
| eval line_list=split("4,3,2,1,7,2,21,9,12,19,7",",")
| mvexpand line_list
| dedup line_list sortby line_list
| mvcombine line_list
| table line_list
You have to understand the behavior of a multivalue field; the MV field itself is essentially always treated as an array of strings: ["val1", "val2", "val3"] so even in the example you noted where you mvexpand it out, running mvsort on the final product after you condense it again will sort it back to lexographic ordering.
Given that, you have to build a solution with lexographic sorting in mind.
First, I want to note that the more densely packed you get with nested evals, the more likely it is to break over time as the underlying search core engine may change. I've seen this happen, so better to split it apart into multiple evals than aim to run it all at once, since it allows Splunk to handle memory transition better that way anyways.
That said, here's your solution:
| makeresults | fields - _time
| eval number_list = split("4,3,2,1,7,2,21,9,12,19,7",",")
| eval number_list_ordered = mvmap( mvsort(mvmap(mvdedup(number_list), len(number_list) . "-" . number_list)), substr(number_list, 3) )
This will work for up to 9 character length values in a MV, and uses TWO cascading mvmap() functions to attach a number representing the length of the given value to the value itself, which makes the lexographic sorting work as we want it to.To illustrate the steps:
| makeresults | fields - _time
| eval number_list = split("4,3,2,1,7,2,21,9,12,19,7",",")
| eval number_list_2 = mvdedup(number_list)
| eval number_list_3 = mvmap(number_list_2, len(number_list_2) . "-" . number_list_2)
| eval number_list_4 = mvsort(number_list_3)
| eval number_list_final = mvmap(number_list_4, substr(number_list_4, 3))
Make sure to upvote.
You could left-space-pad before sorting, then join and split to remove the space (if required) - you still end up with strings rather than numbers but it might be sufficient for you - note that I have assumed a maximum of 3 digits so you may need to adjust for your case.
| makeresults
| eval line_list=split("4,3,2,1,7,2,21,9,123,12,19,7",",")
| eval line_list=split(mvjoin(mvsort(mvmap(line_list,printf("%*s%d",3-(floor(log(line_list,10)))," ",tonumber(line_list))))," ")," ")
Inspired by family @woodcock - check out their YouTube post https://www.youtube.com/watch?v=d0XtQcPa5zw