Splunk Search

Join is breaking multivalues...?

Motivator

It looks like a join will break multivalues. And I thought mvexpand couldn't get any more dangerous or misleading than it already is.

This is the query:

index=wineventlog host=MYVEEAMHOST source="WinEventLog:Veeam Backup" EventCode=190 | 

rex field=Message "(?i)^Backup job \'(?<snapshot>.+?)\'" | rex field=Message "(?i)^Backup job \'.*\' finished with (?<result>[^\.]+)" | rex field=snapshot "(?i)^(?<server>\d\S+)" | rex field=snapshot "(?i)^(?<datastore>ds-\S+)" |

join type=left datastore [search index=vmware-inv vm_name!="" datastores{}.name!="" earliest=-1d@d latest=@d | sort 0 - _time | dedup vm_name | mvexpand datastores{}.name | rename "datastores{}.name" AS datastore | stats values(vm_name) AS vms BY datastore] | 

fillnull value="" | mvexpand vms | table server datastore vms _time snapshot Message

The subsearch inside the join, copied and pasted exactly into a separate search, works correctly and returns vms as a mv field. The main search instead after the join returns vms as a single string with whitespace between the values, e.g. 1ABC-ABC-DC1 1DEF-DEF-DC1 1DEF-DEF-DC2 Why...? I am not looking for a workaround as I can obviously solve with split, I am looking for an answer.

RUN ANYWHERE CODE THAT HAS SAME ISSUE:

| makeresults | eval a="1" | join type=left a [| makeresults | eval a="1" | eval b="2" | append [| makeresults | eval a="1" | eval b="3"] | mvcombine b] | mvexpand b
0 Karma

SplunkTrust
SplunkTrust

So, first, if you want the left side of the join to join to multiple records on the right side of the join, then you have to use the max= parameter. Otherwise, the record on the left will join only to the first record that comes out of the right side.

Next, you are correct, a join will treat the multivalue records as if they are a single string.

| makeresults | eval a="1" 
| join type=left a 
    [| makeresults | eval a="1" | eval b=mvappend("2","3") | eval binnercount=mvcount(b)] 
| eval boutercount=mvcount(b) 

In general, join is one of the last join-type verbs that you go for. There are other verbs that do the same effect more efficiently. Here's an example of using stats instead.

| makeresults | eval a="1", rectype="first" 
| join type=left a 
    [| makeresults | eval a="1", rectype="second" | eval b=mvappend("2","3") | eval binnercount=mvcount(b) ] 
| eval boutercount=mvcount(b) 
| stats values(*) as * by a

Your resulting code would probably look something like this.

 index=wineventlog host=MYVEEAMHOST source="WinEventLog:Veeam Backup" EventCode=190 
| rex field=Message "(?i)^Backup job \'(?<snapshot>.+?)\'" 
| rex field=Message "(?i)^Backup job \'.*\' finished with (?<result>[^\.]+)" 
| rex field=snapshot "(?i)^(?<server>\d\S+)" 
| rex field=snapshot "(?i)^(?<datastore>ds-\S+)" 
| fields _time snapshot result server datastore Message

| append 
    [search index=vmware-inv vm_name!="" datastores{}.name!="" earliest=-1d@d latest=@d 
    | sort 0 - _time | dedup vm_name 
    | mvexpand datastores{}.name 
    | rename "datastores{}.name" AS datastore 
    | stats values(vm_name) AS vms BY datastore
    | rename comment as "mark these records so we can kill them later" 
    | eval killme="killme"] 

| rename comment as "roll the vms records over to the log messages then kill the vms records" 
| eventstats values(*) as * by datastore
| where isnull(killme)

| rename comment as "expand teh vms and present the results." 
| mvexpand vms 
| table server datastore vms _time snapshot Message