Splunk Search

Searching across multivalues

AdrianH
Explorer

Hi.

I've been trying to figure this out for a while now but no luck.  Maybe someone has done and/or seen something similar?

 

datetime   cust   prod
1:00 PM    A      100
1:00 PM    A      200
1:00 PM    A      300
1:00 PM    B      100
1:00 PM    C      100
2:00 PM    A      100
2:00 PM    A      200
2:00 PM    A      300
3:00 PM    D      200

 

The above is an example of my Splunk data source.

First, I had to count the distinct users. I did that. But I also have to create single-value charts for the following combination counts:

Distinct Count of CUST where PROD = 100

Distinct Count of CUST where PROD = 200

Distinct Count of CUST where PROD = 300

Again, these weren't hard. I just added a where clause at the top of the Splunk query.

But here's where I'm stuck.

I also need to create some dashboard single count charts that would have this type of combinations of the PROD categories:

Distinct Count of CUST where PROD = 100 & PROD = 200

Distinct Count of CUST where PROD = 100 & PROD = 300

Distinct Count of CUST where PROD = 200 & PROD = 300

I've tried using mvcombine and then creating a filter to find the above combinations, but after several hours of googling, no luck.

Thanks in advance!!

 


tscroggins
Influencer

Hi @AdrianH,

If you'd like to generate distinct counts for all combinations (2^n-1 of them) of an arbitrary number (n) of prod values, you can generate those combinations from a base search and, for example, map the combinations to a subsearch to generate distinct counts. The combinations could also be used to populate a dashboard input field.

I've introduced a bitwise AND macro to help with identifying combinations:

[bitand_32(2)]
args = x, y
definition = sum(1 * (floor($x$ / 1) % 2) * (floor($y$ / 1) % 2), 2 * (floor($x$ / 2) % 2) * (floor($y$ / 2) % 2), 4 * (floor($x$ / 4) % 2) * (floor($y$ / 4) % 2), 8 * (floor($x$ / 8) % 2) * (floor($y$ / 8) % 2), 16 * (floor($x$ / 16) % 2) * (floor($y$ / 16) % 2), 32 * (floor($x$ / 32) % 2) * (floor($y$ / 32) % 2), 64 * (floor($x$ / 64) % 2) * (floor($y$ / 64) % 2), 128 * (floor($x$ / 128) % 2) * (floor($y$ / 128) % 2), 256 * (floor($x$ / 256) % 2) * (floor($y$ / 256) % 2), 512 * (floor($x$ / 512) % 2) * (floor($y$ / 512) % 2), 1024 * (floor($x$ / 1024) % 2) * (floor($y$ / 1024) % 2), 2048 * (floor($x$ / 2048) % 2) * (floor($y$ / 2048) % 2), 4096 * (floor($x$ / 4096) % 2) * (floor($y$ / 4096) % 2), 8192 * (floor($x$ / 8192) % 2) * (floor($y$ / 8192) % 2), 16384 * (floor($x$ / 16384) % 2) * (floor($y$ / 16384) % 2), 32768 * (floor($x$ / 32768) % 2) * (floor($y$ / 32768) % 2), 65536 * (floor($x$ / 65536) % 2) * (floor($y$ / 65536) % 2), 131072 * (floor($x$ / 131072) % 2) * (floor($y$ / 131072) % 2), 262144 * (floor($x$ / 262144) % 2) * (floor($y$ / 262144) % 2), 524288 * (floor($x$ / 524288) % 2) * (floor($y$ / 524288) % 2), 1048576 * (floor($x$ / 1048576) % 2) * (floor($y$ / 1048576) % 2), 2097152 * (floor($x$ / 2097152) % 2) * (floor($y$ / 2097152) % 2), 4194304 * (floor($x$ / 4194304) % 2) * (floor($y$ / 4194304) % 2), 8388608 * (floor($x$ / 8388608) % 2) * (floor($y$ / 8388608) % 2), 16777216 * (floor($x$ / 16777216) % 2) * (floor($y$ / 16777216) % 2), 33554432 * (floor($x$ / 33554432) % 2) * (floor($y$ / 33554432) % 2), 67108864 * (floor($x$ / 67108864) % 2) * (floor($y$ / 67108864) % 2), 134217728 * (floor($x$ / 134217728) % 2) * (floor($y$ / 134217728) % 2), 268435456 * (floor($x$ / 268435456) % 2) * (floor($y$ / 268435456) % 2), 536870912 * (floor($x$ / 536870912) % 2) * (floor($y$ / 536870912) % 2), 1073741824 * (floor($x$ / 1073741824) % 2) * (floor($y$ / 1073741824) % 2), 2147483648 * (floor($x$ / 2147483648) % 2) * (floor($y$ / 2147483648) % 2))
iseval = 0
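For reference, the macro's floor/mod arithmetic is just a bitwise AND computed one bit at a time. A quick Python check of the same formula (illustrative only, not Splunk code):

```python
# For each power of two n, (x // n) % 2 extracts that bit of x. Multiplying
# the corresponding bits of x and y and scaling by n rebuilds the common bits,
# which is exactly bitwise AND.
def bitand_32(x, y):
    return sum(
        n * ((x // n) % 2) * ((y // n) % 2)
        for n in (2 ** i for i in range(32))
    )

# Spot-check against Python's native & operator.
for x, y in [(5, 3), (7, 4), (2 ** 31, 2 ** 31), (123456789, 987654321)]:
    assert bitand_32(x, y) == x & y
```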

With the macro in hand, we can generate a table of possible combinations and then use the table values as indices into an array of unique prod values to generate combinations:

| makeresults format=csv data="datetime,cust,prod
1:00 PM,A,100
1:00 PM,A,200
1:00 PM,A,300
1:00 PM,B,100
1:00 PM,C,100
2:00 PM,A,100
2:00 PM,A,200
2:00 PM,A,300
3:00 PM,D,200"
| stats values(prod) as prod
| eval x=mvrange(1, pow(2, mvcount(prod)))
| eval i=mvrange(0, ceiling(log(mvcount(x), 2)))
| mvexpand i
| eval i_{i}=pow(2, i)
| fields - i
| stats values(*) as *
| mvexpand x
| foreach i_* [ eval y=mvappend(y, case(`bitand_32(x, <<FIELD>>)`==<<FIELD>>, mvindex(prod, log(<<FIELD>>, 2)))) ]
| eval prod="(".mvjoin(mvmap(y, "prod=\"".y."\""), " OR ").")"
| fields prod
prod
----
(prod="100")
(prod="200")
(prod="100" OR prod="200")
(prod="300")
(prod="100" OR prod="300")
(prod="200" OR prod="300")
(prod="100" OR prod="200" OR prod="300")
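The subset enumeration the SPL does with bitmasks can be sketched in Python with itertools (illustrative only; note the combinations come out grouped by size here rather than in bitmask order):

```python
from itertools import combinations

# For each non-empty subset of the distinct prod values, build an OR filter
# string matching the output table above.
prods = ["100", "200", "300"]
filters = []
for r in range(1, len(prods) + 1):
    for combo in combinations(prods, r):
        filters.append("(" + " OR ".join(f'prod="{p}"' for p in combo) + ")")

print(len(filters))  # 2^3 - 1 = 7
```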

We can use the map command to pass the prod field to an arbitrary number of subsearches to count distinct values:

| makeresults format=csv data="datetime,cust,prod
1:00 PM,A,100
1:00 PM,A,200
1:00 PM,A,300
1:00 PM,B,100
1:00 PM,C,100
2:00 PM,A,100
2:00 PM,A,200
2:00 PM,A,300
3:00 PM,D,200"
| stats values(prod) as prod
| eval x=mvrange(1, pow(2, mvcount(prod)))
| eval i=mvrange(0, ceiling(log(mvcount(x), 2)))
| mvexpand i
| eval i_{i}=pow(2, i)
| fields - i
| stats values(*) as *
| mvexpand x
| foreach i_* [ eval y=mvappend(y, case(`bitand_32(x, <<FIELD>>)`==<<FIELD>>, mvindex(prod, log(<<FIELD>>, 2)))) ]
| eval prod="(".mvjoin(mvmap(y, "prod=\"".y."\""), " OR ").")"
| fields prod
| map search="| makeresults format=csv data=\"datetime,cust,prod
1:00 PM,A,100
1:00 PM,A,200
1:00 PM,A,300
1:00 PM,B,100
1:00 PM,C,100
2:00 PM,A,100
2:00 PM,A,200
2:00 PM,A,300
3:00 PM,D,200\"
| eval filter=$prod$, match=case(searchmatch(\"$prod$\"), 1)
| stats dc(eval(case(match==1, cust))) as cust_distinct_count by filter" maxsearches=10000
filter                                   cust_distinct_count
---------------------------------------- -------------------
(prod="100")                                               3
(prod="200")                                               2
(prod="100" OR prod="200")                                 4
(prod="300")                                               1
(prod="100" OR prod="300")                                 3
(prod="200" OR prod="300")                                 2
(prod="100" OR prod="200" OR prod="300")                   4

Note that the map command generates one search per filter value, so scalability is a concern. The maxsearches argument should be a number greater than or equal to 2^n-1. I've used 10000, which would accommodate n=13 products (2^13-1 = 8191). I'm assuming your actual number of products is much higher.

The search that generates combinations can be used on its own, however, and you can dispatch subsequent searches in whatever way makes sense for your dashboard.
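As a sanity check on the distinct counts in the table above, the same numbers can be reproduced outside Splunk with plain Python sets (illustrative only; the rows are the sample data from the post):

```python
# Sample events as (datetime, cust, prod) tuples.
rows = [
    ("1:00 PM", "A", "100"), ("1:00 PM", "A", "200"), ("1:00 PM", "A", "300"),
    ("1:00 PM", "B", "100"), ("1:00 PM", "C", "100"),
    ("2:00 PM", "A", "100"), ("2:00 PM", "A", "200"), ("2:00 PM", "A", "300"),
    ("3:00 PM", "D", "200"),
]

def dc_cust(*prods):
    # Distinct custs with at least one event whose prod is in the filter set,
    # matching the OR semantics of the generated search filters.
    return len({cust for _, cust, prod in rows if prod in prods})

assert dc_cust("100") == 3                  # A, B, C
assert dc_cust("100", "200") == 4           # A, B, C, D
assert dc_cust("200", "300") == 2           # A, D
```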

tscroggins
Influencer

If, alternatively, you're looking for a simpler, more direct solution, you can combine stats dc() with eval in any search:

| makeresults format=csv data="datetime,cust,prod
1:00 PM,A,100
1:00 PM,A,200
1:00 PM,A,300
1:00 PM,B,100
1:00 PM,C,100
2:00 PM,A,100
2:00 PM,A,200
2:00 PM,A,300
3:00 PM,D,200"
| stats dc(eval(case(prod=="100" OR prod=="200", cust))) as distinct_count_of_cust_where_prod_in_100_200
distinct_count_of_cust_where_prod_in_100_200
--------------------------------------------
                                           4

dtburrows3
Builder

So first I think it makes sense to do a stats aggregation for all the values of the prod field for each cust value.

<base_search>
    | stats
        values(prod) as all_prod
            by cust


This will leave us with a multivalue field looking something like this.

[screenshot: all_prod multivalue field by cust]

From here you can run evaluations against the multivalue field to check for specific conditions. For example, the combinations of PROD values you mentioned in the original post can be checked with an eval like this.

<base_search>
    | stats
        values(prod) as all_prod
            by cust
    ``` subset inclusion ```
    | eval
        scenario_1=mvappend(
            case('all_prod'=="100" AND 'all_prod'=="200", "PROD=100 & PROD=200"),
            case('all_prod'=="100" AND 'all_prod'=="300", "PROD=100 & PROD=300"),
            case('all_prod'=="200" AND 'all_prod'=="300", "PROD=200 & PROD=300")
            )  
    ``` direct match ```
    | eval
        scenario_2=mvappend(
            case('all_prod'=="100" AND 'all_prod'=="200" AND mvcount(all_prod)==2, "PROD=100 & PROD=200"),
            case('all_prod'=="100" AND 'all_prod'=="300" AND mvcount(all_prod)==2, "PROD=100 & PROD=300"),
            case('all_prod'=="200" AND 'all_prod'=="300" AND mvcount(all_prod)==2, "PROD=200 & PROD=300")
            )  

Notice the two different scenarios here. I wasn't sure whether, when you said a cust has 100 and 200, you meant 100 and 200 only (no other values), or whether 100 and 200 may be a subset of all that cust's values. So I included both scenarios to show the output.
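The "subset inclusion" vs. "direct match" distinction can be sketched with Python sets (illustrative only, using the sample data; `pair` stands in for one PROD combination):

```python
# Per-cust prod sets, mirroring | stats values(prod) as all_prod by cust.
events = [("A", "100"), ("A", "200"), ("A", "300"),
          ("B", "100"), ("C", "100"),
          ("A", "100"), ("A", "200"), ("A", "300"),
          ("D", "200")]

all_prod = {}
for cust, prod in events:
    all_prod.setdefault(cust, set()).add(prod)

pair = {"100", "200"}
# Scenario 1: cust has 100 and 200, possibly among other values.
subset_inclusion = {c for c, p in all_prod.items() if pair <= p}
# Scenario 2: cust has exactly 100 and 200 and nothing else.
direct_match = {c for c, p in all_prod.items() if pair == p}

print(subset_inclusion, direct_match)
```

Here cust A (with 100, 200, and 300) matches scenario 1 but not scenario 2, which is exactly the difference the mvcount(all_prod)==2 condition enforces.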

[screenshot: scenario_1 and scenario_2 values by cust]

Now, to get a distinct count of 'custs' that fall into each category you would just do a simple stats to tally them up by your specific scenario. Something like this.

<base_search>
    | stats
        values(prod) as all_prod
            by cust
    ``` subset inclusion ```
    | eval
        scenario_1=mvappend(
            case('all_prod'=="100" AND 'all_prod'=="200", "PROD=100 & PROD=200"),
            case('all_prod'=="100" AND 'all_prod'=="300", "PROD=100 & PROD=300"),
            case('all_prod'=="200" AND 'all_prod'=="300", "PROD=200 & PROD=300")
            )  
    ``` direct match ```
    | eval
        scenario_2=mvappend(
            case('all_prod'=="100" AND 'all_prod'=="200" AND mvcount(all_prod)==2, "PROD=100 & PROD=200"),
            case('all_prod'=="100" AND 'all_prod'=="300" AND mvcount(all_prod)==2, "PROD=100 & PROD=300"),
            case('all_prod'=="200" AND 'all_prod'=="300" AND mvcount(all_prod)==2, "PROD=200 & PROD=300")
            )  
    | stats
        values(cust) as custs
        dc(cust) as dc_cust
            by scenario_1

The output should have your distinct counts of custs for each of the PROD combinations defined.

[screenshot: dc_cust by scenario_1]

 
