Splunk Search

Searching across multivalues

AdrianH
Explorer

Hi.

I've been trying to figure this out for a while now but no luck.  Maybe someone has done and/or seen something similar?

 

datetime   cust   prod
1:00 PM    A      100
1:00 PM    A      200
1:00 PM    A      300
1:00 PM    B      100
1:00 PM    C      100
2:00 PM    A      100
2:00 PM    A      200
2:00 PM    A      300
3:00 PM    D      200

 

The above is an example of my Splunk data source.

First, I had to count the distinct users. I did that. But I also have to create single-value charts for the following combination counts:

Distinct Count of CUST where PROD = 100

Distinct Count of CUST where PROD = 200

Distinct Count of CUST where PROD = 300

Again, these weren't hard. I just added a where clause at the top of the Splunk query.

But here's where I'm stuck.

I also need to create some dashboard single count charts that would have this type of combinations of the PROD categories:

Distinct Count of CUST where PROD = 100 & PROD = 200

Distinct Count of CUST where PROD = 100 & PROD = 300

Distinct Count of CUST where PROD = 200 & PROD = 300

I've tried using mvcombine and then creating a filter to find the above combinations, but after several hours of googling, no luck.

Thanks in advance!!

 


tscroggins
Influencer

Hi @AdrianH,

If you'd like to generate distinct counts for all combinations (2^n-1 of them) of an arbitrary number (n) of prod values, you can generate those combinations from a base search and, for example, map the combinations to a subsearch to generate distinct counts. The combinations could also be used to populate a dashboard input field.

I've introduced a bitwise AND macro to help with identifying combinations:

[bitand_32(2)]
args = x, y
definition = sum(1 * (floor($x$ / 1) % 2) * (floor($y$ / 1) % 2), 2 * (floor($x$ / 2) % 2) * (floor($y$ / 2) % 2), 4 * (floor($x$ / 4) % 2) * (floor($y$ / 4) % 2), 8 * (floor($x$ / 8) % 2) * (floor($y$ / 8) % 2), 16 * (floor($x$ / 16) % 2) * (floor($y$ / 16) % 2), 32 * (floor($x$ / 32) % 2) * (floor($y$ / 32) % 2), 64 * (floor($x$ / 64) % 2) * (floor($y$ / 64) % 2), 128 * (floor($x$ / 128) % 2) * (floor($y$ / 128) % 2), 256 * (floor($x$ / 256) % 2) * (floor($y$ / 256) % 2), 512 * (floor($x$ / 512) % 2) * (floor($y$ / 512) % 2), 1024 * (floor($x$ / 1024) % 2) * (floor($y$ / 1024) % 2), 2048 * (floor($x$ / 2048) % 2) * (floor($y$ / 2048) % 2), 4096 * (floor($x$ / 4096) % 2) * (floor($y$ / 4096) % 2), 8192 * (floor($x$ / 8192) % 2) * (floor($y$ / 8192) % 2), 16384 * (floor($x$ / 16384) % 2) * (floor($y$ / 16384) % 2), 32768 * (floor($x$ / 32768) % 2) * (floor($y$ / 32768) % 2), 65536 * (floor($x$ / 65536) % 2) * (floor($y$ / 65536) % 2), 131072 * (floor($x$ / 131072) % 2) * (floor($y$ / 131072) % 2), 262144 * (floor($x$ / 262144) % 2) * (floor($y$ / 262144) % 2), 524288 * (floor($x$ / 524288) % 2) * (floor($y$ / 524288) % 2), 1048576 * (floor($x$ / 1048576) % 2) * (floor($y$ / 1048576) % 2), 2097152 * (floor($x$ / 2097152) % 2) * (floor($y$ / 2097152) % 2), 4194304 * (floor($x$ / 4194304) % 2) * (floor($y$ / 4194304) % 2), 8388608 * (floor($x$ / 8388608) % 2) * (floor($y$ / 8388608) % 2), 16777216 * (floor($x$ / 16777216) % 2) * (floor($y$ / 16777216) % 2), 33554432 * (floor($x$ / 33554432) % 2) * (floor($y$ / 33554432) % 2), 67108864 * (floor($x$ / 67108864) % 2) * (floor($y$ / 67108864) % 2), 134217728 * (floor($x$ / 134217728) % 2) * (floor($y$ / 134217728) % 2), 268435456 * (floor($x$ / 268435456) % 2) * (floor($y$ / 268435456) % 2), 536870912 * (floor($x$ / 536870912) % 2) * (floor($y$ / 536870912) % 2), 1073741824 * (floor($x$ / 1073741824) % 2) * (floor($y$ / 1073741824) % 2), 2147483648 * (floor($x$ / 2147483648) % 2) * (floor($y$ / 2147483648) % 2))
iseval = 0
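For reference, the macro's floor/mod arithmetic is just a bitwise AND computed one bit at a time. A quick Python check of the same formula (illustrative only, not Splunk code):

```python
# For each power of two n, (x // n) % 2 extracts that bit of x. Multiplying
# the corresponding bits of x and y and scaling by n rebuilds the common bits,
# which is exactly bitwise AND.
def bitand_32(x, y):
    return sum(
        n * ((x // n) % 2) * ((y // n) % 2)
        for n in (2 ** i for i in range(32))
    )

# Spot-check against Python's native & operator.
for x, y in [(5, 3), (7, 4), (2 ** 31, 2 ** 31), (123456789, 987654321)]:
    assert bitand_32(x, y) == x & y
```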

With the macro in hand, we can generate a table of possible combinations and then use the table values as indices into an array of unique prod values to generate combinations:

| makeresults format=csv data="datetime,cust,prod
1:00 PM,A,100
1:00 PM,A,200
1:00 PM,A,300
1:00 PM,B,100
1:00 PM,C,100
2:00 PM,A,100
2:00 PM,A,200
2:00 PM,A,300
3:00 PM,D,200"
| stats values(prod) as prod
| eval x=mvrange(1, pow(2, mvcount(prod)))
| eval i=mvrange(0, ceiling(log(mvcount(x), 2)))
| mvexpand i
| eval i_{i}=pow(2, i)
| fields - i
| stats values(*) as *
| mvexpand x
| foreach i_* [ eval y=mvappend(y, case(`bitand_32(x, <<FIELD>>)`==<<FIELD>>, mvindex(prod, log(<<FIELD>>, 2)))) ]
| eval prod="(".mvjoin(mvmap(y, "prod=\"".y."\""), " OR ").")"
| fields prod
prod
----
(prod="100")
(prod="200")
(prod="100" OR prod="200")
(prod="300")
(prod="100" OR prod="300")
(prod="200" OR prod="300")
(prod="100" OR prod="200" OR prod="300")
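The subset enumeration the SPL does with bitmasks can be sketched in Python with itertools (illustrative only; note the combinations come out grouped by size here rather than in bitmask order):

```python
from itertools import combinations

# For each non-empty subset of the distinct prod values, build an OR filter
# string matching the output table above.
prods = ["100", "200", "300"]
filters = []
for r in range(1, len(prods) + 1):
    for combo in combinations(prods, r):
        filters.append("(" + " OR ".join(f'prod="{p}"' for p in combo) + ")")

print(len(filters))  # 2^3 - 1 = 7
```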

We can use the map command to pass the prod field to an arbitrary number of subsearches to count distinct values:

| makeresults format=csv data="datetime,cust,prod
1:00 PM,A,100
1:00 PM,A,200
1:00 PM,A,300
1:00 PM,B,100
1:00 PM,C,100
2:00 PM,A,100
2:00 PM,A,200
2:00 PM,A,300
3:00 PM,D,200"
| stats values(prod) as prod
| eval x=mvrange(1, pow(2, mvcount(prod)))
| eval i=mvrange(0, ceiling(log(mvcount(x), 2)))
| mvexpand i
| eval i_{i}=pow(2, i)
| fields - i
| stats values(*) as *
| mvexpand x
| foreach i_* [ eval y=mvappend(y, case(`bitand_32(x, <<FIELD>>)`==<<FIELD>>, mvindex(prod, log(<<FIELD>>, 2)))) ]
| eval prod="(".mvjoin(mvmap(y, "prod=\"".y."\""), " OR ").")"
| fields prod
| map search="| makeresults format=csv data=\"datetime,cust,prod
1:00 PM,A,100
1:00 PM,A,200
1:00 PM,A,300
1:00 PM,B,100
1:00 PM,C,100
2:00 PM,A,100
2:00 PM,A,200
2:00 PM,A,300
3:00 PM,D,200\"
| eval filter=$prod$, match=case(searchmatch(\"$prod$\"), 1)
| stats dc(eval(case(match==1, cust))) as cust_distinct_count by filter" maxsearches=10000
filter                                   cust_distinct_count
---------------------------------------- -------------------
(prod="100")                                               3
(prod="200")                                               2
(prod="100" OR prod="200")                                 4
(prod="300")                                               1
(prod="100" OR prod="300")                                 3
(prod="200" OR prod="300")                                 2
(prod="100" OR prod="200" OR prod="300")                   4

Note that the map command generates one search per filter value, so scalability is a concern. The maxsearches argument should be a number greater than or equal to 2^n-1. I've used 10000, which would accommodate n=13 products (2^13-1 = 8191). I'm assuming your actual number of products is much higher.

The search that generates combinations can be used on its own, however, and you can dispatch subsequent searches in whatever way makes sense for your dashboard.
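As a sanity check on the distinct counts in the table above, the same numbers can be reproduced outside Splunk with plain Python sets (illustrative only; the rows are the sample data from the post):

```python
# Sample events as (datetime, cust, prod) tuples.
rows = [
    ("1:00 PM", "A", "100"), ("1:00 PM", "A", "200"), ("1:00 PM", "A", "300"),
    ("1:00 PM", "B", "100"), ("1:00 PM", "C", "100"),
    ("2:00 PM", "A", "100"), ("2:00 PM", "A", "200"), ("2:00 PM", "A", "300"),
    ("3:00 PM", "D", "200"),
]

def dc_cust(*prods):
    # Distinct custs with at least one event whose prod is in the filter set,
    # matching the OR semantics of the generated search filters.
    return len({cust for _, cust, prod in rows if prod in prods})

assert dc_cust("100") == 3                  # A, B, C
assert dc_cust("100", "200") == 4           # A, B, C, D
assert dc_cust("200", "300") == 2           # A, D
```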

tscroggins
Influencer

If, alternatively, you're looking for a simpler, more direct solution, you can combine stats dc() with eval in any search:

| makeresults format=csv data="datetime,cust,prod
1:00 PM,A,100
1:00 PM,A,200
1:00 PM,A,300
1:00 PM,B,100
1:00 PM,C,100
2:00 PM,A,100
2:00 PM,A,200
2:00 PM,A,300
3:00 PM,D,200"
| stats dc(eval(case(prod=="100" OR prod=="200", cust))) as distinct_count_of_cust_where_prod_in_100_200
distinct_count_of_cust_where_prod_in_100_200
--------------------------------------------
                                           4

dtburrows3
Builder

So first I think it makes sense to do a stats aggregation for all the values of the prod field for each cust value.

<base_search>
    | stats
        values(prod) as all_prod
            by cust


This will leave us with a multivalue field looking something like this.

[screenshot: all_prod multivalue field by cust]

From here you can run evaluations against the multivalue field to check for specific conditions. For example, the combinations of PROD values you mentioned in the original post can be checked with an eval like this.

<base_search>
    | stats
        values(prod) as all_prod
            by cust
    ``` subset inclusion ```
    | eval
        scenario_1=mvappend(
            case('all_prod'=="100" AND 'all_prod'=="200", "PROD=100 & PROD=200"),
            case('all_prod'=="100" AND 'all_prod'=="300", "PROD=100 & PROD=300"),
            case('all_prod'=="200" AND 'all_prod'=="300", "PROD=200 & PROD=300")
            )  
    ``` direct match ```
    | eval
        scenario_2=mvappend(
            case('all_prod'=="100" AND 'all_prod'=="200" AND mvcount(all_prod)==2, "PROD=100 & PROD=200"),
            case('all_prod'=="100" AND 'all_prod'=="300" AND mvcount(all_prod)==2, "PROD=100 & PROD=300"),
            case('all_prod'=="200" AND 'all_prod'=="300" AND mvcount(all_prod)==2, "PROD=200 & PROD=300")
            )  

Notice the two different scenarios here. I wasn't sure whether, when you said a cust has 100 and 200, you meant 100 and 200 only (no other values), or whether 100 and 200 may be a subset of all that cust's values. So I included both scenarios to show the output.
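The "subset inclusion" vs. "direct match" distinction can be sketched with Python sets (illustrative only, using the sample data; `pair` stands in for one PROD combination):

```python
# Per-cust prod sets, mirroring | stats values(prod) as all_prod by cust.
events = [("A", "100"), ("A", "200"), ("A", "300"),
          ("B", "100"), ("C", "100"),
          ("A", "100"), ("A", "200"), ("A", "300"),
          ("D", "200")]

all_prod = {}
for cust, prod in events:
    all_prod.setdefault(cust, set()).add(prod)

pair = {"100", "200"}
# Scenario 1: cust has 100 and 200, possibly among other values.
subset_inclusion = {c for c, p in all_prod.items() if pair <= p}
# Scenario 2: cust has exactly 100 and 200 and nothing else.
direct_match = {c for c, p in all_prod.items() if pair == p}

print(subset_inclusion, direct_match)
```

Here cust A (with 100, 200, and 300) matches scenario 1 but not scenario 2, which is exactly the difference the mvcount(all_prod)==2 condition enforces.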

[screenshot: scenario_1 and scenario_2 values by cust]

Now, to get a distinct count of 'custs' that fall into each category you would just do a simple stats to tally them up by your specific scenario. Something like this.

<base_search>
    | stats
        values(prod) as all_prod
            by cust
    ``` subset inclusion ```
    | eval
        scenario_1=mvappend(
            case('all_prod'=="100" AND 'all_prod'=="200", "PROD=100 & PROD=200"),
            case('all_prod'=="100" AND 'all_prod'=="300", "PROD=100 & PROD=300"),
            case('all_prod'=="200" AND 'all_prod'=="300", "PROD=200 & PROD=300")
            )  
    ``` direct match ```
    | eval
        scenario_2=mvappend(
            case('all_prod'=="100" AND 'all_prod'=="200" AND mvcount(all_prod)==2, "PROD=100 & PROD=200"),
            case('all_prod'=="100" AND 'all_prod'=="300" AND mvcount(all_prod)==2, "PROD=100 & PROD=300"),
            case('all_prod'=="200" AND 'all_prod'=="300" AND mvcount(all_prod)==2, "PROD=200 & PROD=300")
            )  
    | stats
        values(cust) as custs
        dc(cust) as dc_cust
            by scenario_1

The output should have your distinct counts of custs for each of the PROD combinations defined.

[screenshot: dc_cust by scenario_1]

 
