Hi.
I've been trying to figure this out for a while now but no luck. Maybe someone has done and/or seen something similar?
datetime | cust | prod |
1:00 PM | A | 100 |
1:00 PM | A | 200 |
1:00 PM | A | 300 |
1:00 PM | B | 100 |
1:00 PM | C | 100 |
2:00 PM | A | 100 |
2:00 PM | A | 200 |
2:00 PM | A | 300 |
3:00 PM | D | 200 |
The above is an example of my splunk data source.
First item i had to count where the distinct users. I did that. But I also have to create single value charts for the following combination counts:
Distinct Count of CUST where PROD = 100
Distinct Count of CUST where PROD = 200
Distinct Count of CUST where PROD = 300
Again, these wasn't hard. I just added a where statement at the top of the Splunk query:
But here's where I'm stuck.
I also need to create some dashboard single count charts that would have this type of combinations of the PROD categories:
Distinct Count of CUST where PROD = 100 & PROD = 200
Distinct Count of CUST where PROD = 100 & PROD = 300
Distinct Count of CUST where PROD = 200 & PROD = 300
I've tried using mvcombine then trying to create a filter to find the above combinations, but after several hours & googling, no luck.
Thanks in advance!!
Hi @AdrianH,
If you'd like to generate distinct counts for an arbitrary number (n) of combinations (2^n-1), you can generate those combinations from a base search and, for example, map the combinations to a subsearch to generate distinct counts. The combinations could also be used to populate a dashboard input field.
I've introduced a bitwise AND macro to help with identifying combinations:
[bitand_32(2)]
args = x, y
definition = sum(1 * (floor($x$ / 1) % 2) * (floor($y$ / 1) % 2), 2 * (floor($x$ / 2) % 2) * (floor($y$ / 2) % 2), 4 * (floor($x$ /
4) % 2) * (floor($y$ / 4) % 2), 8 * (floor($x$ / 😎 % 2) * (floor($y$ / 😎 % 2), 16 * (floor($x$ / 16) % 2) * (floor($y$ / 16) % 2),
32 * (floor($x$ / 32) % 2) * (floor($y$ / 32) % 2), 64 * (floor($x$ / 64) % 2) * (floor($y$ / 64) % 2), 128 * (floor($x$ / 128) % 2
) * (floor($y$ / 128) % 2), 256 * (floor($x$ / 256) % 2) * (floor($y$ / 256) % 2), 512 * (floor($x$ / 512) % 2) * (floor($y$ / 512)
% 2), 1024 * (floor($x$ / 1024) % 2) * (floor($y$ / 1024) % 2), 2048 * (floor($x$ / 2048) % 2) * (floor($y$ / 2048) % 2), 4096 * (fl
oor($x$ / 4096) % 2) * (floor($y$ / 4096) % 2), 8192 * (floor($x$ / 8192) % 2) * (floor($y$ / 8192) % 2), 16384 * (floor($x$ / 16384
) % 2) * (floor($y$ / 16384) % 2), 32768 * (floor($x$ / 32768) % 2) * (floor($y$ / 32768) % 2), 65536 * (floor($x$ / 65536) % 2) * (
floor($y$ / 65536) % 2), 131072 * (floor($x$ / 131072) % 2) * (floor($y$ / 131072) % 2), 262144 * (floor($x$ / 262144) % 2) * (floor
($y$ / 262144) % 2), 524288 * (floor($x$ / 524288) % 2) * (floor($y$ / 524288) % 2), 1048576 * (floor($x$ / 1048576) % 2) * (floor($
y$ / 1048576) % 2), 2097152 * (floor($x$ / 2097152) % 2) * (floor($y$ / 2097152) % 2), 4194304 * (floor($x$ / 4194304) % 2) * (floor
($y$ / 4194304) % 2), 8388608 * (floor($x$ / 8388608) % 2) * (floor($y$ / 8388608) % 2), 16777216 * (floor($x$ / 16777216) % 2) * (f
loor($y$ / 16777216) % 2), 33554432 * (floor($x$ / 33554432) % 2) * (floor($y$ / 33554432) % 2), 67108864 * (floor($x$ / 67108864) %
2) * (floor($y$ / 67108864) % 2), 134217728 * (floor($x$ / 134217728) % 2) * (floor($y$ / 134217728) % 2), 268435456 * (floor($x$ /
268435456) % 2) * (floor($y$ / 268435456) % 2), 536870912 * (floor($x$ / 536870912) % 2) * (floor($y$ / 536870912) % 2), 1073741824
* (floor($x$ / 1073741824) % 2) * (floor($y$ / 1073741824) % 2), 2147483648 * (floor($x$ / 2147483648) % 2) * (floor($y$ / 21474836
48) % 2))
iseval = 0
With the macro in hand, we can generate a table of possible combinations and then use the table values as indices into an array of unique prod values to generate combinations:
| makeresults format=csv data="datetime,cust,prod
1:00 PM,A,100
1:00 PM,A,200
1:00 PM,A,300
1:00 PM,B,100
1:00 PM,C,100
2:00 PM,A,100
2:00 PM,A,200
2:00 PM,A,300
3:00 PM,D,200"
| stats values(prod) as prod
| eval x=mvrange(1, pow(2, mvcount(prod)))
| eval i=mvrange(0, ceiling(log(mvcount(x), 2)))
| mvexpand i
| eval i_{i}=pow(2, i)
| fields - i
| stats values(*) as *
| mvexpand x
| foreach i_* [ eval y=mvappend(y, case(`bitand_32(x, <<FIELD>>)`==<<FIELD>>, mvindex(prod, log(<<FIELD>>, 2)))) ]
| eval prod="(".mvjoin(mvmap(y, "prod=\"".y."\""), " OR ").")"
| fields prod
prod
----
(prod="100")
(prod="200")
(prod="100" OR prod="200")
(prod="300")
(prod="100" OR prod="300")
(prod="200" OR prod="300")
(prod="100" OR prod="200" OR prod="300")
We can use the map command to pass the prod field to an arbitrary number of subsearches to count distinct values:
| makeresults format=csv data="datetime,cust,prod
1:00 PM,A,100
1:00 PM,A,200
1:00 PM,A,300
1:00 PM,B,100
1:00 PM,C,100
2:00 PM,A,100
2:00 PM,A,200
2:00 PM,A,300
3:00 PM,D,200"
| stats values(prod) as prod
| eval x=mvrange(1, pow(2, mvcount(prod)))
| eval i=mvrange(0, ceiling(log(mvcount(x), 2)))
| mvexpand i
| eval i_{i}=pow(2, i)
| fields - i
| stats values(*) as *
| mvexpand x
| foreach i_* [ eval y=mvappend(y, case(`bitand_32(x, <<FIELD>>)`==<<FIELD>>, mvindex(prod, log(<<FIELD>>, 2)))) ]
| eval prod="(".mvjoin(mvmap(y, "prod=\"".y."\""), " OR ").")"
| fields prod
| map search="| makeresults format=csv data=\"datetime,cust,prod
1:00 PM,A,100
1:00 PM,A,200
1:00 PM,A,300
1:00 PM,B,100
1:00 PM,C,100
2:00 PM,A,100
2:00 PM,A,200
2:00 PM,A,300
3:00 PM,D,200\"
| eval filter=$prod$, match=case(searchmatch(\"$prod$\"), 1)
| stats dc(eval(case(match==1, cust))) as cust_distinct_count by filter" maxsearches=10000
filter cust_distinct_count
---------------------------------------- -------------------
(prod="100") 3
(prod="200") 2
(prod="100" OR prod="200") 4
(prod="300") 1
(prod="100" OR prod="300") 3
(prod="200" OR prod="300") 2
(prod="100" OR prod="200" OR prod="300") 4
Note that the map command generates one search per filter value, and scalability is a concern. The maxsearches argument should be a number greater than or equal to 2^n-1. I've used 10000, which would accommodate n=13 products (2^13-1 = 8191). I'm assuming your actual number of products is much higher.
The search that generates combinations can be used on its own, however, and you can dispatch subsequent searches in whatever way makes sense for your dashboard.
If you're alternatively looking for a simple, more direct solution, you can combine stats dc() with eval in any search:
| makeresults format=csv data="datetime,cust,prod
1:00 PM,A,100
1:00 PM,A,200
1:00 PM,A,300
1:00 PM,B,100
1:00 PM,C,100
2:00 PM,A,100
2:00 PM,A,200
2:00 PM,A,300
3:00 PM,D,200"
| stats dc(eval(case(prod=="100" OR prod=="200", cust))) as distinct_count_of_cust_where_prod_in_100_200
distinct_count_of_cust_where_prod_in_100_200
--------------------------------------------
4
So first I think it makes sense to do a stats aggregation for all the values of the prod field for each cust value.
<base_search>
| stats
values(prod) as all_prod
by cust
This will leave us with a multivalue field look something like this.
From here you can do evaluations against the multivalue fields to check for specific conditions. Example:
The unique combination of PROD values you mentioned in the original post can be done in an eval like this.
<base_search>
| stats
values(prod) as all_prod
by cust
``` subset inclusion ```
| eval
scenario_1=mvappend(
case('all_prod'=="100" AND 'all_prod'=="200", "PROD=100 & PROD=200"),
case('all_prod'=="100" AND 'all_prod'=="300", "PROD=100 & PROD=300"),
case('all_prod'=="200" AND 'all_prod'=="300", "PROD=200 & PROD=300")
)
``` direct match ```
| eval
scenario_2=mvappend(
case('all_prod'=="100" AND 'all_prod'=="200" AND mvcount(all_prod)==2, "PROD=100 & PROD=200"),
case('all_prod'=="100" AND 'all_prod'=="300" AND mvcount(all_prod)==2, "PROD=100 & PROD=300"),
case('all_prod'=="200" AND 'all_prod'=="300" AND mvcount(all_prod)==2, "PROD=200 & PROD=300")
)
Notice the 2 different scenarios here, I wasn't exactly sure if when you mentioned that a cust has 100 and 200, if that means 100 and 200 only and no other values or if the 100 and 200 values is allowed to be a subset of all that custs values. So I included both scenarios here to show the output.
Now, to get a distinct count of 'custs' that fall into each category you would just do a simple stats to tally them up by your specific scenario. Something like this.
<base_search>
| stats
values(prod) as all_prod
by cust
``` subset inclusion ```
| eval
scenario_1=mvappend(
case('all_prod'=="100" AND 'all_prod'=="200", "PROD=100 & PROD=200"),
case('all_prod'=="100" AND 'all_prod'=="300", "PROD=100 & PROD=300"),
case('all_prod'=="200" AND 'all_prod'=="300", "PROD=200 & PROD=300")
)
``` direct match ```
| eval
scenario_2=mvappend(
case('all_prod'=="100" AND 'all_prod'=="200" AND mvcount(all_prod)==2, "PROD=100 & PROD=200"),
case('all_prod'=="100" AND 'all_prod'=="300" AND mvcount(all_prod)==2, "PROD=100 & PROD=300"),
case('all_prod'=="200" AND 'all_prod'=="300" AND mvcount(all_prod)==2, "PROD=200 & PROD=300")
)
| stats
values(cust) as custs
dc(cust) as dc_cust
by scenario_1
and the output should have your distinct counts of custs for each PROD MV combos defined.