I have a dataset that looks like: (id, foo, bar, user) that I want to show results for on a dashboard.
Given an input combination of values for foo and bar, I want to know which ids both
a) have at least one row that has BOTH of those values; and
b) have at least one row that has NEITHER of those values
and then count the number of such ids by user.
For example, a search on (foo=A, bar=1) for the data
id | foo | bar | user |
1234 | A | 1 | admin |
1234 | B | 2 | admin |
1234 | C | 3 | other_user |
abcd | A | 1 | admin |
abcd | A | 2 | admin |
would count 1234, but not abcd, and return
user | ids |
admin | 1 |
other_user | 1 |
Each search parameter can be a single value or a comma-separated list. Empty values are permitted in up to one field at a time.
This is the closest I have been able to get:
index="data" [
| tstats count where index="data" AND foo IN (A) AND bar IN (1) by id
| fields id
] AND NOT (foo IN (A) OR bar IN (1))
| fields id, user
| stats dc(id) as ids by user
I believe the query does what I want it to, but unfortunately am constrained by the hard limit of 10,500 results for subsearches. Is there a way to get the data I want without an intermediate command limiting my results?
Ok. That's a bit clearer now. Still, the solution will be similar if I understood you correctly this time
index=whatever
| eval matchstate=case(in(foo,1,2,3) AND in(bar,4,5,6),1,NOT in(foo,1,2,3) AND NOT in(foo,4,5,6),2,1=1,0)
| stats values(matchstate) as matchstate values(user) as users by id
| where matchstate=1 AND matchstate=2
| fields - matchstate
| mvexpand user
| state count by user
index=whatever
| eval matchstate=case(in(foo,1,2,3) AND in(bar,4,5,6),1,NOT in(foo,1,2,3) AND NOT in(foo,4,5,6),2,1=1,0)
| stats values(matchstate) as matchstate by id user
| where matchstate=1 AND matchstate=2
| state count by user
I probably edited my original post after this response, but doing
| stats ... by id user
won't work.
I tried
| eventstats values(matchstate) as matchstate by id
but similarly to the other proposed solution, the job autofinalizes for a relatively small amount of data and ultimately returns no results.
What do you mean by "won't work"?
Also, stats and eventstats are not the same.
Correct me if I'm wrong - using multiple in fields in by counts by unique value tuples of all those fields right?
So, given the sample table I posted:
| eval matchstate=case(in(foo,1,2,3) AND in(bar,4,5,6),1,NOT in(foo,1,2,3) AND NOT in(foo,4,5,6),2,1=1,0
| stats values(matchstate) as matchstate by id user
will give back
id | foo | bar | user | matchstate |
1234 | A | 1 | admin | 1, 2 |
1234 | C | 3 | other_user | 2 |
abcd | A | 1 | admin | 1, 0 |
Then
| where matchstate=1 AND matchstate=2
will drop the row for other_user, when I want my results to include that row.
I mentioned eventstats because I would still need the user column after doing the stats values, but can't include it using by.
The other_user doesn't match your specs.
Let me quote you. For each user you wanted
"which ids both
a) have at least one row that has BOTH of those values; and
b) have at least one row that has NEITHER of those values"
Ah sorry if I was unclear, I know this problem is a little convoluted.
I want to find the ids that fulfil the conditions *regardless of user*, THEN count the number of such ids by user.
VS count the ids that fulfil the conditions for a single user.
For example, if you removed the second row of my test data, the result would be the same (returning both admin and other_user).
Ok. That's a bit clearer now. Still, the solution will be similar if I understood you correctly this time
index=whatever
| eval matchstate=case(in(foo,1,2,3) AND in(bar,4,5,6),1,NOT in(foo,1,2,3) AND NOT in(foo,4,5,6),2,1=1,0)
| stats values(matchstate) as matchstate values(user) as users by id
| where matchstate=1 AND matchstate=2
| fields - matchstate
| mvexpand user
| state count by user
Give this a shot:
index="data"
| eval match_ct=IF(foo="A", 1, 0)
| eval match_ct=IF(bar="1", match_ct+1, match_ct)
| search match_ct IN (0, 2)
| stats dc(id) AS ids by user
This is close, but I only want the ids that have rows where match_ct=0 AND match_ct=2.
A thought would be to do
| stats dc(match_ct) as matches by id
| search matches=2
But that loses user info. I updated the original post to show why
| stats dc(match_ct) as matches by id, user
also wouldn't work.
| eventstats dc(match_ct) as dc_matches by id
| search dc_matches=2
I think this does what I'm looking for, but the job autofinalizes for even a relatively small amount of data.