My data looks like the following
Ideally, we shouldn't have different guids given same student_id, browser_id and x_id. I am trying to find all those student_ids who violate this rule aka student_ids with same browser_id and x_id but different guid. So for the above, I'd like to see something like -
12_a Chrome_2 1122 1/9/23 14:45 788a 12_a Chrome_2 1132 1/11/23 21:50 788a
I am trying -
index="main_idx" app="student_svc" | stats count by student_id browser_id guid datetime x_id | where count > 1 | stats list(count) by student_id
But it doesn't seem to be yielding the result. What should be the fix? Thanks
Thanks @gcusello - it seems close - one thing I want to get rid of in the search is empty guid - how to add that?
And the problem statement is for each (student_id + browser_id + x_id) there should be one guid.
please try this:
index="main_idx" app="student_svc" | fillnull guid value="empty" | stats dc(browser_id) AS browser_id_count dc(guid) AS guid_count dc(x_id) AS x_id_count dc(eval(if(guid="empty",1,0))) AS empty_guid_count BY student_id | where browser_id_count>1 OR guid_count>1 OR x_id_count>1 OR empty_guid_count>0
@gcusello thx - this is not exactly giving me the result I want though.
I want all the student_ids who has more than 1 guid for a particular browser_id and x_id combination.
I tried changing the ORs to ANDs but no luck.
let me understand: for each student_id you should have only one uids, browser_id and x_id, is it correct?
if this is your need, you should try to use dc function in stats command, so to have the ex eption you could run something like this:
index="main_idx" app="student_svc" | stats dc(browser_id) AS browser_id_count dc(guid) AS guid_count dc(x_id) AS x_id_count BY student_id | where browser_id_count>1 OR guid_count>1 OR x_id_count>1
See my approach and adapt my sample to your needs.