I have data like:
id,type,id2 1,a,100 2,a,100 3,c, 4,a,101 5,a,101 6,b,102 7,b,102 8,b,102 9,b,103 10,b,103 11,b,103 12,d,104 13,d,104 14,d,104 15,d,104 16,e, 17,c,105 18,c,105
What I am trying to do is to get the output:
So, this is essentially a count of the number of 'type' where id2 are the same. In the above data,
Count of (a=2) is
id 1 and 2 are type 'a' and have id2=100
id 4 and 5 are type 'a' and have id2=101
Count of (b=2) is
id 6,7 and 8 are type 'b' and have id2=102
id 9,10 and 11 are type 'b' and have id2=103
Count of (c=2) is
id 3 is type 'c' and has no id2
id 17 and 18 are type 'c' and have id2=105
Count of (d=1) is
id 12,13,14 and 15 are type 'd' and have id2=104
Count of (e=1) is
id 16 is type 'e' and has no id2
and a final total of 8, which is 2+2+2+1+1
I can't figure out the search required. Note that all types may or may not have id2 and the duplication of id2 can be up to 20 instances of the same id2.
One other query I'm now trying to get from that is to count the frequency of the repeat of a particular id2, i.e.
1=2 - count of id2 being empty (row 3,16)
2=3 - count of id2 having 2 common values (rows 1,2 - 4,5 and 17,18 - values 100,101,105)
3=2 - count of id2 having 3 common values (rows 6,7,8 and 9,10,11- values 102,103)
4=1 - count of id2 having 4 common values (rows 12,13,14,15 - values 104)
I think not. See my revised answer above. Using your nomenclature, it should be this:
... | stats count(id2) AS Legs BY type | stats count BY Legs
In other words, switch out
id2 | for
Sorry, that doesn't work - take this data
Using count(id2) as Legs by id2 gives Legs=4, count=2, whereas BY type gives Legs=8, count=1
Also, for me, type does not have to be the same for each set of id2.
Thanks for your help.
That's not quite right. Using that for types 'a' and 'c' would give
stats dc(id2) as dc by 'a' = 2
stats dc(id2) as dc by 'c' = 1
but these should both be summed to a count of 3 by '2' where 2 is the common number of occurrences of id2. In your example, you would get 1=1 and 2=1 because there is only a single count of dc=2 for 'a' and a single count of dc=1 for 'c'.
I have, for example in my test data, 36 rows, with dc(id2) = 9 and type=20, and there are always 4 rows per id2. What I want is 4=9