When analyzing different tstats commands in some apps we've installed, sometimes I see fields at the beginning along with count, and sometimes they are in the groupby. Playing around with them doesn't seem to produce different results. For example:
| tstats values(x), values(y), count FROM datamodel GROUPBY z
seems to give me the same thing as
| tstats values(x), count FROM datamodel GROUPBY y z
I'm having a hard time figuring out the different and the benefit of one over the other. I've read the documentation and the wonderful writeup by @skawasaki_splunk in the question here entitled: "What is tstats and why is so much faster than stats?" But I'm not really finding my particular question addressed.
This won't give exactly the same thing since the first one will give the field
values(y) while the latter gives just
y. Aside from that, I think those two can give the same results if for every unique
z, there is one unique
y. Just to make sure you understand that
values() gives all distinct values, right? So it will remove duplicate values. It also sorts the distinct values alphabetically.
So if I want a count of every time "y" shows up with "z", put it in groupby; if it is possible for "z" to have multiple values of "y" in the same or multiple events AND I'm okay with showing those multiple values of "y" in the same row for each unique "z," then use values(y) at the beginning. Is that right?
Let's say I change the values(y) to sum(y) (think bytes in a traffic log). Then instead of showing all the different numbers in the "values(bytes)" cell, I would have a cell with the sum of all the numbers. That cell would be for every instance of "z." If I change groupby to "a, b, z" then every possible combination of those 4 values would show up with a sum. Right?
let's say each of those could be 1 or 2, this would be the possibilites:
a b z
1 1 1
1 1 2
1 2 1
1 2 2
2 1 1
2 1 2
2 2 1
2 2 2
(I may have missed one or two possibilities)
So for every time there is an event where a, b & z are 1, it would add up the total of the field "bytes" and we would have one line telling us 1,1,1,sum(bytes). And we keep going.
Do I have it?
And there would not be much reason to put "bytes" in the groupby field from what I can tell.
I'm pretty confused on what you're trying to do. Is it
| tstats sum(bytes) from datamodel=foo by a b z? This will give you the sum of bytes for every unique combination of
Yes, that is exactly right but it took your first answer to help me even formulate the question. I've played a lot with it since you answered this morning and it was exactly what I needed in order to understand it better.
stats, be careful for when either
null() since then it won't show up at all. If you need to account for null values, then use either
stats. Since you can't do that in
tstats has to be the first command), then you'll have to modify your data model and do something like
a=coalesce(a, "null") as an evaluated field.
Thank you. I've seen this comment (probably by you) somewhere else. I knew that was the case because I've had to troubleshoot an app that doesn't work in many places out of the box. But now I understand why.
Thanks again for your help.
I think you got it. The aggregation (by tstats functions) is done for unique combination of fields specified in GROUPBY clause. The aggregate functions (values,sum,avg etc) is applied on the specified field (except for function
count) for that unique combination. You can find list of function and what it does (after clicking hyperlink) in this link: