Splunk Search

Tstats - What factors come into play when deciding to put a field in the beginning or after the GROUPBY section?

Explorer

When analyzing different tstats commands in some apps we've installed, sometimes I see fields at the beginning along with count, and sometimes they are in the groupby. Playing around with them doesn't seem to produce different results. For example:

| tstats values(x), values(y), count FROM datamodel GROUPBY z

seems to give me the same thing as

| tstats values(x), count FROM datamodel GROUPBY y z

I'm having a hard time figuring out the different and the benefit of one over the other. I've read the documentation and the wonderful writeup by @skawasaki_splunk in the question here entitled: "What is tstats and why is so much faster than stats?" But I'm not really finding my particular question addressed.

Tags (1)

Splunk Employee
Splunk Employee

This won't give exactly the same thing since the first one will give the field values(y) while the latter gives just y. Aside from that, I think those two can give the same results if for every unique z, there is one unique y. Just to make sure you understand that values() gives all distinct values, right? So it will remove duplicate values. It also sorts the distinct values alphabetically.

0 Karma

Explorer

So if I want a count of every time "y" shows up with "z", put it in groupby; if it is possible for "z" to have multiple values of "y" in the same or multiple events AND I'm okay with showing those multiple values of "y" in the same row for each unique "z," then use values(y) at the beginning. Is that right?

Let's say I change the values(y) to sum(y) (think bytes in a traffic log). Then instead of showing all the different numbers in the "values(bytes)" cell, I would have a cell with the sum of all the numbers. That cell would be for every instance of "z." If I change groupby to "a, b, z" then every possible combination of those 4 values would show up with a sum. Right?

let's say each of those could be 1 or 2, this would be the possibilites:

a b z
1 1 1
1 1 2
1 2 1
1 2 2
2 1 1
2 1 2
2 2 1
2 2 2
(I may have missed one or two possibilities)

So for every time there is an event where a, b & z are 1, it would add up the total of the field "bytes" and we would have one line telling us 1,1,1,sum(bytes). And we keep going.

Do I have it?

And there would not be much reason to put "bytes" in the groupby field from what I can tell.

0 Karma

Splunk Employee
Splunk Employee

I'm pretty confused on what you're trying to do. Is it | tstats sum(bytes) from datamodel=foo by a b z? This will give you the sum of bytes for every unique combination of a, b, and z.

0 Karma

Explorer

Yes, that is exactly right but it took your first answer to help me even formulate the question. I've played a lot with it since you answered this morning and it was exactly what I needed in order to understand it better.

0 Karma

Splunk Employee
Splunk Employee

Just like stats, be careful for when either a, b, or z is null() since then it won't show up at all. If you need to account for null values, then use either fillnull or coalesce() before stats. Since you can't do that in tstats (tstats has to be the first command), then you'll have to modify your data model and do something like a=coalesce(a, "null") as an evaluated field.

0 Karma

Explorer

Thank you. I've seen this comment (probably by you) somewhere else. I knew that was the case because I've had to troubleshoot an app that doesn't work in many places out of the box. But now I understand why.

Thanks again for your help.

0 Karma

SplunkTrust
SplunkTrust

I think you got it. The aggregation (by tstats functions) is done for unique combination of fields specified in GROUPBY clause. The aggregate functions (values,sum,avg etc) is applied on the specified field (except for function count) for that unique combination. You can find list of function and what it does (after clicking hyperlink) in this link:
https://docs.splunk.com/Documentation/Splunk/7.1.0/SearchReference/Tstats#Required_arguments

0 Karma

Explorer

Thanks very much!

(It's not letting me marked it as answered. I would mark @skawasaki_splunk comment as the answer)

0 Karma