I must join some exceedingly large DM datasets but I cannot get |tstats prestats=t append=t
to work consistently in any way that I can understand. I have 3 basic problems:
1: My DMs are not accelerated because I must work on a Dev Search Head. So I am peered to the production Indexers but our DMs are not accelerated (so I may have to build searches that I cannot see run).
2: I have no idea how to get tstats append=t
to work or what the rules for it's usage are.
3: I am assuming that append=t
means unlimited (not a subsearch) but the documentation is silent. If it is not unlimited then I am toast anyway and the answers don't really matter. So is it unlimited or not?
Here is what I do know:
To use append=t
, I must also use prestats=t
, fair enough.
My experimentation leads me to believe that in order to use prestats=t
, I must also use summariesonly=f
; I am not sure about this point, though, because this is not documented, so I would like some confirmation here. If true, then this would appear to bode well for me in my situation BUT...
My experimentation leads me to believe that in order to use prestats=t
, I must use tstats
against against data that actually does have summaries (e.g. tsidx), because when I run |tstats prestats=t append=t summariesonly=f
against my unaccelerated DMs, it returns immediately with no results.
SO, in conclusion it appears that |tstats append=t
only works against the main tsidx (not a DM) or against accelerated DMs, which means that I am flying blind. Right?
P.S. Answers is chock full of unanswered, useless, contradictory, and just plain wrong answers regarding append=t
. It is quite shocking and leads me to suspect that perhaps the rules for this setting have changed over time.
Over time I've learned some of the behavior. Here's what I've found so far, with a Do's and Don'ts section after that:
What does prestats=t do?
Prestats formats the returned data in a way that can be parsed and calculated very quickly by another stats/chart/timechart command.
How should I set up the fields so that they flow from tstats prestats=t to stats?
The functions must match exactly. So if you have max(displayTime) in tstats, it has to be that way in the stats statement.
However, you can rename the stats function, so it could say max(displayTime) as maxDisplay.
| tstats prestats=t max(object.field1) from datamodel=foo by object.field2
| stats max(object.field1) as maxField1 by object.field2
The stats By clause must have at least the fields listed in the tstats By clause. I think you can add fields to the stats By clause (see below), but get the basics working first.
What does summariesonly=t do?
It forces Splunk to use only accelerated data in the data model. If you have 30 days of data but only have acceleration for 7 days, using summariesonly=t will return only 7 days of data even if your earliest date is before that. The summariesonly flag is independent of the prestats and can be used with or without prestats.
If the data model is not accelerated and you use summariesonly=t: "No results found"
What does summariesonly=f do?
This is the default behavior, so it's pretty rare to see it spelled out in searches. As far as I can tell, If the datamodel is accelerated, Splunk will use the accelerated data where it can and then search against the underlying data.
If the data model is not accelerated and you use summariesonly=f: Results return normally.
When using tstats, do all of the fields you want to use need to be declared in the data model?
Yes. That all applies to all tstats usage, not just prestats. tstats is reading off of an alternate index that is created when you design the datamodel. It's actually quote powerful because it allows you to create your own objects, which in turn will manage all of the data relationships so that you don't have to declare them in every search. It's a shame the documentation is so sparse.
Are there any special rules for using append=t?
I'm glad you asked, because I lost at least four hours last week figuring this one out.
In other words, it has to look like this:
| tstats prestats=t max(object.field1) max(object.field2) from datamodel=foo
| tstats prestats=t append=t max(object.field2) from datamodel=foo
| stats max(field1) as maxField1 max(field2) as maxField2
What if the "BY" fields don't match?
This is one major difference between stats and | tstats prestats=t append=t. In stats, NULL fields in the BY statement will exclude that event. But in tstats with appent=t, the missing fields are treated as empty, not null.
How does the count function behave with append=t?
Unpredictably. It gets recalculated in the stats function . Avoid using the function alone; always use it with a field.
Bad
| tstats prestats=t count from datamodel=foo where by object.field1 object.field2
| tstats prestats=t append=t count from datamodel=bar by object.field2 object.field3
| stats count by by object.field1 object.field2 object.field3
In the example above, matches on field2 will count both sets of results. If you don't want that, use this instead:
| tstats prestats=t count(object.uniqueField4) from datamodel=foo where by object.field1 object.field2
| tstats prestats=t append=t count(object.uniqueField5) from datamodel=bar by object.field2 object.field3
| stats count(object.uniqueField4) count(object.uniqueField5) by by object.field1 object.field2 object.field3
Can you use tstats append=t to do conditional counts?
Yes! Here's how I did it without changing the data model.
My goal was to see what percentage of API calls returned in under 1 second. See after the code sample for an explanation of why it works.
| tstats summariesonly=t prestats=t count from datamodel=rest_api
BY object.apiName
| eval perfThreshold="Total"
| tstats summariesonly=t prestats=t append=t count from datamodel=rest_api
WHERE object.responseTime<=1000
BY object.apiName
| eval perfThreshold=case(isnotnull(perfThreshold), perfThreshold, true(), "Below1sec")
| tstats summariesonly=t prestats=t append=t count from datamodel=rest_api
WHERE object.responseTime<=3000
BY object.apiName
| eval perfThreshold=case(isnotnull(perfThreshold), perfThreshold, true(), "Below3sec")
| rename object.* as *
| chart count by apiName perfThreshold
| eval pctBelow1Sec=100 * Below1sec / Total
| eval pctBelow3Sec=100 * Below3sec / Total
| table apiName pctBelow1Sec pctBelow3Sec Total
DO's and DON'Ts
What you CAN do between the tstats statement and the stats statement
The bad news: the behavior here can seem pretty wonky, though it does seem to have some internally consistent logic.
The good news: the behavior is the same for summary indices too, which means:
- Once you learn one, the other is much easier to master.
- You can use sistats to write to a summary index the look at the results there, which will give you a much better understanding of how the data flows when using prestats.
Actions you can take:
Rename fields (rename all.* as *)
Add fields. For example, eval webServer=host
Modify group by fields. For example, eval webServer=replace(webServer, ".com", "")
What you CAN'T do between the tstats statement and the stats statement
Change stats calculated values
Set a name for the function in the tstats statement. Not sure why, but it's probably because it returns the values in that special sistats format.
TL;DR: if you just want one search to return quickly, you're probably much better off making a report, accelerating it or scheduling it in the off hours, and calling it from the dashboard. but if you put in the time, you can do some pretty amazing stuff. Patience is key.
Please add any comments or corrections - I'll edit this post as needed.
Over time I've learned some of the behavior. Here's what I've found so far, with a Do's and Don'ts section after that:
What does prestats=t do?
Prestats formats the returned data in a way that can be parsed and calculated very quickly by another stats/chart/timechart command.
How should I set up the fields so that they flow from tstats prestats=t to stats?
The functions must match exactly. So if you have max(displayTime) in tstats, it has to be that way in the stats statement.
However, you can rename the stats function, so it could say max(displayTime) as maxDisplay.
| tstats prestats=t max(object.field1) from datamodel=foo by object.field2
| stats max(object.field1) as maxField1 by object.field2
The stats By clause must have at least the fields listed in the tstats By clause. I think you can add fields to the stats By clause (see below), but get the basics working first.
What does summariesonly=t do?
It forces Splunk to use only accelerated data in the data model. If you have 30 days of data but only have acceleration for 7 days, using summariesonly=t will return only 7 days of data even if your earliest date is before that. The summariesonly flag is independent of the prestats and can be used with or without prestats.
If the data model is not accelerated and you use summariesonly=t: "No results found"
What does summariesonly=f do?
This is the default behavior, so it's pretty rare to see it spelled out in searches. As far as I can tell, If the datamodel is accelerated, Splunk will use the accelerated data where it can and then search against the underlying data.
If the data model is not accelerated and you use summariesonly=f: Results return normally.
When using tstats, do all of the fields you want to use need to be declared in the data model?
Yes. That all applies to all tstats usage, not just prestats. tstats is reading off of an alternate index that is created when you design the datamodel. It's actually quote powerful because it allows you to create your own objects, which in turn will manage all of the data relationships so that you don't have to declare them in every search. It's a shame the documentation is so sparse.
Are there any special rules for using append=t?
I'm glad you asked, because I lost at least four hours last week figuring this one out.
In other words, it has to look like this:
| tstats prestats=t max(object.field1) max(object.field2) from datamodel=foo
| tstats prestats=t append=t max(object.field2) from datamodel=foo
| stats max(field1) as maxField1 max(field2) as maxField2
What if the "BY" fields don't match?
This is one major difference between stats and | tstats prestats=t append=t. In stats, NULL fields in the BY statement will exclude that event. But in tstats with appent=t, the missing fields are treated as empty, not null.
How does the count function behave with append=t?
Unpredictably. It gets recalculated in the stats function . Avoid using the function alone; always use it with a field.
Bad
| tstats prestats=t count from datamodel=foo where by object.field1 object.field2
| tstats prestats=t append=t count from datamodel=bar by object.field2 object.field3
| stats count by by object.field1 object.field2 object.field3
In the example above, matches on field2 will count both sets of results. If you don't want that, use this instead:
| tstats prestats=t count(object.uniqueField4) from datamodel=foo where by object.field1 object.field2
| tstats prestats=t append=t count(object.uniqueField5) from datamodel=bar by object.field2 object.field3
| stats count(object.uniqueField4) count(object.uniqueField5) by by object.field1 object.field2 object.field3
Can you use tstats append=t to do conditional counts?
Yes! Here's how I did it without changing the data model.
My goal was to see what percentage of API calls returned in under 1 second. See after the code sample for an explanation of why it works.
| tstats summariesonly=t prestats=t count from datamodel=rest_api
BY object.apiName
| eval perfThreshold="Total"
| tstats summariesonly=t prestats=t append=t count from datamodel=rest_api
WHERE object.responseTime<=1000
BY object.apiName
| eval perfThreshold=case(isnotnull(perfThreshold), perfThreshold, true(), "Below1sec")
| tstats summariesonly=t prestats=t append=t count from datamodel=rest_api
WHERE object.responseTime<=3000
BY object.apiName
| eval perfThreshold=case(isnotnull(perfThreshold), perfThreshold, true(), "Below3sec")
| rename object.* as *
| chart count by apiName perfThreshold
| eval pctBelow1Sec=100 * Below1sec / Total
| eval pctBelow3Sec=100 * Below3sec / Total
| table apiName pctBelow1Sec pctBelow3Sec Total
DO's and DON'Ts
What you CAN do between the tstats statement and the stats statement
The bad news: the behavior here can seem pretty wonky, though it does seem to have some internally consistent logic.
The good news: the behavior is the same for summary indices too, which means:
- Once you learn one, the other is much easier to master.
- You can use sistats to write to a summary index the look at the results there, which will give you a much better understanding of how the data flows when using prestats.
Actions you can take:
Rename fields (rename all.* as *)
Add fields. For example, eval webServer=host
Modify group by fields. For example, eval webServer=replace(webServer, ".com", "")
What you CAN'T do between the tstats statement and the stats statement
Change stats calculated values
Set a name for the function in the tstats statement. Not sure why, but it's probably because it returns the values in that special sistats format.
TL;DR: if you just want one search to return quickly, you're probably much better off making a report, accelerating it or scheduling it in the off hours, and calling it from the dashboard. but if you put in the time, you can do some pretty amazing stuff. Patience is key.
Please add any comments or corrections - I'll edit this post as needed.
Thanks a lots 🙂
Very useful facts about tstats. Thank you.
This is an awesome writeup, kudos.
Thanks! If I've saved you some of the pain I experienced in learning the stuff in here, then it's worth it.
I would agree with your findings. There is nothing in documentation that explicitly says it, but I deduce that prestatss=t
only works when the data you are after is already in a TSIDX file, i.e., raw index or accelerated DM. If confirmed. That note should be made explicit in the docs.
On your inability to accelerate and test - see my answer below. Prestats and the sistats command seem to produce very similar results, so a back door way to test might be to use sistats to write to a summary index, then search the summary index and summarize with a stats command. Prestats and stats should be very close in their behavior (as far as I've seen at least).
I will make a note on the docs page.
Docs commented.