Over time I've learned some of the behavior. Here's what I've found so far, with a Do's and Don'ts section after that:
What does prestats=t do?
Prestats formats the returned data in a way that can be parsed and calculated very quickly by another stats/chart/timechart command.
Prestats seems to work the same way summary indices do; as far as I can tell, sistats and prestats produce the same results.
The good news: you can read up on summary indices to get a better idea of what's happening.
The bad news: the docs are very fuzzy on the exact syntax for retrieving data from summary indices.
Summariesonly can be either true or false.
How should I set up the fields so that they flow from tstats prestats=t to stats?
The functions must match exactly. So if you have max(displayTime) in tstats, it has to be that way in the stats statement.
However, you can rename the stats function, so it could say max(displayTime) as maxDisplay.
| tstats prestats=t max(object.field1) from datamodel=foo by object.field2
| stats max(object.field1) as maxField1 by object.field2
The stats By clause must have at least the fields listed in the tstats By clause. I think you can add fields to the stats By clause (see below), but get the basics working first.
What does summariesonly=t do?
It forces Splunk to use only accelerated data in the data model. If you have 30 days of data but only have acceleration for 7 days, using summariesonly=t will return only 7 days of data even if your earliest date is before that. The summariesonly flag is independent of the prestats and can be used with or without prestats.
If the data model is not accelerated and you use summariesonly=t: "No results found"
What does summariesonly=f do?
This is the default behavior, so it's pretty rare to see it spelled out in searches. As far as I can tell, If the datamodel is accelerated, Splunk will use the accelerated data where it can and then search against the underlying data.
If the data model is not accelerated and you use summariesonly=f: Results return normally.
When using tstats, do all of the fields you want to use need to be declared in the data model?
Yes. That all applies to all tstats usage, not just prestats. tstats is reading off of an alternate index that is created when you design the datamodel. It's actually quote powerful because it allows you to create your own objects, which in turn will manage all of the data relationships so that you don't have to declare them in every search. It's a shame the documentation is so sparse.
Are there any special rules for using append=t?
I'm glad you asked, because I lost at least four hours last week figuring this one out.
You must use prestats=t for all tstats commands.
Get the first tstats prestats=t and stats command combo working first before adding additional tstats prestats=t append=t commands. Otherwise debugging them is a nightmare.
It appears that you have to declare all of the functions you are going to use in the first tstats statement, even if they don't exist there. If you don't it, the functions in the second tstats command will be ignored. Think of it like the outputlookup append=t command - if a field wasn't already in the lookup, it won't get added.
In other words, it has to look like this:
| tstats prestats=t max(object.field1) max(object.field2) from datamodel=foo
| tstats prestats=t append=t max(object.field2) from datamodel=foo
| stats max(field1) as maxField1 max(field2) as maxField2
What if the "BY" fields don't match?
This is one major difference between stats and | tstats prestats=t append=t. In stats, NULL fields in the BY statement will exclude that event. But in tstats with appent=t, the missing fields are treated as empty, not null.
How does the count function behave with append=t?
Unpredictably. It gets recalculated in the stats function . Avoid using the function alone; always use it with a field.
Bad
| tstats prestats=t count from datamodel=foo where by object.field1 object.field2
| tstats prestats=t append=t count from datamodel=bar by object.field2 object.field3
| stats count by by object.field1 object.field2 object.field3
In the example above, matches on field2 will count both sets of results. If you don't want that, use this instead:
| tstats prestats=t count(object.uniqueField4) from datamodel=foo where by object.field1 object.field2
| tstats prestats=t append=t count(object.uniqueField5) from datamodel=bar by object.field2 object.field3
| stats count(object.uniqueField4) count(object.uniqueField5) by by object.field1 object.field2 object.field3
Can you use tstats append=t to do conditional counts?
Yes! Here's how I did it without changing the data model.
My goal was to see what percentage of API calls returned in under 1 second. See after the code sample for an explanation of why it works.
| tstats summariesonly=t prestats=t count from datamodel=rest_api
BY object.apiName
| eval perfThreshold="Total"
| tstats summariesonly=t prestats=t append=t count from datamodel=rest_api
WHERE object.responseTime<=1000
BY object.apiName
| eval perfThreshold=case(isnotnull(perfThreshold), perfThreshold, true(), "Below1sec")
| tstats summariesonly=t prestats=t append=t count from datamodel=rest_api
WHERE object.responseTime<=3000
BY object.apiName
| eval perfThreshold=case(isnotnull(perfThreshold), perfThreshold, true(), "Below3sec")
| rename object.* as *
| chart count by apiName perfThreshold
| eval pctBelow1Sec=100 * Below1sec / Total
| eval pctBelow3Sec=100 * Below3sec / Total
| table apiName pctBelow1Sec pctBelow3Sec Total
DO's and DON'Ts
What you CAN do between the tstats statement and the stats statement
The bad news: the behavior here can seem pretty wonky, though it does seem to have some internally consistent logic.
The good news: the behavior is the same for summary indices too, which means:
- Once you learn one, the other is much easier to master.
- You can use sistats to write to a summary index the look at the results there, which will give you a much better understanding of how the data flows when using prestats.
Actions you can take:
Rename fields (rename all.* as *)
Add fields. For example, eval webServer=host
Modify group by fields. For example, eval webServer=replace(webServer, ".com", "")
What you CAN'T do between the tstats statement and the stats statement
Change stats calculated values
Set a name for the function in the tstats statement. Not sure why, but it's probably because it returns the values in that special sistats format.
TL;DR: if you just want one search to return quickly, you're probably much better off making a report, accelerating it or scheduling it in the off hours, and calling it from the dashboard. but if you put in the time, you can do some pretty amazing stuff. Patience is key.
Please add any comments or corrections - I'll edit this post as needed.
... View more