Splunk Search

What *exactly* are the rules/requirements for using "|tstats append=t"?

woodcock
Esteemed Legend

I must join some exceedingly large DM datasets but I cannot get |tstats prestats=t append=t to work consistently in any way that I can understand. I have 3 basic problems:

1: My DMs are not accelerated because I must work on a Dev Search Head. So I am peered to the production Indexers but our DMs are not accelerated (so I may have to build searches that I cannot see run).
2: I have no idea how to get tstats append=t to work or what the rules for it's usage are.
3: I am assuming that append=t means unlimited (not a subsearch) but the documentation is silent. If it is not unlimited then I am toast anyway and the answers don't really matter. So is it unlimited or not?

Here is what I do know:
To use append=t, I must also use prestats=t, fair enough.

My experimentation leads me to believe that in order to use prestats=t, I must also use summariesonly=f; I am not sure about this point, though, because this is not documented, so I would like some confirmation here. If true, then this would appear to bode well for me in my situation BUT...

My experimentation leads me to believe that in order to use prestats=t, I must use tstats against against data that actually does have summaries (e.g. tsidx), because when I run |tstats prestats=t append=t summariesonly=f against my unaccelerated DMs, it returns immediately with no results.

SO, in conclusion it appears that |tstats append=t only works against the main tsidx (not a DM) or against accelerated DMs, which means that I am flying blind. Right?

P.S. Answers is chock full of unanswered, useless, contradictory, and just plain wrong answers regarding append=t. It is quite shocking and leads me to suspect that perhaps the rules for this setting have changed over time.

1 Solution

Kenshiro70
Path Finder

Over time I've learned some of the behavior. Here's what I've found so far, with a Do's and Don'ts section after that:

What does prestats=t do?
Prestats formats the returned data in a way that can be parsed and calculated very quickly by another stats/chart/timechart command.

  • Prestats seems to work the same way summary indices do; as far as I can tell, sistats and prestats produce the same results.
  • The good news: you can read up on summary indices to get a better idea of what's happening.
  • The bad news: the docs are very fuzzy on the exact syntax for retrieving data from summary indices.
  • Summariesonly can be either true or false.

How should I set up the fields so that they flow from tstats prestats=t to stats?

  • The functions must match exactly. So if you have max(displayTime) in tstats, it has to be that way in the stats statement.

  • However, you can rename the stats function, so it could say max(displayTime) as maxDisplay.

    | tstats prestats=t max(object.field1) from datamodel=foo by object.field2
    | stats max(object.field1) as maxField1 by object.field2
    The stats By clause must have at least the fields listed in the tstats By clause. I think you can add fields to the stats By clause (see below), but get the basics working first.

What does summariesonly=t do?
It forces Splunk to use only accelerated data in the data model. If you have 30 days of data but only have acceleration for 7 days, using summariesonly=t will return only 7 days of data even if your earliest date is before that. The summariesonly flag is independent of the prestats and can be used with or without prestats.

If the data model is not accelerated and you use summariesonly=t: "No results found"

What does summariesonly=f do?
This is the default behavior, so it's pretty rare to see it spelled out in searches. As far as I can tell, If the datamodel is accelerated, Splunk will use the accelerated data where it can and then search against the underlying data.

If the data model is not accelerated and you use summariesonly=f: Results return normally.

When using tstats, do all of the fields you want to use need to be declared in the data model?
Yes. That all applies to all tstats usage, not just prestats. tstats is reading off of an alternate index that is created when you design the datamodel. It's actually quote powerful because it allows you to create your own objects, which in turn will manage all of the data relationships so that you don't have to declare them in every search. It's a shame the documentation is so sparse.

Are there any special rules for using append=t?
I'm glad you asked, because I lost at least four hours last week figuring this one out.

  • You must use prestats=t for all tstats commands.
  • Get the first tstats prestats=t and stats command combo working first before adding additional tstats prestats=t append=t commands. Otherwise debugging them is a nightmare.
  • It appears that you have to declare all of the functions you are going to use in the first tstats statement, even if they don't exist there. If you don't it, the functions in the second tstats command will be ignored. Think of it like the outputlookup append=t command - if a field wasn't already in the lookup, it won't get added.

In other words, it has to look like this:

| tstats prestats=t max(object.field1) max(object.field2) from datamodel=foo 
| tstats prestats=t append=t max(object.field2) from datamodel=foo
| stats max(field1) as maxField1 max(field2) as maxField2

What if the "BY" fields don't match?
This is one major difference between stats and | tstats prestats=t append=t. In stats, NULL fields in the BY statement will exclude that event. But in tstats with appent=t, the missing fields are treated as empty, not null.

How does the count function behave with append=t?
Unpredictably. It gets recalculated in the stats function . Avoid using the function alone; always use it with a field.

Bad

| tstats prestats=t count from datamodel=foo where by object.field1 object.field2
| tstats prestats=t append=t count from datamodel=bar by object.field2 object.field3
| stats count by  by object.field1 object.field2 object.field3

In the example above, matches on field2 will count both sets of results. If you don't want that, use this instead:

| tstats prestats=t count(object.uniqueField4) from datamodel=foo where by object.field1 object.field2
| tstats prestats=t append=t count(object.uniqueField5) from datamodel=bar by object.field2 object.field3
| stats count(object.uniqueField4) count(object.uniqueField5) by  by object.field1 object.field2 object.field3

Can you use tstats append=t to do conditional counts?
Yes! Here's how I did it without changing the data model.

My goal was to see what percentage of API calls returned in under 1 second. See after the code sample for an explanation of why it works.

| tstats summariesonly=t prestats=t count from datamodel=rest_api 
       BY object.apiName
| eval perfThreshold="Total"
| tstats summariesonly=t prestats=t append=t count from datamodel=rest_api 
       WHERE object.responseTime<=1000
       BY object.apiName
| eval perfThreshold=case(isnotnull(perfThreshold), perfThreshold, true(), "Below1sec")
| tstats summariesonly=t prestats=t append=t count from datamodel=rest_api 
       WHERE object.responseTime<=3000
       BY object.apiName
| eval perfThreshold=case(isnotnull(perfThreshold), perfThreshold, true(), "Below3sec")
| rename object.* as *
| chart count by apiName perfThreshold
| eval pctBelow1Sec=100 * Below1sec / Total
| eval pctBelow3Sec=100 * Below3sec / Total
| table apiName pctBelow1Sec pctBelow3Sec Total

DO's and DON'Ts


What you CAN do between the tstats statement and the stats statement
The bad news: the behavior here can seem pretty wonky, though it does seem to have some internally consistent logic.
The good news: the behavior is the same for summary indices too, which means:
- Once you learn one, the other is much easier to master.
- You can use sistats to write to a summary index the look at the results there, which will give you a much better understanding of how the data flows when using prestats.

Actions you can take:

  • Rename fields (rename all.* as *)

  • Add fields. For example, eval webServer=host

  • Modify group by fields. For example, eval webServer=replace(webServer, ".com", "")

What you CAN'T do between the tstats statement and the stats statement

  • Change stats calculated values

  • Set a name for the function in the tstats statement. Not sure why, but it's probably because it returns the values in that special sistats format.


TL;DR: if you just want one search to return quickly, you're probably much better off making a report, accelerating it or scheduling it in the off hours, and calling it from the dashboard. but if you put in the time, you can do some pretty amazing stuff. Patience is key.

Please add any comments or corrections - I'll edit this post as needed.

View solution in original post

Kenshiro70
Path Finder

Over time I've learned some of the behavior. Here's what I've found so far, with a Do's and Don'ts section after that:

What does prestats=t do?
Prestats formats the returned data in a way that can be parsed and calculated very quickly by another stats/chart/timechart command.

  • Prestats seems to work the same way summary indices do; as far as I can tell, sistats and prestats produce the same results.
  • The good news: you can read up on summary indices to get a better idea of what's happening.
  • The bad news: the docs are very fuzzy on the exact syntax for retrieving data from summary indices.
  • Summariesonly can be either true or false.

How should I set up the fields so that they flow from tstats prestats=t to stats?

  • The functions must match exactly. So if you have max(displayTime) in tstats, it has to be that way in the stats statement.

  • However, you can rename the stats function, so it could say max(displayTime) as maxDisplay.

    | tstats prestats=t max(object.field1) from datamodel=foo by object.field2
    | stats max(object.field1) as maxField1 by object.field2
    The stats By clause must have at least the fields listed in the tstats By clause. I think you can add fields to the stats By clause (see below), but get the basics working first.

What does summariesonly=t do?
It forces Splunk to use only accelerated data in the data model. If you have 30 days of data but only have acceleration for 7 days, using summariesonly=t will return only 7 days of data even if your earliest date is before that. The summariesonly flag is independent of the prestats and can be used with or without prestats.

If the data model is not accelerated and you use summariesonly=t: "No results found"

What does summariesonly=f do?
This is the default behavior, so it's pretty rare to see it spelled out in searches. As far as I can tell, If the datamodel is accelerated, Splunk will use the accelerated data where it can and then search against the underlying data.

If the data model is not accelerated and you use summariesonly=f: Results return normally.

When using tstats, do all of the fields you want to use need to be declared in the data model?
Yes. That all applies to all tstats usage, not just prestats. tstats is reading off of an alternate index that is created when you design the datamodel. It's actually quote powerful because it allows you to create your own objects, which in turn will manage all of the data relationships so that you don't have to declare them in every search. It's a shame the documentation is so sparse.

Are there any special rules for using append=t?
I'm glad you asked, because I lost at least four hours last week figuring this one out.

  • You must use prestats=t for all tstats commands.
  • Get the first tstats prestats=t and stats command combo working first before adding additional tstats prestats=t append=t commands. Otherwise debugging them is a nightmare.
  • It appears that you have to declare all of the functions you are going to use in the first tstats statement, even if they don't exist there. If you don't it, the functions in the second tstats command will be ignored. Think of it like the outputlookup append=t command - if a field wasn't already in the lookup, it won't get added.

In other words, it has to look like this:

| tstats prestats=t max(object.field1) max(object.field2) from datamodel=foo 
| tstats prestats=t append=t max(object.field2) from datamodel=foo
| stats max(field1) as maxField1 max(field2) as maxField2

What if the "BY" fields don't match?
This is one major difference between stats and | tstats prestats=t append=t. In stats, NULL fields in the BY statement will exclude that event. But in tstats with appent=t, the missing fields are treated as empty, not null.

How does the count function behave with append=t?
Unpredictably. It gets recalculated in the stats function . Avoid using the function alone; always use it with a field.

Bad

| tstats prestats=t count from datamodel=foo where by object.field1 object.field2
| tstats prestats=t append=t count from datamodel=bar by object.field2 object.field3
| stats count by  by object.field1 object.field2 object.field3

In the example above, matches on field2 will count both sets of results. If you don't want that, use this instead:

| tstats prestats=t count(object.uniqueField4) from datamodel=foo where by object.field1 object.field2
| tstats prestats=t append=t count(object.uniqueField5) from datamodel=bar by object.field2 object.field3
| stats count(object.uniqueField4) count(object.uniqueField5) by  by object.field1 object.field2 object.field3

Can you use tstats append=t to do conditional counts?
Yes! Here's how I did it without changing the data model.

My goal was to see what percentage of API calls returned in under 1 second. See after the code sample for an explanation of why it works.

| tstats summariesonly=t prestats=t count from datamodel=rest_api 
       BY object.apiName
| eval perfThreshold="Total"
| tstats summariesonly=t prestats=t append=t count from datamodel=rest_api 
       WHERE object.responseTime<=1000
       BY object.apiName
| eval perfThreshold=case(isnotnull(perfThreshold), perfThreshold, true(), "Below1sec")
| tstats summariesonly=t prestats=t append=t count from datamodel=rest_api 
       WHERE object.responseTime<=3000
       BY object.apiName
| eval perfThreshold=case(isnotnull(perfThreshold), perfThreshold, true(), "Below3sec")
| rename object.* as *
| chart count by apiName perfThreshold
| eval pctBelow1Sec=100 * Below1sec / Total
| eval pctBelow3Sec=100 * Below3sec / Total
| table apiName pctBelow1Sec pctBelow3Sec Total

DO's and DON'Ts


What you CAN do between the tstats statement and the stats statement
The bad news: the behavior here can seem pretty wonky, though it does seem to have some internally consistent logic.
The good news: the behavior is the same for summary indices too, which means:
- Once you learn one, the other is much easier to master.
- You can use sistats to write to a summary index the look at the results there, which will give you a much better understanding of how the data flows when using prestats.

Actions you can take:

  • Rename fields (rename all.* as *)

  • Add fields. For example, eval webServer=host

  • Modify group by fields. For example, eval webServer=replace(webServer, ".com", "")

What you CAN'T do between the tstats statement and the stats statement

  • Change stats calculated values

  • Set a name for the function in the tstats statement. Not sure why, but it's probably because it returns the values in that special sistats format.


TL;DR: if you just want one search to return quickly, you're probably much better off making a report, accelerating it or scheduling it in the off hours, and calling it from the dashboard. but if you put in the time, you can do some pretty amazing stuff. Patience is key.

Please add any comments or corrections - I'll edit this post as needed.

Keysofsandiego
Path Finder

HI,
Great write up... 

One suggestion,
one of your stats commands references the fields that are not available after your prestats command. 

 

| tstats prestats=t max(object.field1) max(object.field2) from datamodel=foo 
| tstats prestats=t append=t max(object.field2) from datamodel=foo
| stats max(field1) as maxField1 max(field2) as maxField2

Can you replace with this 

| tstats prestats=t max(object.field1) max(object.field2) from datamodel=foo 
| tstats prestats=t append=t max(object.field2) from datamodel=foo
| stats max(object.field1) as maxField1 max(object.field2) as maxField2

 

 

Thanks, 

 

0 Karma

cgodon
Engager

Thanks a lots 🙂

0 Karma

jvardev
Path Finder

Very useful facts about tstats. Thank you.

0 Karma

Ayn
Legend

This is an awesome writeup, kudos.

0 Karma

Kenshiro70
Path Finder

Thanks! If I've saved you some of the pain I experienced in learning the stuff in here, then it's worth it.

0 Karma

rjthibod
Champion

I would agree with your findings. There is nothing in documentation that explicitly says it, but I deduce that prestatss=t only works when the data you are after is already in a TSIDX file, i.e., raw index or accelerated DM. If confirmed. That note should be made explicit in the docs.

0 Karma

Kenshiro70
Path Finder

On your inability to accelerate and test - see my answer below. Prestats and the sistats command seem to produce very similar results, so a back door way to test might be to use sistats to write to a summary index, then search the summary index and summarize with a stats command. Prestats and stats should be very close in their behavior (as far as I've seen at least).

0 Karma

woodcock
Esteemed Legend

I will make a note on the docs page.

0 Karma

woodcock
Esteemed Legend

Docs commented.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Splunk is officially part of Cisco

Revolutionizing how our customers build resilience across their entire digital footprint.   Splunk ...

Splunk APM & RUM | Planned Maintenance March 26 - March 28, 2024

There will be planned maintenance for Splunk APM and RUM between March 26, 2024 and March 28, 2024 as ...