Splunk Search

When is | tstats summariesonly=true 100% finished on an accelerated Data-model?

Motivator

How do I know when | tstats summariesonly=true is 100% finished on an accelerated Data-model?

We upload log drops from the previous day into Splunk under a fixed host value, e.g. host=_NEW_LOG_DROP, so no new data arrives for that host after the upload.

We have noticed that with | tstats summariesonly=true, the performance is a lot better, so we want to keep it on.

However, users often click to view this data and get a blank screen because the data has not been fully summarized yet.

We can use | tstats summariesonly=false, but we have hundreds of millions of lines, and the performance is better with | tstats summariesonly=true.

So I was thinking: do I run a command like this?

| tstats summariesonly=true count(All_TPS_Logs.duration) AS count  FROM datamodel=TPS_V5 WHERE (nodename=All_TPS_Logs host=LUAS_2019_01_01  (All_TPS_Logs.user=* OR NOT All_TPS_Logs.user=*) All_TPS_Logs.operationIdentity="*") All_TPS_Logs.name =***

vs

| tstats summariesonly=false count(All_TPS_Logs.duration) AS count  FROM datamodel=TPS_V5 WHERE (nodename=All_TPS_Logs host=LUAS_2019_01_01  (All_TPS_Logs.user=* OR NOT All_TPS_Logs.user=*) All_TPS_Logs.operationIdentity="*") All_TPS_Logs.name =***

And when the two counts are equal, the data model is 100% done?

Thanks in Advance
Rob
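For what it's worth, the two counts can be compared in a single search instead of running the queries separately. This is only a sketch: it reuses the data model, node, and host from the question, drops the other filters for brevity, and uses a plain count rather than count(All_TPS_Logs.duration). The field names summarized, total, pct_summarized, and fully_summarized are made up for illustration.

```spl
| tstats summariesonly=true count AS summarized FROM datamodel=TPS_V5 WHERE nodename=All_TPS_Logs host=LUAS_2019_01_01
| appendcols
    [| tstats summariesonly=false count AS total FROM datamodel=TPS_V5 WHERE nodename=All_TPS_Logs host=LUAS_2019_01_01]
| eval pct_summarized = if(total > 0, round(100 * summarized / total, 1), null())
| eval fully_summarized = if(summarized == total, "yes", "no")
```

If fully_summarized is "yes" for a host that receives no new data, the summary has caught up for that host over the searched time range. Both tstats calls must run over the same time range for the comparison to mean anything.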

1 Solution

SplunkTrust

Just a heads up that an accelerated data model runs up to 3 concurrent searches every 5 minutes by default to rebuild its summary range. So when you set summariesonly=t, you will not get back the most recent data, because the summary range is not 100% up to date.

Esteemed Legend

Data models are typically never finished as long as data is still streaming in. Searches run automatically, every 5 minutes by default, to create the secondary TSIDX files that power your accelerated data models (ADMs). So anything newer than 5 minutes may not be in the ADM yet, and under heavy load the lag can be even longer than 5 minutes. That is why you almost always see scheduled searches that use earliest=-65m latest=-5m instead of "Last hour". It just means that events from the last 5 minutes are not examined until the run an hour later.
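A minimal sketch of that time-window pattern with tstats, assuming the data model and node name from the question. The 5-minute span in the BY clause is an arbitrary choice for illustration:

```spl
| tstats summariesonly=true count FROM datamodel=TPS_V5 WHERE nodename=All_TPS_Logs earliest=-65m latest=-5m BY _time span=5m
```

Ending the window at -5m skips the most recent interval, which the acceleration searches may not have summarized yet.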

Motivator

Thanks for your help, woodcock. It has helped me understand them better. 🙂

Motivator

Can I reduce the 5 minutes to 1 minute? Is there a property for that?

SplunkTrust

Of course you can, everything is configurable. But I'm warning you not to do it! The reason is that this will tax the sh** out of your CPU and bring the cluster to a crawl, since the summarization searches would run 5 times as often. A better approach would be to set summariesonly=f so that you search the accelerated data model AND the raw data. You get the benefit of fast searches over the summary range plus the complete data set.

The reason you're seeing slow performance with the flag set to false is the added time it takes to search the raw data. Another question for you: how large is your summary range, and what is your time range set to? If your summary range is 1 month and you're searching 6 months, then yeah, it's going to be slow. Depending on your use case, if you want to search from 6 months ago to NOW, you should set up a 6-month summary range and set the flag to false. If you want 6 months ago to now minus 5 minutes, you can set the flag to true and get a lightning-fast search with a complete data set. Another thought: since a large summary range takes so much disk and CPU, you could create a smaller summary range and combine it with a summary index. This hybrid approach gives you the benefits of each strategy while minimizing disk and CPU usage.
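As a rough illustration of the "6 months to NOW" case, here is a sketch that assumes the summary range has already been set to at least 6 months and reuses the data model and node name from the question. The daily span is an arbitrary choice:

```spl
| tstats summariesonly=false count FROM datamodel=TPS_V5 WHERE nodename=All_TPS_Logs earliest=-6mon@d latest=now BY _time span=1d
```

For the "6 months to now minus 5 minutes" case, the same search with latest=-5m and summariesonly=true should return only summarized data while still covering nearly the whole window.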

SplunkTrust

@robertlynch2020 did this answer your question? If so, can you accept it?

Motivator

Hi Skoelpin, sorry for the delay. I was pulled away for a few days there.

I understand the issues now. My main objective is to stop the user getting a blank screen [This happens if summariesonly=true and the data is not summarized].

What I am going to do is the following.
When a new log drop is uploaded, I can record the upload time T and subtract it from now() when the user clicks to see their data.
If the difference is greater than 5 minutes, I will set the token to summariesonly=true; if it is less than 5 minutes, summariesonly=false.

This way the user will always see data and we can use both techniques.

Thanks for the help
Rob..
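One possible way to compute that token value in a search, sketched under some assumptions: the index name tps_logs is a placeholder (the thread never names the index), the 300-second threshold mirrors the default 5-minute acceleration schedule, and the output field summariesonly is invented for illustration. It uses recentTime from the metadata command, which is the most recent index time seen for the host, as a stand-in for the upload time T. It also assumes the intent is to fall back to summariesonly=false while the upload is still fresh, since that is when the summary is likely to be incomplete.

```spl
| metadata type=hosts index=tps_logs
| search host=LUAS_2019_01_01
| eval age_seconds = now() - recentTime
| eval summariesonly = if(age_seconds > 300, "true", "false")
| table host recentTime age_seconds summariesonly
```

In a Simple XML dashboard, the resulting field could populate the token from the search's done handler via $result.summariesonly$. A fixed 5-minute threshold is only a heuristic; under heavy load, summarization can lag further behind.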

SplunkTrust

You could look at the following:

  • Use summariesonly=t for a faster response, but note that it only returns data that has already been summarized by the underlying data model. How current that is depends on how often the acceleration search runs and whether it completes on time without an excessive run time. You can check this in the Data Model Audit dashboard in Splunk ES: a short run time with 100% completion, and is_inprogress not always '1', is a good indicator of healthy DM acceleration.
  • What are your earliest and latest values? With tstats, your latest could be 0s or even +1m, so adjusting your time range could speed up your search.
  • Why do you need All_TPS_Logs.user=* OR NOT All_TPS_Logs.user=* ? As part of your data model validation, this field will hold either 'unknown' or a proper value, right? So if you are not filtering on user, remove it from the WHERE clause.
  • Run the search and open the Job Inspector, which shows the SID. You can then search for that SID in index=_internal sourcetype=splunkd_ui_access; it will show the relevant REST calls, with status=200 indicating completion of the job. Would this help?
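To check the acceleration status from the first point directly, rather than through the ES dashboard, the summarization REST endpoint can be queried from the search bar. This is a sketch modeled on the kind of search the Data Model Audit dashboard runs; the summary.* field names are assumed from that dashboard and should be verified on your version, and pct_complete is an invented field. The TPS_V5 filter comes from the question.

```spl
| rest /services/admin/summarization by_tstats=t splunk_server=local count=0
| search summary.id="*TPS_V5*"
| eval pct_complete = round('summary.complete' * 100, 1)
| table summary.id pct_complete summary.is_inprogress summary.earliest_time summary.latest_time summary.last_error
```

Assuming summary.complete is reported as a fraction between 0 and 1, a pct_complete of 100 with summary.is_inprogress at 0 suggests the summary is up to date for its configured range.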

SplunkTrust

@robertlynch2020

summariesonly applies only when selecting from an accelerated data model. When true, tstats generates results only from summarized data. When false, it generates results from both summarized data and data that is not summarized.

Ref: https://docs.splunk.com/Documentation/Splunk/7.2.3/SearchReference/Tstats

Check the "Review summary creation metrics" section for data model acceleration status:

https://docs.splunk.com/Documentation/Splunk/7.2.3/Knowledge/Acceleratedatamodels

Motivator

Hi

I have an accelerated data model, so what is "data that is not summarized"? Is this data that will be summarized if I give it more time?

Thanks
Rob

Motivator

@robertlynch2020

Yes. If your search time range falls within the configured summary range, it may just take a little time for the data to be summarized. After that, you can run the search with summariesonly=true.
