I'm using tstats on an accelerated data model that is built off of a summary index. Everything works as expected when querying both the summary index and the data model, except in one exceptionally large environment (10-100x more results than the others), where running dc() fails.
This works fine in said environment and produces ~17,000,000 results:
| tstats summariesonly=true count(assets.hostname) from datamodel="Summary_Host_Data" where (earliest=-1d latest=now)
This produces 0 results, when it should be around 400,000:
| tstats summariesonly=true dc(assets.hostname) from datamodel="Summary_Host_Data" where (earliest=-1d latest=now)
Even though searching the summary index directly works fine and produces ~400,000:
index=summary_host_data earliest=-1d | stats dc(hostname)
Finally, if I search over 6 hours instead of 1d, I do get results from the tstats search using dc().
Is there some type of limit I'm running into with dc()? Or is there something else going on?
Hi thisissplunk,
Your tstats search syntax looks correct, and I'm able to get valid output for distinct_count on my end.
To my understanding, there are no limitations on the distinct_count (dc) aggregate function.
When you enable acceleration for a data model, Splunk software builds the initial set of .tsidx file summaries for the data model and then runs scheduled searches in the background every 5 minutes to keep those summaries up to date. Each update ensures that the entire configured time range is covered without a significant gap in data. This method of summary building also ensures that late-arriving data is summarized without complication.
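As a quick sanity check (just a sketch reusing the search from your question), you can compare the accelerated summaries against a non-summary run; if summariesonly=false returns the expected distinct count while summariesonly=true returns 0, the summaries themselves are likely incomplete for that time range:
| tstats summariesonly=false dc(assets.hostname) from datamodel="Summary_Host_Data" where (earliest=-1d latest=now)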
Can you please verify the status of the DM acceleration search executions using the search below:
index=_internal sourcetype="scheduler" savedsearch_id="<user>;<appname>;_ACCELERATE_DM_<appname>_<DataModelName>_ACCELERATE_"
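If it helps, a quick roll-up of the same scheduler events (assuming the usual status field is present on them) will show at a glance whether any of the acceleration runs were skipped or failed:
index=_internal sourcetype="scheduler" savedsearch_id="<user>;<appname>;_ACCELERATE_DM_<appname>_<DataModelName>_ACCELERATE_" | stats count by status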
------
An upvote would be appreciated, and please Accept Solution if it helps!
Thanks for responding. I've checked those logs, and they all show "success" for the data model acceleration searches.
What I think is going on is that I'm running into some kind of memory error. This is reinforced by the fact that the search succeeds over a 6-hour window but returns nothing over 1d. I just can't figure out where or why.
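One thing I might try next is pulling the resource usage introspection data for the search process itself, something like this rough sketch (the sid placeholder and exact field names are assumptions on my part, based on the standard resource_usage.log fields):
index=_introspection sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid="<search_sid>" | stats max(data.mem_used) as peak_mem_used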
Have you looked at the job inspector to see if that gives any clues? Also have a look at the search log.
You can also get more information in the search log by enabling debug logging.
Have a look at Clara Merriman's great article on the job inspector, which also explains where to add changes in limits.conf to get extra debug information into the search log.
https://www.splunk.com/en_us/blog/tips-and-tricks/splunk-clara-fication-job-inspector.html
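If I remember right (treat this as an assumption and double-check against the article), the extra debug comes from a limits.conf change along these lines, which lowers the threshold for the messages captured with the search job:
[search_info]
infocsv_log_level = DEBUG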
Did you end up finding a solution to this? I have a similar case and there is no info anywhere on the matter.
As far as I could tell, it was some type of silent memory or data limit. I got away with using estdc() instead, since 100% accuracy wasn't required for my use case.
You can try limiting the time frame or the number of events and see where it starts breaking with dc().
I'm not sure how to fix the underlying issue. Maybe it's a config limit, or the server just needs more memory.
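For anyone else hitting this, the workaround search ended up looking roughly like this (same data model as above, just swapping dc() for the approximate estdc()):
| tstats summariesonly=true estdc(assets.hostname) from datamodel="Summary_Host_Data" where (earliest=-1d latest=now)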