Is datamodel command as fast as tstats command on accelerated data models?

wp-uk-36
Explorer

Hi,

I've gone through Splunk documentation but still struggle to find an official answer to my question.

Given an accelerated data model, is there a difference in performance or results between the datamodel command and the tstats command when querying that data model (assuming the summariesonly option isn't used)?

Use case 1:
- The time range targeted has 100% summaries
- My expectation: both commands will leverage summaries, return the same results, perform equally

Use case 2:
- The time range targeted has, say, 80% summary coverage; the last 20% of the time range has none
- My expectation: both commands will leverage summaries for the 80% chunk, query raw logs for the last 20%, return the same results, perform equally

Use case 3:
- The time range has no summaries
- My expectation: both commands will query raw logs, return the same results, perform equally

 


PickleRick
SplunkTrust

There is a common confusion between two different things.

One is the data model itself. Another is data model acceleration.

A data model on its own is an abstraction layer. It is a definition that your data needs to conform to if it is to be compliant with that data model. So, for example, if the Network Traffic data model says that the source IP field needs to be named src_ip, you have to make sure that your data has such a field. If it's named differently in your original data, you have to create an alias or a calculated field so that it's called src_ip.
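
As a rough illustration of that renaming step, the expression for a calculated field can be prototyped inline with eval before it is saved in the configuration. This is only a sketch: your_index, your_sourcetype and source_address are hypothetical placeholders, not names taken from any real add-on.

```spl
index=your_index sourcetype=your_sourcetype
| eval src_ip=coalesce(src_ip, source_address)
| stats count by src_ip
```

An inline eval only affects that one search, so the data model won't see the field until the alias or calculated field is actually configured.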

And that's pretty much it. A "normal" search using data model is "underneath" translated by Splunk to a normal search. You can see the resulting "translated" search by using the "search_string" option of the datamodel command. For example:

| datamodel Network_Traffic All_Traffic search_string

Will show you the "normal" search equivalent of that particular data model search.

So that's one thing.

Another thing (which is often referred to as "datamodel") is data model acceleration. If you enable acceleration, Splunk runs scheduled searches building an additional indexed structure on disk (using the same tsidx data format as the search terms lexicon).

Searching through that accelerated summary is as fast as tstats or searching for indexed fields in your data. So if you're searching the data which has been already summarized, it should be lightning fast.
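
For instance, a search restricted to the summaries might look like the sketch below. It assumes the CIM Network_Traffic data model, with its All_Traffic root dataset, is installed and accelerated. With summariesonly=true, any part of the time range that hasn't been summarized yet is simply not counted.

```spl
| tstats summariesonly=true count from datamodel=Network_Traffic.All_Traffic by _time span=1h
```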

But there are some caveats.

1. There are several ways of searching data models. While the "datamodel" command supports the "summariesonly" parameter, the "from" command lets you set the datamodel as the data source but here you can't specify summariesonly.

2. Searching is one thing, but tstats operates only on explicitly given fields (regardless of whether you're going only through the summary or also plowing through the unsummarized part), whereas "from" and "datamodel" actually fetch the original events. And these must be read from the "raw index". So you can still use the accelerated summaries for matching, but you have to read whole raw events.

3. Splunk can optimize some searches. Since you're comparing the datamodel command with tstats, you must be talking about SPL along the lines of datamodel search | stats. These will often be optimized to tstats anyway.

For example, if you run a simple search like

| datamodel Network_Traffic All_Traffic search | stats count

and go to job details, you'll see that the optimized search string says

 | tstats prestats=true summariesonly=false allow_old_summaries=true include_reduced_buckets=true use_summary_index_values=true count FROM datamodel=Network_Traffic.All_Traffic | stats count
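
To make caveats 1 and 2 a bit more concrete, here are two hedged sketches against the same data model (again assuming the CIM Network_Traffic model is present). The first uses from, which returns whole events and, as noted above, gives you no summariesonly switch:

```spl
| from datamodel:"Network_Traffic.All_Traffic"
| head 5
```

The second uses tstats, which only touches the fields named in the search (note the dataset prefix on the field name):

```spl
| tstats count from datamodel=Network_Traffic.All_Traffic by All_Traffic.src_ip
```

Comparing the two in the job inspector should show the difference in how much raw data has to be read.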

 


tscroggins
Champion

Hi @wp-uk-36,

The commands are similar when searching accelerated data models. For example, the following searches have nearly identical execution paths:

| datamodel Authentication Authentication search summariesonly=t
| stats count

| tstats prestats=t summariesonly=t count from datamodel=Authentication.Authentication
| stats count

tstats edges out datamodel when used without the prestats argument:

| tstats summariesonly=t count from datamodel=Authentication.Authentication

Your use cases imply summariesonly=false:

| datamodel Authentication Authentication search summariesonly=f
| stats count

| tstats prestats=t summariesonly=f count from datamodel=Authentication.Authentication
| stats count

Once again, with prestats=true, the execution paths are the same. You can verify this with the job inspector. Find and compare the optimized litsearch and fallback lispy. For example, in my test environment using the Authentication data model and the _audit index:

datamodel

litsearch (index=_audit ((index=_audit sourcetype=audittrailv2* category=authn) OR (index=_audit "action=login attempt" NOT "action=search")) (NOT action=success OR NOT user=*$) (index=* OR index=_*)) DIRECTIVES(REQUIRED_TAGS(intersect="t" tags="cleartext,cloud,default,insecure,multifactor,pci,privileged")) | addinfo  type=count label=prereport_events track_fieldmeta_events=true  | prestats  count

[ AND [ OR [ AND sourcetype::audittrailv2* [ OR authn category::authn ] ] [ AND action attempt login ] ] ]

tstats

litsearch (index=_audit ((index=_audit sourcetype=audittrailv2* category=authn) OR (index=_audit "action=login attempt" NOT "action=search")) (NOT action=success OR NOT user=*$) (index=* OR index=_*)) DIRECTIVES(REQUIRED_TAGS(intersect="t" tags="cleartext,cloud,default,insecure,multifactor,pci,privileged")) | addinfo  type=count label=prereport_events track_fieldmeta_events=true  | prestats  count

[ AND [ OR [ AND sourcetype::audittrailv2* [ OR authn category::authn ] ] [ AND action attempt login ] ] ]

As expected, they are the same. The actual searches will vary by configuration, hot bucket state, etc.

I'm testing with Splunk Enterprise 10.2, and results in earlier versions of Splunk may differ. When in doubt, run semantically identical datamodel and tstats searches and compare the execution plans and search logs in the job inspector.
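
If you want a rough idea of which of your three use cases a given time range falls into, one possible sketch is to compare the summarized count with the total count. The field names summarized, total and pct_summarized are just labels chosen for this example, and the two counts can drift slightly if new events arrive between the two tstats runs.

```spl
| tstats summariesonly=t count as summarized from datamodel=Authentication.Authentication
| appendcols
    [| tstats summariesonly=f count as total from datamodel=Authentication.Authentication]
| eval pct_summarized=round(100 * summarized / total, 1)
```

A value near 100 would correspond to use case 1, a value near 0 to use case 3, and anything in between to use case 2.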

For a deeper dive, explore archived .conf content. In Google:

site:conf.splunk.com tstats datamodel
