Hi,
I've gone through Splunk documentation but still struggle to find an official answer to my question.
Given an accelerated data model, is there a difference in performance or results between querying it with the datamodel command and querying it with the tstats command (assuming the summariesonly option isn't used)?
Use case 1:
- The time range targeted has 100% summaries
- My expectation: both commands will leverage summaries, return the same results, perform equally
Use case 2:
- The time range targeted has, say, 80% summaries, and the last 20% of the time range has none
- My expectation: both commands will leverage summaries for the 80% chunk, query raw logs for the last 20%, return the same results, perform equally
Use case 3:
- The time range has no summaries
- My expectation: both commands will query raw logs, return the same results, perform equally
There is a common confusion between two different things.
One is the data model itself. Another is data model acceleration.
Data model on its own is an abstraction layer. The data model is a definition which your data - if it wants to be compliant with this datamodel - needs to conform to. So - for example - if the Network Traffic datamodel says that the source IP field needs to be named src_ip, you have to make sure that you have such field in your data. If it's named differently in your original data, you have to create an alias or a calculated field so that it's called src_ip.
And that's pretty much it. A "normal" search using data model is "underneath" translated by Splunk to a normal search. You can see the resulting "translated" search by using the "search_string" option of the datamodel command. For example:
| datamodel Network_Traffic search_string
Will show you the "normal" search equivalent of that particular data model search.
So that's one thing.
Another thing (which is often referred to as "datamodel") is data model acceleration. If you enable acceleration, Splunk runs scheduled searches building an additional indexed structure on disk (using the same tsidx data format as the search terms lexicon).
Searching through that accelerated summary is as fast as tstats or searching for indexed fields in your data. So if you're searching the data which has been already summarized, it should be lightning fast.
But there are some caveats.
1. There are several ways of searching data models. The "datamodel" command supports the "summariesonly" parameter, whereas the "from" command lets you set the data model as the data source but doesn't let you specify summariesonly.
2. Searching is one thing, but tstats operates only on explicitly given fields (regardless of whether you're going only through the summary or also plowing through the unsummarized part), whereas "from" and "datamodel" actually fetch the original events. And these must be read from the "raw index". So you can still use the accelerated summaries for matching, but you have to read whole raw events.
3. Splunk can optimize some searches. Since you're talking about comparing datamodel command with tstats you have to be talking about some SPL including datamodel search | stats. These will often be optimized to tstats anyway.
For example, if you run a simple
| datamodel Network_Traffic search | stats count
and go to job details, you'll see that the optimized search string says
| tstats prestats=true summariesonly=false allow_old_summaries=true include_reduced_buckets=true use_summary_index_values=true count FROM datamodel=Network_Traffic.All_Traffic | stats count
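To make caveat 2 a bit more concrete, here is a rough sketch of the two styles side by side. It assumes the CIM Network_Traffic data model with its All_Traffic root dataset and reuses the src_ip field from the example above; adjust the names to whatever your data model actually defines. The first search asks tstats for one named field only, so it can be answered from the summary wherever one exists:

```spl
| tstats summariesonly=false count from datamodel=Network_Traffic.All_Traffic by All_Traffic.src_ip
```

The second retrieves events through the datamodel command and aggregates them afterwards. As noted in caveat 3, the optimizer may well rewrite this one to tstats, so check the job details before drawing conclusions about performance:

```spl
| datamodel Network_Traffic All_Traffic search
| stats count by All_Traffic.src_ip
```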
Hi @wp-uk-36,
The commands are similar when searching accelerated data models. For example, the following searches have nearly identical execution paths:
| datamodel Authentication Authentication search summariesonly=t
| stats count
| tstats prestats=t summariesonly=t count from datamodel=Authentication.Authentication
| stats count
tstats edges out datamodel when used without the prestats argument:
| tstats summariesonly=t count from datamodel=Authentication.Authentication
Your use cases imply summariesonly=false:
| datamodel Authentication Authentication search summariesonly=f
| stats count
| tstats prestats=t summariesonly=f count from datamodel=Authentication.Authentication
| stats count
Once again, with prestats=true, the execution paths are the same. You can verify this with the job inspector. Find and compare the optimized litsearch and fallback lispy. For example, in my test environment using the Authentication data model and the _audit index:
datamodel
litsearch (index=_audit ((index=_audit sourcetype=audittrailv2* category=authn) OR (index=_audit "action=login attempt" NOT "action=search")) (NOT action=success OR NOT user=*$) (index=* OR index=_*)) DIRECTIVES(REQUIRED_TAGS(intersect="t" tags="cleartext,cloud,default,insecure,multifactor,pci,privileged")) | addinfo type=count label=prereport_events track_fieldmeta_events=true | prestats count
[ AND [ OR [ AND sourcetype::audittrailv2* [ OR authn category::authn ] ] [ AND action attempt login ] ] ]
tstats
litsearch (index=_audit ((index=_audit sourcetype=audittrailv2* category=authn) OR (index=_audit "action=login attempt" NOT "action=search")) (NOT action=success OR NOT user=*$) (index=* OR index=_*)) DIRECTIVES(REQUIRED_TAGS(intersect="t" tags="cleartext,cloud,default,insecure,multifactor,pci,privileged")) | addinfo type=count label=prereport_events track_fieldmeta_events=true | prestats count
[ AND [ OR [ AND sourcetype::audittrailv2* [ OR authn category::authn ] ] [ AND action attempt login ] ] ]
As expected, they are the same. The actual searches will vary by configuration, hot bucket state, etc.
I'm testing with Splunk Enterprise 10.2, and results in earlier versions of Splunk may differ. When in doubt, run semantically identical datamodel and tstats searches and compare the execution plans and search logs in the job inspector.
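For the partially summarized case (use case 2), one rough way to gauge how much of a time range the summaries cover is to compare a summariesonly=true count with a summariesonly=false count over the same range. This is only a sketch: it reuses the Authentication data model from the examples above, the field names summarized, total and summarized_pct are made up for illustration, and the ratio is of event counts rather than of time, so treat it as an approximation.

```spl
| tstats summariesonly=true count AS summarized from datamodel=Authentication.Authentication
| appendcols
    [| tstats summariesonly=false count AS total from datamodel=Authentication.Authentication]
| eval summarized_pct = round(100 * summarized / total, 1)
```

If summarized and total match, the range is fully summarized and both approaches should be reading from the summaries; a large gap means a good part of the search is falling back to raw events.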
For a deeper dive, explore archived .conf content. In Google:
site:conf.splunk.com tstats datamodel