TLDR; does, | search(), operate differently in tstats, especially with wildcards, NOT, OR, AND, parentheses, etc.?
I'm dev/testing some queries with tstats and want to see if data modeling would make our current alerts more efficient.
To test, I view the spl of an alert we use, and implement the fields of that alert into the root search of the Endpoint data model that comes with CIM.
However, we have a lot of exclusions/filters in this alert. (e.g. ignoring certain Account_Names, New_Process_Names, Creator_Process_Names, etc.)
On the separate tstats query, I mimic most of everything else from the original alert, especially when the formatting of the tstats so that it mirrors the stats command from the original alert.
example:
| stats count, values(field1) as field1 by field2, field3
>>>>>>>>>>>>>>>
| tstats count,
values(Processes.field1) as field1
FROM datamodel=Endpoint.Processes
by Processes.field2, Processes.field3
Before I decide to accelerate the data model, I want to make sure the output of both the alert and tstats query are the same.
To control this, I set an arbitrary timeframe: earliest=-4h@h, latest=-2h@h and apply that to both queries.
In the tstats command, I do a pipe search ( | search () ) after the major tstats commands, and paste the exclusions/filters from the alert into that clause. It has a bunch of wildcards in it, for reasons I won't get into, and yes some of it is not great practice with leading wildcards, but the original alert works so its fine for now.
When I compare the output of both queries, the statistics is slightly off.
Even though the I apply the timeframe, and I notice if tailor the wildcards a bit, it somewhat closes the gap, but not consistently, especially as I increase the timeframe.
This leads me to believe that | search() treats certain characters differently with tstats, and I don't know why.
Hi @antoniolamonica,
Data model root search datasets start with a base search. Endpoint.Processes is:
(`cim_Endpoint_indexes`) tag=process tag=report
| eval process_integrity_level=lower(process_integrity_level)
This search is expanded by Splunk to include the contents of the cim_EndPoint_indexes macros and all event types that match tag=process and tag=report.
To compare like for like searches, start with the same base search:
(`cim_Endpoint_indexes`) tag=process tag=report earliest=-4h@h latest=-2h@h
| eval process_integrity_level=lower(process_integrity_level)
| stats count values(process_id) as process_id by dest
and construct a similar tstats search:
| tstats summariesonly=f count values(Processes.process_id) as process_id from datamodel=Endpoint.Processes where earliest=-4h@h latest=-2h@h by Processes.dest
The underlying searches should be similar. Optimization may vary. You can verify the SPL in the job inspector: Job > Inspect Job > search.log and the UnifiedSearch component's log output.
When summariesonly=f, the searches have similar behavior.
When summariesonly=t, the data model search only looks at indexed field values. This is similar to using field::value for indexed fields and TERM() for indexed terms in a normal search.