I am super stoked about the potential of Schema Accelerated Event Searches. It might be one of the best improvements I've seen, if I could actually get it to work, but it doesn't. 😞
Don't focus on the fact that I'm only returning the count of events. Performance doesn't differ if I return the raw events (which is ultimately what I want to do); I'm just doing the count so I can make an apples-to-apples comparison.
So consider the following two searches over 15 minutes of data:
SEARCH # 1
|tstats summariesonly=true count from datamodel="Web" where Web.user="dmerritt"
The value returned was 25. The search itself took 2.676 seconds
SEARCH # 2
|from datamodel Web|search user=dmerritt|stats count
The value returned was 106. The search itself took 2 minutes, 14 seconds.
QUESTIONS:
1) Why the HUGE difference in performance?
2) Why is the result count different?
NOTE : Am running Splunk 7.1.5
The reason you're seeing count and perf differences is that | from and | datamodel run in "mixed mode" searching by default (and this is the only option in 7.1). There were plans to add a summariesonly option to | datamodel; however, it appears that hasn't been added (allow_old_summaries does look like it was added in 7.2). You're likely to see a count difference between tstats summariesonly=t and | (from|datamodel) searches due to this, since the latter will also search the hot buckets for new events that have yet to be summarized. To get an apples-to-apples comparison on performance, try
|from datamodel Web|search user=dmerritt| noop directive.read_summary=f
against
|from datamodel Web|search user=dmerritt
That noop command should disable Schema Accelerated Event Search.
As for only datamodel-defined fields appearing in these searches: this was the original design of the | datamodel command; however, somewhere along the way this broke and all fields were being returned. In order to implement Schema Accelerated Event Search, we had to fix this bug, since only the fields defined within the data model are stored within the accelerated index, and leaving the bug in place broke the implementation.
First of all, I wouldn't use | from datamodel because it was recently broken and no longer returns all fields (only the ones in the datamodel). Instead use the macro described here:
https://answers.splunk.com/answers/716936/splunk-server-field-is-not-available-when-we-searc.html#an...
Then do this:
`SIEMMacro_datamodelCIM(Web, Web)` user="dmerritt" | stats count
Or possibly this:
`SIEMMacro_datamodelCIM(Web, Web)` TERM(user=dmerritt) | stats count
Notice that there is no pipe ( | ) anywhere before the | stats, so the user filter is part of the base search; that is why this macro makes these searches way faster.
Now, the tstats summariesonly=true search returns fewer results (25 vs. 106) because the data model acceleration (DMA) always runs behind, usually by less than 5 minutes, so the most recent events are not yet in the summaries. This is why you often see tstats searches with Time picker values of earliest=-65m latest=-5m. So for a test, run all the searches for a full day back by adding earliest=-1d@d latest=-1d@d+1h to each search, and you should get the same result from every search.
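A sketch of that test for the two tstats variants, reusing the data model and user from the question. It assumes the time modifiers are accepted inside the tstats where clause on your version; I have not run this on 7.1.5. These are two separate searches, to be run one at a time:

```spl
| tstats summariesonly=true count from datamodel=Web where Web.user="dmerritt" earliest=-1d@d latest=-1d@d+1h

| tstats summariesonly=false count from datamodel=Web where Web.user="dmerritt" earliest=-1d@d latest=-1d@d+1h
```

For the | from and macro searches, setting the same one-hour window in the Time picker should be equivalent. If the window is old enough to be fully summarized, the counts should agree.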
The huge difference in performance is because the tstats command is getting the results from a metadata index that summarizes the raw data and does not have to unzip the raw data ( journal.gz ) files to get the answers.
To see that I am right, swap the boolean on summariesonly like this:
|tstats summariesonly=false count from datamodel="Web" where Web.user="dmerritt"
You will see that it returns all of the results, but is much slower.
P.S. If this is the A.Morris that I think that it is, I emailed Daneil about this macro months ago.
This is something slightly different, although I'll give you a nod that "|from datamodel" appears terribly broken. Here's the background: I was talking with a Splunk employee who was lauding the recent improvements in Splunk. Specifically, he said that the data models now include a "hidden" pointer back to the actual raw event. This means you can search a data model to get the speed benefits of accelerated data models BUT your search can now return the FULL raw event, not just the data contained within the data model. Clearly this is SUPER useful because it opens a world of new possibilities. The obvious limitation is that the initial search constraint must be in the data model itself. It is also worth noting this same feature was mentioned by David Veuve in his Security Ninjitsu preso @ .conf2018.
The problem is that it doesn't work as advertised. 😞
Do tell! How is this pointer accessed?
Note that you can add a | extract after | from datamodel, and you will get fields that are not in the datamodel!
Can you provide an example? I tested and my experience differs. I thought extract simply broke apart key/value pairs.
Just like this e.g:
| from datamodel:Authentication
| extract
vs.
index=* source="XmlWineventlog:Security" tag=authentication NOT (user=*$ action=success )
The number of fields will not be the same, as extract does not add field aliases. Compare this with fieldsummary.
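One way to make that comparison, as a rough sketch: append fieldsummary to each of the two searches above and compare the field lists. The trailing fields command only trims the output to a few of the columns fieldsummary produces; run both over the same time range. These are two separate searches:

```spl
| from datamodel:Authentication
| extract
| fieldsummary
| fields field count distinct_count

index=* source="XmlWineventlog:Security" tag=authentication NOT (user=*$ action=success )
| fieldsummary
| fields field count distinct_count
```

Fields that show up only in the second result are the ones extract could not recover from _raw.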
It depends on whether they are encoded in _raw. Sometimes they are not.