<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Schema Accelerated Event Search performance in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425575#M172663</link>
    <description>&lt;P&gt;This is something slightly different, although I'll give you a nod that "|from datamodel" appears terribly broken.  Here's the background: I was talking with a Splunk employee who was lauding the recent improvements in Splunk.  Specifically, he said that data models now include a "hidden" pointer back to the actual raw event.  This means you can search a data model to get the speed benefits of accelerated data models, &lt;EM&gt;BUT&lt;/EM&gt; your search can now return the FULL raw event, not just the data contained within the data model.  Clearly this is SUPER useful because it opens a world of new possibilities.  The obvious limitation is that the initial search constraint must be in the data model itself.  It is also worth noting that this same feature was mentioned by David Veuve in his Security Ninjutsu preso @ .conf2018.&lt;/P&gt;

&lt;P&gt;The problem is that it doesn't work as advertised.  &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 17 Mar 2019 03:59:20 GMT</pubDate>
    <dc:creator>awmorris</dc:creator>
    <dc:date>2019-03-17T03:59:20Z</dc:date>
    <item>
      <title>Schema Accelerated Event Search performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425573#M172661</link>
      <description>&lt;P&gt;I am super stoked about the potential of Schema Accelerated Event Searches. It might be one of the best improvements I've seen, if I could actually get it to work, but it doesn't.  &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Don't focus on the fact that I'm only returning the count of events. Performance doesn't differ if I return the raw events (which is ultimately what I want to do); I'm just doing the count so I can make an apples-to-apples comparison.&lt;/P&gt;

&lt;P&gt;So consider the following two searches over 15 minutes of data:&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;SEARCH # 1&lt;/STRONG&gt;&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;|tstats summariesonly=true count from datamodel="Web" where Web.user="dmerritt" 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The value returned was 25.  The search itself took 2.676 seconds.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;SEARCH # 2&lt;/STRONG&gt;&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;|from datamodel Web|search user=dmerritt|stats count
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The value returned was 106.  The search itself took 2 minutes, 14 seconds.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;QUESTIONS:&lt;/STRONG&gt;&lt;BR /&gt;
1) Why the HUGE difference in performance?&lt;BR /&gt;
2) Why is the result count different?&lt;/P&gt;

&lt;P&gt;NOTE: I am running Splunk 7.1.5.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Mar 2019 16:30:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425573#M172661</guid>
      <dc:creator>awmorris</dc:creator>
      <dc:date>2019-03-07T16:30:00Z</dc:date>
    </item>
    <item>
      <title>Re: Schema Accelerated Event Search performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425574#M172662</link>
      <description>&lt;P&gt;First of all, I wouldn't use &lt;CODE&gt;| from datamodel&lt;/CODE&gt; because it was recently broken and no longer returns all fields (only the ones in the datamodel).  Instead, use the &lt;CODE&gt;macro&lt;/CODE&gt; described here:&lt;BR /&gt;
&lt;A href="https://answers.splunk.com/answers/716936/splunk-server-field-is-not-available-when-we-searc.html#answer-717058"&gt;https://answers.splunk.com/answers/716936/splunk-server-field-is-not-available-when-we-searc.html#answer-717058&lt;/A&gt;&lt;BR /&gt;
Then do this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;`SIEMMacro_datamodelCIM(Web, Web)` user="dmerritt" | stats count
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Or possibly this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;`SIEMMacro_datamodelCIM(Web, Web)` TERM(user=dmerritt) | stats count
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Notice that there is no pipe ( &lt;CODE&gt;|&lt;/CODE&gt; ) anywhere before the &lt;CODE&gt;| stats&lt;/CODE&gt;, so the filter is part of the base search; that is why this &lt;CODE&gt;macro&lt;/CODE&gt; makes these searches way faster.&lt;/P&gt;

&lt;P&gt;Now, the &lt;CODE&gt;tstats&lt;/CODE&gt; search with &lt;CODE&gt;summariesonly=true&lt;/CODE&gt; returns fewer results because the data model acceleration (DMA) will always run behind, usually by less than 5 minutes.  This is why you often see &lt;CODE&gt;tstats&lt;/CODE&gt; searches with &lt;CODE&gt;Time picker&lt;/CODE&gt; values of &lt;CODE&gt;earliest=-65m latest=-5m&lt;/CODE&gt;.  So for a test, run all the searches for a full day back by adding this to each search &lt;CODE&gt;earliest=-1d@d latest=-1d@d+1h&lt;/CODE&gt; and you should get the same result from every search.&lt;/P&gt;

&lt;P&gt;The huge difference in performance is because the &lt;CODE&gt;tstats&lt;/CODE&gt; command is getting the results from a metadata index that summarizes the raw data and does not have to unzip the raw data ( &lt;CODE&gt;journal.gz&lt;/CODE&gt; ) files to get the answers.&lt;/P&gt;

&lt;P&gt;To see that I am right, swap the boolean on &lt;CODE&gt;summariesonly&lt;/CODE&gt; like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;|tstats summariesonly=false count from datamodel="Web" where Web.user="dmerritt" 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;You will see that it returns all of the results, but is much slower.&lt;/P&gt;

&lt;P&gt;P.S.  If this is the A.Morris that I think that it is, I emailed Daneil about this macro months ago.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Mar 2019 03:02:09 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425574#M172662</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-03-08T03:02:09Z</dc:date>
    </item>
    <item>
      <title>Re: Schema Accelerated Event Search performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425575#M172663</link>
      <description>&lt;P&gt;This is something slightly different, although I'll give you a nod that "|from datamodel" appears terribly broken.  Here's the background: I was talking with a Splunk employee who was lauding the recent improvements in Splunk.  Specifically, he said that data models now include a "hidden" pointer back to the actual raw event.  This means you can search a data model to get the speed benefits of accelerated data models, &lt;EM&gt;BUT&lt;/EM&gt; your search can now return the FULL raw event, not just the data contained within the data model.  Clearly this is SUPER useful because it opens a world of new possibilities.  The obvious limitation is that the initial search constraint must be in the data model itself.  It is also worth noting that this same feature was mentioned by David Veuve in his Security Ninjutsu preso @ .conf2018.&lt;/P&gt;

&lt;P&gt;The problem is that it doesn't work as advertised.  &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 17 Mar 2019 03:59:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425575#M172663</guid>
      <dc:creator>awmorris</dc:creator>
      <dc:date>2019-03-17T03:59:20Z</dc:date>
    </item>
    <item>
      <title>Re: Schema Accelerated Event Search performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425576#M172664</link>
      <description>&lt;P&gt;Do tell! How is this pointer accessed?&lt;/P&gt;</description>
      <pubDate>Sun, 17 Mar 2019 17:24:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425576#M172664</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-03-17T17:24:29Z</dc:date>
    </item>
    <item>
      <title>Re: Schema Accelerated Event Search performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425577#M172665</link>
      <description>&lt;P&gt;You're seeing count and perf differences because &lt;CODE&gt;| from&lt;/CODE&gt; and &lt;CODE&gt;| datamodel&lt;/CODE&gt; run in "mixed mode" by default (which is the only option in 7.1). There were plans to add a &lt;CODE&gt;summariesonly&lt;/CODE&gt; option to &lt;CODE&gt;| datamodel&lt;/CODE&gt;; however, it appears that hasn't been added ( &lt;CODE&gt;allow_old_summaries&lt;/CODE&gt; does look like it was added in 7.2). You're likely to see a count difference between &lt;CODE&gt;tstats summariesonly=t&lt;/CODE&gt; and &lt;CODE&gt;| (from|datamodel)&lt;/CODE&gt; searches due to this (since the latter will search the hot buckets for new events that have yet to be summarized).  To get an apples-to-apples comparison on performance, try &lt;CODE&gt;|from datamodel Web|search user=dmerritt|  noop directive.read_summary=f&lt;/CODE&gt; against &lt;CODE&gt;|from datamodel Web|search user=dmerritt&lt;/CODE&gt;. That &lt;CODE&gt;noop&lt;/CODE&gt; command should disable Schema Accelerated Event Search.&lt;/P&gt;

&lt;P&gt;As for only datamodel-defined fields appearing in these searches: this was the original design of the &lt;CODE&gt;| datamodel&lt;/CODE&gt; command; however, somewhere along the way, this broke and all fields were being returned. To implement Schema Accelerated Event Search, we had to fix this bug, since only the fields defined within the data model are stored within the accelerated index and leaving the bug in place broke the implementation.&lt;/P&gt;</description>
      <pubDate>Sun, 17 Mar 2019 21:47:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425577#M172665</guid>
      <dc:creator>nick_cribl</dc:creator>
      <dc:date>2019-03-17T21:47:45Z</dc:date>
    </item>
    <item>
      <title>Re: Schema Accelerated Event Search performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425578#M172666</link>
      <description>&lt;P&gt;Note that you can add an &lt;CODE&gt;| extract&lt;/CODE&gt; after &lt;CODE&gt;| from datamodel&lt;/CODE&gt;, and you will get fields that are not in the datamodel!&lt;/P&gt;</description>
      <pubDate>Fri, 22 Mar 2019 13:54:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425578#M172666</guid>
      <dc:creator>my2ndhead</dc:creator>
      <dc:date>2019-03-22T13:54:35Z</dc:date>
    </item>
    <item>
      <title>Re: Schema Accelerated Event Search performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425579#M172667</link>
      <description>&lt;P&gt;Can you provide an example?  I tested and my experience differs.  I thought extract simply broke apart key/value pairs.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Mar 2019 16:30:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425579#M172667</guid>
      <dc:creator>awmorris</dc:creator>
      <dc:date>2019-03-22T16:30:38Z</dc:date>
    </item>
    <item>
      <title>Re: Schema Accelerated Event Search performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425580#M172668</link>
      <description>&lt;P&gt;It depends on whether they are encoded in &lt;CODE&gt;_raw&lt;/CODE&gt;.  Sometimes they are not.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Mar 2019 16:49:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425580#M172668</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-03-22T16:49:36Z</dc:date>
    </item>
    <item>
      <title>Re: Schema Accelerated Event Search performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425581#M172669</link>
      <description>&lt;P&gt;Like this, for example:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| from datamodel:Authentication 
| extract
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;vs.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=* source="XmlWineventlog:Security" tag=authentication  NOT (user=*$ action=success )
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The number of fields will not be the same, as &lt;CODE&gt;extract&lt;/CODE&gt; does not add field aliases. I compared this with &lt;CODE&gt;fieldsummary&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Mon, 25 Mar 2019 20:16:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Schema-Accelerated-Event-Search-performance/m-p/425581#M172669</guid>
      <dc:creator>my2ndhead</dc:creator>
      <dc:date>2019-03-25T20:16:28Z</dc:date>
    </item>
  </channel>
</rss>

