<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Best practice for working with large dataset in Knowledge Management</title>
    <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367751#M3124</link>
    <description>&lt;P&gt;But in Settings &amp;gt; Data Models it says completed: 100%. That's strange then, right?&lt;/P&gt;</description>
    <pubDate>Mon, 15 May 2017 12:20:09 GMT</pubDate>
    <dc:creator>mblauw</dc:creator>
    <dc:date>2017-05-15T12:20:09Z</dc:date>
    <item>
      <title>Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367736#M3109</link>
      <description>&lt;P&gt;I've got a very large dataset that gains about 50M events each month. I currently have 3 months indexed, so approximately 150M events.&lt;/P&gt;

&lt;P&gt;Now, when I try to build an accelerated report, it still contains nearly 2M events. That is the minimum possible, as I still need quite a lot of fields that have unique value combinations in the data.&lt;/P&gt;

&lt;P&gt;What would be the best practice for building a search on this data? Searches have to run over all-time data.&lt;/P&gt;

&lt;P&gt;We have also built this in ES, where it only takes about 1 minute to show a result over the full 3 months of data.&lt;/P&gt;</description>
      <pubDate>Mon, 08 May 2017 14:07:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367736#M3109</guid>
      <dc:creator>mblauw</dc:creator>
      <dc:date>2017-05-08T14:07:02Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367737#M3110</link>
      <description>&lt;P&gt;If it is possible, roll up the distinct values into whatever breakout timespans you need (perhaps hourly, daily, and monthly) and put them into a summary index. Then you can pull from that instead of from the raw data.&lt;/P&gt;</description>
      <pubDate>Tue, 09 May 2017 00:48:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367737#M3110</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-05-09T00:48:00Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367738#M3111</link>
      <description>&lt;P&gt;You mentioned an accelerated report. Did you also try creating a Data Model or Data Set and then accelerating that?&lt;/P&gt;</description>
      <pubDate>Tue, 09 May 2017 13:12:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367738#M3111</guid>
      <dc:creator>sloshburch</dc:creator>
      <dc:date>2017-05-09T13:12:11Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367739#M3112</link>
      <description>&lt;P&gt;No, I have not tried that yet. Do those have much better performance on large datasets?&lt;/P&gt;</description>
      <pubDate>Tue, 09 May 2017 13:13:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367739#M3112</guid>
      <dc:creator>mblauw</dc:creator>
      <dc:date>2017-05-09T13:13:49Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367740#M3113</link>
      <description>&lt;P&gt;Absolutely! Search acceleration is great for very specific searches, while datasets are more malleable: they allow a wider variety of analysis against the data, and you can accelerate them as well!&lt;/P&gt;

&lt;P&gt;&lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutdatasets"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutdatasets&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 10 May 2017 12:21:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367740#M3113</guid>
      <dc:creator>sloshburch</dc:creator>
      <dc:date>2017-05-10T12:21:00Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367741#M3114</link>
      <description>&lt;P&gt;Unfortunately, this also did not have the desired result in terms of search speed. I'm now trying to build a summary index to see how well that goes.&lt;/P&gt;

&lt;P&gt;Thank you for your input!&lt;/P&gt;</description>
      <pubDate>Wed, 10 May 2017 13:00:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367741#M3114</guid>
      <dc:creator>mblauw</dc:creator>
      <dc:date>2017-05-10T13:00:39Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367742#M3115</link>
      <description>&lt;P&gt;In terms of search speed, can you confirm that you accelerated the data model after creating it? Between that and the &lt;CODE&gt;tstats&lt;/CODE&gt; command, you should see an amazing difference. You would see no difference if no such acceleration was turned on.&lt;/P&gt;</description>
      <pubDate>Wed, 10 May 2017 21:12:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367742#M3115</guid>
      <dc:creator>sloshburch</dc:creator>
      <dc:date>2017-05-10T21:12:53Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367743#M3116</link>
      <description>&lt;P&gt;Thank you! I found out I only had an accelerated dataset, not an accelerated data model. It is building at the moment. Will see how that goes.&lt;/P&gt;</description>
      <pubDate>Thu, 11 May 2017 08:20:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367743#M3116</guid>
      <dc:creator>mblauw</dc:creator>
      <dc:date>2017-05-11T08:20:50Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367744#M3117</link>
      <description>&lt;P&gt;I've built the data model. It shows the following, and the &lt;CODE&gt;tstats&lt;/CODE&gt; command is not working...&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;ACCELERATION Rebuild  Update  Edit    &lt;/P&gt;

&lt;P&gt;Status 100.00% Completed&lt;BR /&gt;&lt;BR /&gt;
Access Count 4. &lt;BR /&gt;
Last Access: 5/12/17 12:11:53.000 PM&lt;BR /&gt;
Size on Disk 3059.04 MB&lt;BR /&gt;
Summary Range 0 second(s)&lt;BR /&gt;
Buckets  2&lt;BR /&gt;
Updated  5/12/17 12:01:46.000 PM&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Fri, 12 May 2017 10:52:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367744#M3117</guid>
      <dc:creator>mblauw</dc:creator>
      <dc:date>2017-05-12T10:52:51Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367745#M3118</link>
      <description>&lt;P&gt;When you pivot on the dataset, does it work faster than before? Also, show us how &lt;CODE&gt;tstats&lt;/CODE&gt; is not working - it's likely just a syntax issue.&lt;/P&gt;</description>
      <pubDate>Fri, 12 May 2017 12:32:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367745#M3118</guid>
      <dc:creator>sloshburch</dc:creator>
      <dc:date>2017-05-12T12:32:56Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367746#M3119</link>
      <description>&lt;P&gt;Will try that tomorrow. I'm now rebuilding the data model acceleration, hoping that will fix some things.&lt;/P&gt;

&lt;P&gt;This is the query I used:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats avg(t_0_10s) FROM datamodel=ndw_acc_datamodel
&lt;/CODE&gt;&lt;/PRE&gt;
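
&lt;P&gt;(Or does the field perhaps need the dataset name as a prefix? I'm not sure, but then it would be something like:)&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats avg(ndw_acc_datamodel_set1.t_0_10s) FROM datamodel=ndw_acc_datamodel
&lt;/CODE&gt;&lt;/PRE&gt;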

&lt;P&gt;I've got a data model called &lt;CODE&gt;ndw_acc_datamodel&lt;/CODE&gt;, which contains a dataset (root event) called &lt;CODE&gt;ndw_acc_datamodel_set1&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 14:03:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367746#M3119</guid>
      <dc:creator>mblauw</dc:creator>
      <dc:date>2020-09-29T14:03:00Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367747#M3120</link>
      <description>&lt;P&gt;Pivot is incredibly fast up until 113M events; the last 50M events are very slow. The time range for acceleration is set to all time.&lt;/P&gt;</description>
      <pubDate>Fri, 12 May 2017 14:00:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367747#M3120</guid>
      <dc:creator>mblauw</dc:creator>
      <dc:date>2017-05-12T14:00:31Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367748#M3121</link>
      <description>&lt;P&gt;I see that I received this error message. Does anybody know what this means?&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;Audit:[timestamp=05-12-2017&lt;BR /&gt;
16:40:42.547, user=splunk-system-user,&lt;BR /&gt;
action=search, info=failed,&lt;BR /&gt;
search_id='SummaryDirector_1494600038.3739', total_run_time=0.15, event_count=0,&lt;BR /&gt;
result_count=0, available_count=0,&lt;BR /&gt;
scan_count=0, drop_count=0,&lt;BR /&gt;
exec_time=1494600038, api_et=N/A,&lt;BR /&gt;
api_lt=N/A, search_et=N/A,&lt;BR /&gt;
search_lt=N/A, is_realtime=0,&lt;BR /&gt;
savedsearch_name="",&lt;BR /&gt;
search_startup_time="0",&lt;BR /&gt;
searched_buckets=0,&lt;BR /&gt;
eliminated_buckets=0,&lt;BR /&gt;
considered_events=0, total_slices=0,&lt;BR /&gt;
decompressed_slices=0][n/a]&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Tue, 29 Sep 2020 14:04:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367748#M3121</guid>
      <dc:creator>mblauw</dc:creator>
      <dc:date>2020-09-29T14:04:23Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367749#M3122</link>
      <description>&lt;P&gt;I wonder if the last 50M events were just not yet accelerated.&lt;/P&gt;</description>
      <pubDate>Mon, 15 May 2017 12:17:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367749#M3122</guid>
      <dc:creator>sloshburch</dc:creator>
      <dc:date>2017-05-15T12:17:51Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367750#M3123</link>
      <description>&lt;P&gt;I think the SummaryDirector items are the autogenerated accelerations. Other than 'info=failed', I'm not seeing an error message. Where did this arise? If it's from the acceleration, then I think Splunk will self-correct.&lt;/P&gt;</description>
      <pubDate>Mon, 15 May 2017 12:19:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367750#M3123</guid>
      <dc:creator>sloshburch</dc:creator>
      <dc:date>2017-05-15T12:19:57Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367751#M3124</link>
      <description>&lt;P&gt;But in Settings &amp;gt; Data Models it says completed: 100%. That's strange then, right?&lt;/P&gt;</description>
      <pubDate>Mon, 15 May 2017 12:20:09 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367751#M3124</guid>
      <dc:creator>mblauw</dc:creator>
      <dc:date>2017-05-15T12:20:09Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367752#M3125</link>
      <description>&lt;P&gt;What about this &lt;CODE&gt;tstats&lt;/CODE&gt; command is not working - is it showing an error message, or producing no results?&lt;/P&gt;

&lt;P&gt;Try &lt;CODE&gt;| tstats values FROM datamodel=ndw_acc_datamodel&lt;/CODE&gt; to validate &lt;CODE&gt;t_0_10s&lt;/CODE&gt; is the right field name.&lt;/P&gt;

&lt;P&gt;According to the docs, you can use the &lt;CODE&gt;summariesonly&lt;/CODE&gt; flag to restrict the search to only the data that has been summarized (accelerated) so far, if you want. Also, &lt;CODE&gt;prestats&lt;/CODE&gt; will return the data as if you had used the summary indexing commands - ready for further stats commands afterwards, with more of the original summarization details.&lt;/P&gt;
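
&lt;P&gt;For example, something like this (going by the model and field names you mentioned earlier in this thread; note that data model fields are normally referenced with the dataset name as a prefix, so adjust as needed):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats summariesonly=true avg(ndw_acc_datamodel_set1.t_0_10s) FROM datamodel=ndw_acc_datamodel

| tstats prestats=true avg(ndw_acc_datamodel_set1.t_0_10s) FROM datamodel=ndw_acc_datamodel BY _time span=1d
| timechart avg(ndw_acc_datamodel_set1.t_0_10s)
&lt;/CODE&gt;&lt;/PRE&gt;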

&lt;P&gt;&lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Tstats"&gt;http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Tstats&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 15 May 2017 12:24:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367752#M3125</guid>
      <dc:creator>sloshburch</dc:creator>
      <dc:date>2017-05-15T12:24:50Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367753#M3126</link>
      <description>&lt;P&gt;Your query returns no results...&lt;/P&gt;</description>
      <pubDate>Mon, 15 May 2017 13:19:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367753#M3126</guid>
      <dc:creator>mblauw</dc:creator>
      <dc:date>2017-05-15T13:19:04Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for working with large dataset</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367754#M3127</link>
      <description>&lt;P&gt;"Pivot is incredibly fast up untill 113M events, the last 50M events are very slow." &amp;lt;- is that still the case? Does the Job Inspector tell you anything? I'd try similar searches with tstats using the summariesonly field true and false to see if you can pinpoint more of that 50m part. Also, did you mean the most recent or the earliest (relative to _time) when you mentioned "last"? And does that happen consistently or just that one time? If just that one time then it could have been load on the indexer.&lt;/P&gt;</description>
      <pubDate>Tue, 16 May 2017 12:25:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-practice-for-working-with-large-dataset/m-p/367754#M3127</guid>
      <dc:creator>sloshburch</dc:creator>
      <dc:date>2017-05-16T12:25:11Z</dc:date>
    </item>
  </channel>
</rss>

