<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Performance very slow for index join on 200MB data only in Dashboards &amp; Visualizations</title>
    <link>https://community.splunk.com/t5/Dashboards-Visualizations/Performance-very-slow-for-index-join-on-200MB-data-only/m-p/456980#M29986</link>
    <description>&lt;P&gt;Please share your search.&lt;BR /&gt;
Also, please explain why a dashboard needs to be real-time.  Is the number of students in a class changing so fast a 20-second query can't keep up?  If a human is processing the results of the query then real-time is not necessary.&lt;BR /&gt;&lt;BR /&gt;
Perhaps by "realtime" you mean "fast".  That's different and by seeing your query we may be able to suggest improvements.&lt;BR /&gt;
What is the size of the "very small" data volume?&lt;/P&gt;</description>
    <pubDate>Sat, 06 Jul 2019 12:33:45 GMT</pubDate>
    <dc:creator>richgalloway</dc:creator>
    <dc:date>2019-07-06T12:33:45Z</dc:date>
    <item>
      <title>Performance very slow for index join on 200MB data only</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Performance-very-slow-for-index-join-on-200MB-data-only/m-p/456979#M29985</link>
      <description>&lt;P&gt;Hi, I have JSON data in one index containing student_id, grade_id, and class_id, and a static dataset (CSV) in another index that maps the ids to names, e.g. student_name, student_id, grade_name, grade_id.&lt;BR /&gt;
I need to show a real-time dashboard that updates the count of students per grade per class. However, the filters are on the names, not the ids, so I need to join these two datasets. The grade_id and class_id fields are inside nested JSON and are extracted at search time before they are joined with the static index. This query is taking 20 seconds, which is too slow. What techniques can bring it under 2-3 seconds, given that the data volume is very small?&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 01:11:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Performance-very-slow-for-index-join-on-200MB-data-only/m-p/456979#M29985</guid>
      <dc:creator>jonu4u</dc:creator>
      <dc:date>2020-09-30T01:11:23Z</dc:date>
    </item>
    <item>
      <title>Re: Performance very slow for index join on 200MB data only</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Performance-very-slow-for-index-join-on-200MB-data-only/m-p/456980#M29986</link>
      <description>&lt;P&gt;Please share your search.&lt;BR /&gt;
Also, please explain why a dashboard needs to be real-time.  Is the number of students in a class changing so fast a 20-second query can't keep up?  If a human is processing the results of the query then real-time is not necessary.&lt;BR /&gt;&lt;BR /&gt;
Perhaps by "realtime" you mean "fast".  That's different and by seeing your query we may be able to suggest improvements.&lt;BR /&gt;
What is the size of the "very small" data volume?&lt;/P&gt;</description>
      <pubDate>Sat, 06 Jul 2019 12:33:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Performance-very-slow-for-index-join-on-200MB-data-only/m-p/456980#M29986</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2019-07-06T12:33:45Z</dc:date>
    </item>
    <item>
      <title>Re: Performance very slow for index join on 200MB data only</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Performance-very-slow-for-index-join-on-200MB-data-only/m-p/456981#M29987</link>
      <description>&lt;P&gt;This is one of the main reasons you keep hearing us say &lt;CODE&gt;never use join&lt;/CODE&gt;; instead use &lt;CODE&gt;stats&lt;/CODE&gt; like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(index=&amp;lt;json data&amp;gt; AND sourcetype=&amp;lt;json data&amp;gt;) OR (index=&amp;lt;student data&amp;gt; AND sourcetype=&amp;lt;student data&amp;gt;)
| stats values(*) AS * BY student_id grade_id
&lt;/CODE&gt;&lt;/PRE&gt;
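
&lt;P&gt;As a rough sketch of what the follow-on step might look like (not tested against your data): assuming the search-time extraction leaves flat fields named &lt;CODE&gt;student_id&lt;/CODE&gt;, &lt;CODE&gt;grade_id&lt;/CODE&gt; and &lt;CODE&gt;class_id&lt;/CODE&gt; on the JSON events, and that &lt;CODE&gt;grade_name&lt;/CODE&gt; comes from the student data, a second &lt;CODE&gt;stats&lt;/CODE&gt; could count the students per grade per class. The &lt;CODE&gt;student_count&lt;/CODE&gt; field name is only illustrative.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(index=&amp;lt;json data&amp;gt; AND sourcetype=&amp;lt;json data&amp;gt;) OR (index=&amp;lt;student data&amp;gt; AND sourcetype=&amp;lt;student data&amp;gt;)
| stats values(*) AS * BY student_id grade_id
| stats dc(student_id) AS student_count BY grade_name class_id
&lt;/CODE&gt;&lt;/PRE&gt;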

&lt;P&gt;Then do your stuff from here.&lt;/P&gt;</description>
      <pubDate>Sat, 06 Jul 2019 19:01:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Performance-very-slow-for-index-join-on-200MB-data-only/m-p/456981#M29987</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-07-06T19:01:04Z</dc:date>
    </item>
    <item>
      <title>Re: Performance very slow for index join on 200MB data only</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Performance-very-slow-for-index-join-on-200MB-data-only/m-p/456982#M29988</link>
      <description>&lt;P&gt;Even better, I would use a scheduled search to turn the 2nd CSV dataset back into a lookup like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(index=&amp;lt;student data&amp;gt; AND sourcetype=&amp;lt;student data&amp;gt;)
| dedup student_name student_id grade_name grade_id
| table student_name student_id grade_name grade_id
| outputlookup MyStudentLookup.csv
&lt;/CODE&gt;&lt;/PRE&gt;
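
&lt;P&gt;Once the scheduled search has run, one way to sanity-check the result (a sketch, assuming the lookup file name used above) is to read the lookup back and count what it holds. The &lt;CODE&gt;row_count&lt;/CODE&gt; and &lt;CODE&gt;students&lt;/CODE&gt; field names are only illustrative.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup MyStudentLookup.csv
| stats count AS row_count dc(student_id) AS students
&lt;/CODE&gt;&lt;/PRE&gt;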

&lt;P&gt;Then do this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=&amp;lt;json data&amp;gt; AND sourcetype=&amp;lt;json data&amp;gt;
| lookup MyStudentLookup.csv student_id grade_id
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 06 Jul 2019 19:08:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Performance-very-slow-for-index-join-on-200MB-data-only/m-p/456982#M29988</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-07-06T19:08:11Z</dc:date>
    </item>
    <item>
      <title>Re: Performance very slow for index join on 200MB data only</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Performance-very-slow-for-index-join-on-200MB-data-only/m-p/456983#M29989</link>
      <description>&lt;P&gt;We need to monitor student activity. At any instant, if a student has been inactive for less than 5 seconds, we need to see that. That is why we want it in real time.&lt;/P&gt;</description>
      <pubDate>Sun, 07 Jul 2019 12:25:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Performance-very-slow-for-index-join-on-200MB-data-only/m-p/456983#M29989</guid>
      <dc:creator>jonu4u</dc:creator>
      <dc:date>2019-07-07T12:25:53Z</dc:date>
    </item>
  </channel>
</rss>

