<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Join operation query optimization in Splunk Dev</title>
    <link>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326385#M4639</link>
    <description>&lt;P&gt;just a quick idea, i'm not sure if they're applicable or not for your data.&lt;/P&gt;

&lt;P&gt;Can you do: &lt;CODE&gt;index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice" XYZ=DEF (api_name=API1 OR api_name=API2 )| stats first(_time) as _time, values(PQR) AS PQR by COMMONID | fillnull PQR value="NULL" | timechart span=1d count by PQR&lt;/CODE&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 21 Jul 2017 12:34:45 GMT</pubDate>
    <dc:creator>cmerriman</dc:creator>
    <dc:date>2017-07-21T12:34:45Z</dc:date>
    <item>
      <title>Join operation query optimization</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326383#M4637</link>
      <description>&lt;P&gt;I am dealing with a massive dataset, i need to perform a join operation on a commonid. I have tried multiple variations of the query but i am facing issues with each of them. It would be great if anyone can provide a solution(Modifying the configuration files is not an option).&lt;/P&gt;

&lt;P&gt;Query Version 1: This query is taking up too much of memory and the query is auto-finalizing after reaching 1GB data limit, consequently the results are inaccurate.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice" (api_name=API1 OR api_name=API2 )
| stats first(_time) as _time, values(XYZ) AS XYZ, values(PQR) AS PQR by COMMONID
| fillnull PQR value="NULL"
| search XYZ=DEF
| timechart span=1d count by PQR
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Query Version 2: This query appears to be more efficient, but I am getting "[subsearch]: Search auto-finalized after time limit (60 seconds) reached" and consequently i am not able to obtain the accurate results.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice" (api_name=API1 OR api_name=API2 )
[search index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice"  (api_name=API1  OR api_name=API2 ) 
   | search XYZ=DEF | fields COMMONID
   | dedup COMMONID]  
| stats first(_time) as _time, values(XYZ) AS XYZ, values(PQR) AS PQR by COMMONID
| fillnull PQR value="NULL"
| timechart span=1d count by PQR
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Query Version 3: This query will probably give accurate results, but it is very inefficient will take about 20 hours to finish execution.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice"  (api_name=API1  OR api_name=API2 )
| search (PQR=* OR NOT PQR=*)
| join COMMONID
   [search index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice"  (api_name=API1  OR api_name=API2 )
   | search XYZ=DEF ]
| stats first(_time) as _time, values(XYZ) AS XYZ, values(PQR) AS PQR by COMMONID 
| fillnull PQR value="NULL" 
| timechart span=1d count by PQR
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 21 Jul 2017 01:57:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326383#M4637</guid>
      <dc:creator>tareddy</dc:creator>
      <dc:date>2017-07-21T01:57:06Z</dc:date>
    </item>
    <item>
      <title>Re: Join operation query optimization</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326384#M4638</link>
      <description>&lt;P&gt;Do you have access to doing summary searches? If so, it appears that you could use that method to break up your search into smaller, already summarized chunks of data. That way your end result would take FAR less time to produce at search time.&lt;/P&gt;</description>
      <pubDate>Fri, 21 Jul 2017 12:23:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326384#M4638</guid>
      <dc:creator>cpetterborg</dc:creator>
      <dc:date>2017-07-21T12:23:55Z</dc:date>
    </item>
    <item>
      <title>Re: Join operation query optimization</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326385#M4639</link>
      <description>&lt;P&gt;just a quick idea, i'm not sure if they're applicable or not for your data.&lt;/P&gt;

&lt;P&gt;Can you do: &lt;CODE&gt;index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice" XYZ=DEF (api_name=API1 OR api_name=API2 )| stats first(_time) as _time, values(PQR) AS PQR by COMMONID | fillnull PQR value="NULL" | timechart span=1d count by PQR&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 21 Jul 2017 12:34:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326385#M4639</guid>
      <dc:creator>cmerriman</dc:creator>
      <dc:date>2017-07-21T12:34:45Z</dc:date>
    </item>
    <item>
      <title>Re: Join operation query optimization</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326386#M4640</link>
      <description>&lt;P&gt;1) In query 2 replace &lt;CODE&gt;| fields COMMONID | dedup COMMONID&lt;/CODE&gt; with &lt;CODE&gt;|stats by COMMONID&lt;/CODE&gt; for a slight boost.&lt;/P&gt;

&lt;P&gt;2) You could break query 2 down into two pieces and have piece 1 output a lookup to be used by piece 2.  That way, the combined searches have a much higher total limit.&lt;/P&gt;

&lt;P&gt;First half...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice" 
    (api_name=API1 OR api_name=API2 ) 
| search XYZ=DEF 
| stats by COMMONID 
| outputlookup myCommonIds.csv
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Second half...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice" 
    (api_name=API1 OR api_name=API2 ) 
    [| inputlookup append=t myCommonIds.csv] 
| stats first(_time) as _time, values(XYZ) AS XYZ, values(PQR) AS PQR by COMMONID 
| fillnull PQR value="NULL"
| timechart span=30m count by PQR
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;3) Since you're not doing anything to the records before the stats command, you could break the search down even further with &lt;CODE&gt;multisearch&lt;/CODE&gt;...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| multisearch 
    [ search index=INDEXA earliest=-7d@d latest=-5d@d sourcetype=GHI "service=randomservice" 
          (api_name=API1 OR api_name=API2 )   
          [| inputlookup append=t myCommonIds.csv] 
    ]
    [ search index=INDEXA earliest=-5d@d latest=-3d@d sourcetype=GHI "service=randomservice" 
          (api_name=API1 OR api_name=API2 )   
          [| inputlookup append=t myCommonIds.csv] 
     ]
     [  search index=INDEXA earliest=-3d@d latest=@d  sourcetype=GHI "service=randomservice" 
            (api_name=API1 OR api_name=API2 )  
            [| inputlookup append=t myCommonIds.csv] 
     ]
 | stats first(_time) as _time, values(XYZ) AS XYZ, values(PQR) AS PQR by COMMONID 
 | fillnull PQR value="NULL"
 | timechart span=30m count by PQR
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 21 Jul 2017 21:09:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326386#M4640</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-07-21T21:09:40Z</dc:date>
    </item>
    <item>
      <title>Re: Join operation query optimization</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326387#M4641</link>
      <description>&lt;P&gt;I don't get &lt;CODE&gt;| search (PQR=* OR NOT PQR=*)&lt;/CODE&gt;; it does absolutely nothing so why is it there?&lt;/P&gt;</description>
      <pubDate>Sun, 23 Jul 2017 01:58:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326387#M4641</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-07-23T01:58:49Z</dc:date>
    </item>
    <item>
      <title>Re: Join operation query optimization</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326388#M4642</link>
      <description>&lt;P&gt;I want all the values of PQR to be considered (including NULL values), you're probably right though, i guess it should work even without "| search (PQR=* OR NOT PQR=*)"&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jul 2017 13:08:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326388#M4642</guid>
      <dc:creator>tareddy</dc:creator>
      <dc:date>2017-07-24T13:08:57Z</dc:date>
    </item>
    <item>
      <title>Re: Join operation query optimization</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326389#M4643</link>
      <description>&lt;P&gt;If that part disappears then the search simplifies.&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jul 2017 13:41:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Join-operation-query-optimization/m-p/326389#M4643</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-07-24T13:41:08Z</dc:date>
    </item>
  </channel>
</rss>

