<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: set diff is very slow when match 10 billion in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/set-diff-is-very-slow-when-match-10-billion/m-p/203826#M98769</link>
    <description>&lt;P&gt;Hi, 10 billion seems like a very large number. Are you sure? Also, did you check the set diff settings in the limits.conf file? Please update us.&lt;/P&gt;</description>
    <pubDate>Thu, 04 Aug 2016 07:57:53 GMT</pubDate>
    <dc:creator>inventsekar</dc:creator>
    <dc:date>2016-08-04T07:57:53Z</dc:date>
    <item>
      <title>set diff is very slow when match 10 billion</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/set-diff-is-very-slow-when-match-10-billion/m-p/203824#M98767</link>
      <description>&lt;P&gt;set diff is very slow when matching 10 billion records.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source=/var/log/remote/192.168.1.1.log set diff [search "Built inbound" NOT "8.8.8.8" NOT "8.8.4.4" | rex field=_raw "Outside:(?&amp;lt;destinationip2&amp;gt;\d+.\d+.\d+.\d+){0,3}"  | rex field=_raw "Inside:(?&amp;lt;sourceip2&amp;gt;\d+.\d+.\d+.\d+){0,3}"] [search "Built outbound" outsideip=* | rex field=_raw "Outside:(?&amp;lt;destinationip2&amp;gt;\d+.\d+.\d+.\d+){0,3}" | rex field=_raw "Inside:(?&amp;lt;sourceip2&amp;gt;\d+.\d+.\d+.\d+){0,3}"]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Format of the messages:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Aug  3 17:08:58 192.168.3.10 %ASA-6-302013: Built inbound TCP connection 434619881 for Outside:192.168.1.2/50978 (192.168.20.18/590) to Inside:192.168.22.20/443 (192.168.26.5/443)

Aug  3 17:09:15 192.168.3.18 %ASA-6-302013: Built outbound TCP connection 434622811 for Outside:192.168.18/.10/183 (192.168.18.1/1885) to Inside:202.171.21.16/53576 (230.180.220.1/5356)
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 03 Aug 2016 09:11:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/set-diff-is-very-slow-when-match-10-billion/m-p/203824#M98767</guid>
      <dc:creator>cyberportnoc</dc:creator>
      <dc:date>2016-08-03T09:11:46Z</dc:date>
    </item>
    <item>
      <title>Re: set diff is very slow when match 10 billion</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/set-diff-is-very-slow-when-match-10-billion/m-p/203825#M98768</link>
      <description>&lt;P&gt;10 billion! I think you may need to narrow the time range of the query and run multiple queries.&lt;/P&gt;

&lt;P&gt;With the default values for the set command, it won't return 10 billion results.&lt;BR /&gt;
&lt;A href="http://docs.splunk.com/Documentation/Splunk/6.4.2/SearchReference/Set"&gt;http://docs.splunk.com/Documentation/Splunk/6.4.2/SearchReference/Set&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Output limitations&lt;BR /&gt;
There is a limit on the quantity of results that come out of the invoked subsearches that the set command receives to operate on. If this limit is exceeded, the input result set to the diff command is &lt;STRONG&gt;silently truncated.&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;If you have Splunk Enterprise, you can adjust this limit by editing the limits.conf file and changing the maxout value in the subsearch stanza. If this value is altered, the default quantity of results coming from a variety of subsearch scenarios are altered. Note that very large values might cause extensive stalls during the 'parsing' phase of a search, which is when subsearches run. &lt;STRONG&gt;The default value for this limit is 10000.&lt;/STRONG&gt;&lt;/P&gt;
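
&lt;P&gt;As a rough sketch of the maxout setting described above, a local override might look like the following. The file location ($SPLUNK_HOME/etc/system/local/limits.conf) is an assumption about a typical install, and the value shown is simply the documented default quoted here, not a recommendation for a search over 10 billion records.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Assumed location: $SPLUNK_HOME/etc/system/local/limits.conf
[subsearch]
# Maximum number of results a subsearch returns (documented default: 10000)
maxout = 10000
&lt;/CODE&gt;&lt;/PRE&gt;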

&lt;P&gt;Result rows limitations&lt;BR /&gt;
By default the set command attempts to traverse a maximum of 50000 items from each subsearch. If the number of input results from either search exceeds this limit, the set command silently ignores the remaining events. By default, the maxout setting for subsearches prevents the number of results from exceeding this limit.&lt;/P&gt;

&lt;P&gt;If you have Splunk Enterprise, you can change this limit by editing the maxresultrows setting in the set stanza in the &lt;STRONG&gt;limits.conf&lt;/STRONG&gt; file.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 10:33:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/set-diff-is-very-slow-when-match-10-billion/m-p/203825#M98768</guid>
      <dc:creator>inventsekar</dc:creator>
      <dc:date>2016-08-03T10:33:17Z</dc:date>
    </item>
    <item>
      <title>Re: set diff is very slow when match 10 billion</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/set-diff-is-very-slow-when-match-10-billion/m-p/203826#M98769</link>
      <description>&lt;P&gt;Hi, 10 billion seems like a very large number. Are you sure? Also, did you check the set diff settings in the limits.conf file? Please update us.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Aug 2016 07:57:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/set-diff-is-very-slow-when-match-10-billion/m-p/203826#M98769</guid>
      <dc:creator>inventsekar</dc:creator>
      <dc:date>2016-08-04T07:57:53Z</dc:date>
    </item>
    <item>
      <title>Re: set diff is very slow when match 10 billion</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/set-diff-is-very-slow-when-match-10-billion/m-p/203827#M98770</link>
      <description>&lt;P&gt;What are you actually trying to compare? It looks like you're trying to find the combinations of destinationip2 and sourceip2 that are not common to both types of events. First, 10 billion records is too many to compare. Second, you're not reducing the number of fields being compared (right now it's comparing all the fields from those two event types).&lt;/P&gt;

&lt;P&gt;If my understanding of your requirement is correct, give this a try:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source=/var/log/remote/192.168.1.1.log  ("Built inbound" NOT "8.8.8.8" NOT "8.8.4.4")  OR ("Built outbound" outsideip=*) | rex field=_raw "Outside:(?&amp;lt;destinationip2&amp;gt;\d+\.\d+\.\d+\.\d+)" | rex field=_raw "Inside:(?&amp;lt;sourceip2&amp;gt;\d+\.\d+\.\d+\.\d+)" | eval type=if(match(_raw,"Built inbound"),1,2) | stats dc(type) as type by destinationip2 sourceip2 | where type=1 | table destinationip2 sourceip2
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 04 Aug 2016 14:29:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/set-diff-is-very-slow-when-match-10-billion/m-p/203827#M98770</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2016-08-04T14:29:44Z</dc:date>
    </item>
    <item>
      <title>Re: set diff is very slow when match 10 billion</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/set-diff-is-very-slow-when-match-10-billion/m-p/203828#M98771</link>
      <description>&lt;P&gt;Actually, I joined on destinationip2 before and it succeeded. Now I would like to see the log events that were not matched by that inner join.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Aug 2016 05:11:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/set-diff-is-very-slow-when-match-10-billion/m-p/203828#M98771</guid>
      <dc:creator>cyberportnoc</dc:creator>
      <dc:date>2016-08-05T05:11:48Z</dc:date>
    </item>
  </channel>
</rss>

