set diff is very slow when matching 10 billion records

cyberportnoc
Explorer

source=/var/log/remote/192.168.1.1.log set diff [search "Built inbound" NOT "8.8.8.8" NOT "8.8.4.4" | rex field=_raw "Outside:(?<destinationip2>\d+.\d+.\d+.\d+){0,3}"  | rex field=_raw "Inside:(?<sourceip2>\d+.\d+.\d+.\d+){0,3}"] [search "Built outbound" outsideip=* | rex field=_raw "Outside:(?<destinationip2>\d+.\d+.\d+.\d+){0,3}" | rex field=_raw "Inside:(?<sourceip2>\d+.\d+.\d+.\d+){0,3}"]

Format of the messages:

Aug  3 17:08:58 192.168.3.10 %ASA-6-302013: Built inbound TCP connection 434619881 for Outside:192.168.1.2/50978 (192.168.20.18/590) to Inside:192.168.22.20/443 (192.168.26.5/443)

Aug  3 17:09:15 192.168.3.18 %ASA-6-302013: Built outbound TCP connection 434622811 for Outside:192.168.18/.10/183 (192.168.18.1/1885) to Inside:202.171.21.16/53576 (230.180.220.1/5356)

somesoni2
Revered Legend

What are you actually trying to compare? It seems you're trying to find the unique combinations of destinationip2 and sourceip2 (those not common to both types of events). First, 10 billion records is far too many to compare. Second, you're not reducing the number of fields being compared (right now it compares all the fields from those two events).

If my understanding of your requirement is correct, give this a try:

 source=/var/log/remote/192.168.1.1.log  ("Built inbound" NOT "8.8.8.8" NOT "8.8.4.4")  OR ("Built outbound" outsideip=*) | rex field=_raw "Outside:(?<destinationip2>\d+.\d+.\d+.\d+){0,3}" | rex field=_raw "Inside:(?<sourceip2>\d+.\d+.\d+.\d+){0,3}" | eval type=if(match(_raw,"Built inbound"),1,2) | stats dc(type) as types by destinationip2 sourceip2 | where types=1 | table destinationip2 sourceip2
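
For anyone who wants to check the logic outside Splunk, here is a minimal Python sketch of the same idea: extract the Outside and Inside addresses from each event, group by that address pair, and keep only the pairs seen in a single direction. The first sample line is the one from the question; the other lines, the function name and the regex (which escapes the dots) are illustrative assumptions, not something taken from the original data.

```python
import re
from collections import defaultdict

# Captures the direction plus the first address after "Outside:" and "Inside:".
PAIR_RE = re.compile(
    r"Built (inbound|outbound) .*?"
    r"Outside:(\d+\.\d+\.\d+\.\d+).*?"
    r"Inside:(\d+\.\d+\.\d+\.\d+)"
)

def one_direction_pairs(lines):
    """Return (outside_ip, inside_ip) pairs seen only inbound or only outbound."""
    directions = defaultdict(set)
    for line in lines:
        m = PAIR_RE.search(line)
        if m is None:
            continue  # not a "Built inbound/outbound" event
        direction, outside_ip, inside_ip = m.groups()
        directions[(outside_ip, inside_ip)].add(direction)
    return sorted(pair for pair, seen in directions.items() if len(seen) == 1)

events = [
    # Sample line from the question.
    "Aug  3 17:08:58 192.168.3.10 %ASA-6-302013: Built inbound TCP connection 434619881 for Outside:192.168.1.2/50978 (192.168.20.18/590) to Inside:192.168.22.20/443 (192.168.26.5/443)",
    # Hypothetical lines, made up for this sketch.
    "Aug  3 17:09:01 192.168.3.10 %ASA-6-302013: Built outbound TCP connection 434619900 for Outside:192.168.1.2/50979 (192.168.20.18/591) to Inside:192.168.22.20/443 (192.168.26.5/443)",
    "Aug  3 17:09:02 192.168.3.10 %ASA-6-302013: Built inbound TCP connection 434619901 for Outside:10.0.0.5/40000 (10.0.0.5/40000) to Inside:192.168.22.21/443 (192.168.22.21/443)",
    "Aug  3 17:09:03 192.168.3.10 %ASA-6-302013: Built inbound TCP connection 434619902 for Outside:10.0.0.5/40001 (10.0.0.5/40001) to Inside:192.168.22.21/443 (192.168.22.21/443)",
]

print(one_direction_pairs(events))
```

The sketch tracks the set of directions seen per pair, so repeated events for the same pair do not change the result. It does a single pass over the events, which is the main reason a grouped search tends to scale better than comparing two full result sets.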

cyberportnoc
Explorer

Actually, I joined on destinationip2 before and it succeeded. Now I would like to see the logs that do not belong to the inner join.


inventsekar
SplunkTrust

Hi, 10 billion seems a very large number. Are you sure? Also, did you check the settings for set diff in the limits.conf file? Please update us.

thanks and best regards,
Sekar

PS - If this or any post helped you in any way, pls consider upvoting, thanks for reading !

inventsekar
SplunkTrust

10 billion! I think you may need to narrow the time range of the query and run multiple queries.

With the default values for the set command, it won't return 10 billion results.
http://docs.splunk.com/Documentation/Splunk/6.4.2/SearchReference/Set

Output limitations
There is a limit on the quantity of results that come out of the invoked subsearches that the set command receives to operate on. If this limit is exceeded, the input result set to the diff command is silently truncated.

If you have Splunk Enterprise, you can adjust this limit by editing the limits.conf file and changing the maxout value in the subsearch stanza. If this value is altered, the default quantity of results coming from a variety of subsearch scenarios are altered. Note that very large values might cause extensive stalls during the 'parsing' phase of a search, which is when subsearches run. The default value for this limit is 10000.

Result rows limitations
By default the set command attempts to traverse a maximum of 50000 items from each subsearch. If the number of input results from either search exceeds this limit, the set command silently ignores the remaining events. By default, the maxout setting for subsearches prevents the number of results from exceeding this limit.

If you have Splunk Enterprise, you can change this limit by editing the maxresultrows setting in the set stanza in the limits.conf file.
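
To see why silent truncation matters for a diff, here is a small Python sketch. It assumes that diff behaves like a symmetric difference and that each input is simply cut off after `limit` results; the function name and the toy data are hypothetical, and the real behaviour depends on your limits.conf settings.

```python
def truncated_diff(first, second, limit):
    """Symmetric difference computed on only the first `limit` items of each input."""
    return set(first[:limit]) ^ set(second[:limit])

first = list(range(0, 10))    # 0..9
second = list(range(5, 15))   # 5..14

# With a generous limit nothing is cut off, so the diff is the true one.
full = truncated_diff(first, second, limit=100)

# With a small limit each input is truncated before the comparison.
cut = truncated_diff(first, second, limit=6)

print(sorted(full))
print(sorted(cut))
```

In the truncated run, the values 6 to 9 are reported as differences even though both full inputs contain them, and 11 to 14 are missed entirely. The same effect applies at scale: once either subsearch exceeds the limit, the diff is computed on incomplete inputs without any warning.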

thanks and best regards,
Sekar
