<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Is there a more efficient solution than my current search to validate whether data from two different sources is the same? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Is-there-a-more-efficient-solution-than-my-current-search-to/m-p/240794#M71581</link>
    <description>&lt;P&gt;Just run the search and peel back the layers to see how it works (or to fix something that might be wrong if it doesn't give you what you expect).&lt;/P&gt;</description>
    <pubDate>Fri, 08 Jul 2016 01:51:28 GMT</pubDate>
    <dc:creator>woodcock</dc:creator>
    <dc:date>2016-07-08T01:51:28Z</dc:date>
    <item>
      <title>Is there a more efficient solution than my current search to validate whether data from two different sources is the same?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Is-there-a-more-efficient-solution-than-my-current-search-to/m-p/240791#M71578</link>
      <description>&lt;P&gt;I am trying to validate whether data from two separate sources is the same. I have indexed two CSV files of 450,000+ records in two separate indexes and am now trying to compare them. Currently I join on a field that is unique in both tables and compare certain field values available in the data. The problem is that my data set is too large for the subsearch to handle (the limit is 50,000), and I cannot access limits.conf to increase the subsearch limit. Can someone help me with an alternative and hopefully more efficient solution? My goal is to show the records that differ and the values that differ between the two tables. This is the type of search I am currently using, shortened for this question:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="source1" sourcetype=type1  | 

eval unique = uniqueField | 

stats count by ID, unique, field1, field2, field3, field4 |

sort 0 ID, field2| 

eval allFields1 = field1.field2.field3.field4|

eval a = field1 |eval b = field2| eval c = field3 | eval d = field4 |

join unique   [search index="source2" sourcetype=type2 | 

eval unique= uniqueField |

stats count by ID, unique, field1, field2, field3, field4 |

sort 0 ID, field2 |

eval allFields2 = field1.field2.field3.field4 |

eval a2 = field1 |eval b2 = field2| eval c2 = field3 | eval d2 = field4  ]|

eval fieldA=if(a!=a2,"Table 1: '"+a+"' | Table 2: '"+a2+"'", "No differences") |
eval fieldB=if(b!=b2,"Table 1: '"+b+"' | Table 2: '"+b2+"'", "No differences") |
eval fieldC=if(c!=c2,"Table 1: '"+c+"' | Table 2: '"+c2+"'", "No differences") |

where allFields1!=allFields2 |
table unique ID fieldA fieldB fieldC
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Hopefully that's clear. Let me know if you need more information. &lt;/P&gt;</description>
      <pubDate>Wed, 29 Jun 2016 19:08:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Is-there-a-more-efficient-solution-than-my-current-search-to/m-p/240791#M71578</guid>
      <dc:creator>khubyarb</dc:creator>
      <dc:date>2016-06-29T19:08:39Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a more efficient solution than my current search to validate whether data from two different sources is the same?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Is-there-a-more-efficient-solution-than-my-current-search-to/m-p/240792#M71579</link>
      <description>&lt;P&gt;Try this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(index="source1" sourcetype=type1) OR (index="source2" sourcetype=type2)
| eval unique = uniqueField
| stats count BY sourcetype ID unique field1 field2 field3 field4
| sort 0 ID, field2
| eval allFields = field1.field2.field3.field4
| eval a = field1 |eval b = field2 | eval c = field3 | eval d = field4
| stats dc(*) AS dc_* values(*) AS * BY unique 
&lt;/CODE&gt;&lt;/PRE&gt;
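
&lt;P&gt;As one possible way to finish the comparison (a sketch that has not been run against this data), the lines below could be appended to the search above. They assume that every unique value should appear in both sourcetypes. The output names fieldA through fieldC are borrowed from the original question, and fieldD is added here by analogy for field4:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| where dc_sourcetype!=2 OR dc_field1!=1 OR dc_field2!=1 OR dc_field3!=1 OR dc_field4!=1
| eval fieldA=if(dc_field1!=1, mvjoin(field1, " | "), "No differences")
| eval fieldB=if(dc_field2!=1, mvjoin(field2, " | "), "No differences")
| eval fieldC=if(dc_field3!=1, mvjoin(field3, " | "), "No differences")
| eval fieldD=if(dc_field4!=1, mvjoin(field4, " | "), "No differences")
| table unique ID sourcetype fieldA fieldB fieldC fieldD
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;A row where dc_sourcetype is not 2 is a record that exists in only one of the two sources. Because values() returns a sorted, de-duplicated list, the joined string shows which values differ but not which table each value came from.&lt;/P&gt;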

&lt;P&gt;The final stats command creates, for each value of unique, a distinct count (dc_*) and a multivalued list of values for every field, so you can add whatever comparison you need across the sourcetypes.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jun 2016 20:04:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Is-there-a-more-efficient-solution-than-my-current-search-to/m-p/240792#M71579</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2016-06-29T20:04:05Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a more efficient solution than my current search to validate whether data from two different sources is the same?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Is-there-a-more-efficient-solution-than-my-current-search-to/m-p/240793#M71580</link>
      <description>&lt;P&gt;Thanks for the response. Sorry, but I am rather new to Splunk and am not sure how to use this to compare fields between the indexes, as you mentioned I should be able to do.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Jul 2016 21:42:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Is-there-a-more-efficient-solution-than-my-current-search-to/m-p/240793#M71580</guid>
      <dc:creator>khubyarb</dc:creator>
      <dc:date>2016-07-07T21:42:06Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a more efficient solution than my current search to validate whether data from two different sources is the same?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Is-there-a-more-efficient-solution-than-my-current-search-to/m-p/240794#M71581</link>
      <description>&lt;P&gt;Just run the search and peel back the layers to see how it works (or to fix something that might be wrong if it doesn't give you what you expect).&lt;/P&gt;</description>
      <pubDate>Fri, 08 Jul 2016 01:51:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Is-there-a-more-efficient-solution-than-my-current-search-to/m-p/240794#M71581</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2016-07-08T01:51:28Z</dc:date>
    </item>
  </channel>
</rss>

