<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: find the duplicate files from particular source in splunk search query in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210029#M61424</link>
    <description>&lt;P&gt;Try this:&lt;/P&gt;

&lt;P&gt;index=main sourcetype=sampledata "TRL*" OR "Header*"&lt;BR /&gt;
| eval mytype=case(match(_raw,"TRL"), "TRAILER", match(_raw,"Header"), "HEADER")&lt;BR /&gt;
| chart count by source, mytype&lt;BR /&gt;
| search TRAILER&amp;gt;1 AND HEADER&amp;gt;1&lt;/P&gt;

&lt;P&gt;It should show you every source with duplicated "TRL" and "Header"&lt;/P&gt;</description>
    <pubDate>Tue, 29 Sep 2020 11:07:25 GMT</pubDate>
    <dc:creator>haley_swarnapat</dc:creator>
    <dc:date>2020-09-29T11:07:25Z</dc:date>
    <item>
      <title>find the duplicate files from particular source in splunk search query</title>
      <link>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210026#M61421</link>
      <description>&lt;P&gt;Hello All,&lt;/P&gt;

&lt;P&gt;I need to find how many duplicate files we have from a particular source in the last 7 days.&lt;/P&gt;

&lt;P&gt;I used &lt;A href="https://answers.splunk.com/answers/451711/how-to-index-duplicate-files-which-has-different-n.html#answer-451713"&gt;this&lt;/A&gt; method to index duplicate files in Splunk.&lt;/P&gt;

&lt;P&gt;Here, two files count as duplicates if the first line and last line of one file match the first line and last line of the other.&lt;/P&gt;

&lt;P&gt;I can find the duplicate files when matching on only the first line or only the last line, as below:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=main sourcetype=sampledata Header* | eventstats count by _raw | where count&amp;gt;1 | table source, _raw
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This query returns the files that have the same header. The query below returns the files where the "Trailer", or last line of the file, is the same:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=main sourcetype=sampledata TRL* | eventstats count by _raw | where count&amp;gt;1 | table source, _raw 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;So we need to combine the two queries and keep only the results where both the header and the trailer are the same.&lt;/P&gt;

&lt;P&gt;Can anyone please help me with this?&lt;/P&gt;

&lt;P&gt;Thanks in advance&lt;/P&gt;</description>
      <pubDate>Thu, 22 Sep 2016 08:34:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210026#M61421</guid>
      <dc:creator>snehalk</dc:creator>
      <dc:date>2016-09-22T08:34:35Z</dc:date>
    </item>
    <item>
      <title>Re: find the duplicate files from particular source in splunk search query</title>
      <link>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210027#M61422</link>
      <description>&lt;P&gt;Please check this one:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=main sourcetype=sampledata Header* | eventstats count by _raw | where count&amp;gt;1 | table source, _raw
| append [search index=main sourcetype=sampledata TRL* | eventstats count by _raw | where count&amp;gt;1 | table source, _raw]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Thanks and regards,&lt;BR /&gt;
Sekar&lt;/P&gt;</description>
      <pubDate>Thu, 22 Sep 2016 09:10:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210027#M61422</guid>
      <dc:creator>inventsekar</dc:creator>
      <dc:date>2016-09-22T09:10:54Z</dc:date>
    </item>
    <item>
      <title>Re: find the duplicate files from particular source in splunk search query</title>
      <link>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210028#M61423</link>
      <description>&lt;P&gt;Hello Sekar,&lt;/P&gt;

&lt;P&gt;Thanks for the response. I have updated my queries; please check and let me know.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Sep 2016 09:25:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210028#M61423</guid>
      <dc:creator>snehalk</dc:creator>
      <dc:date>2016-09-22T09:25:44Z</dc:date>
    </item>
    <item>
      <title>Re: find the duplicate files from particular source in splunk search query</title>
      <link>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210029#M61424</link>
      <description>&lt;P&gt;Try this:&lt;/P&gt;

&lt;P&gt;index=main sourcetype=sampledata "TRL*" OR "Header*"&lt;BR /&gt;
| eval mytype=case(match(_raw,"TRL"), "TRAILER", match(_raw,"Header"), "HEADER")&lt;BR /&gt;
| chart count by source, mytype&lt;BR /&gt;
| search TRAILER&amp;gt;1 AND HEADER&amp;gt;1&lt;/P&gt;

&lt;P&gt;It should show you every source with duplicated "TRL" and "Header"&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 11:07:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210029#M61424</guid>
      <dc:creator>haley_swarnapat</dc:creator>
      <dc:date>2020-09-29T11:07:25Z</dc:date>
    </item>
    <item>
      <title>Re: find the duplicate files from particular source in splunk search query</title>
      <link>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210030#M61425</link>
      <description>&lt;P&gt;Hello Sekar,&lt;/P&gt;

&lt;P&gt;This returns files whose headers match, including files that have the same header but a different trailer.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Sep 2016 10:36:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210030#M61425</guid>
      <dc:creator>snehalk</dc:creator>
      <dc:date>2016-09-22T10:36:16Z</dc:date>
    </item>
    <item>
      <title>Re: find the duplicate files from particular source in splunk search query</title>
      <link>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210031#M61426</link>
      <description>&lt;P&gt;Hello Haley,&lt;/P&gt;

&lt;P&gt;It's not displaying any results, but if I remove the eval command, the events come through.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Sep 2016 10:41:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210031#M61426</guid>
      <dc:creator>snehalk</dc:creator>
      <dc:date>2016-09-22T10:41:30Z</dc:date>
    </item>
    <item>
      <title>Re: find the duplicate files from particular source in splunk search query</title>
      <link>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210032#M61427</link>
      <description>&lt;P&gt;Oh, it seems that you need to convert your _raw to upper case like this:&lt;/P&gt;

&lt;P&gt;index=main sourcetype=sampledata "TRL*" OR "Header*"&lt;BR /&gt;
| eval mytype=case(match(upper(_raw),"TRL"), "TRAILER", match(upper(_raw),"HEADER"), "HEADER")&lt;BR /&gt;
| stats count by source, mytype, _raw | where count&amp;gt;1&lt;BR /&gt;
| chart first(_raw) by source, mytype&lt;/P&gt;

&lt;P&gt;You should be able to see the duplicated _raw in header and trailer for each source&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 11:07:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210032#M61427</guid>
      <dc:creator>haley_swarnapat</dc:creator>
      <dc:date>2020-09-29T11:07:33Z</dc:date>
    </item>
    <item>
      <title>Re: find the duplicate files from particular source in splunk search query</title>
      <link>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210033#M61428</link>
      <description>&lt;P&gt;Like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=main sourcetype=sampledata Header* OR TRL*
| eval HDR=if(searchmatch("Header*"), _raw, null()), TRL=if(searchmatch("TRL*"), _raw, null())
| stats first(HDR) AS HDR first(TRL) AS TRL BY source
| stats values(source) count BY HDR TRL
| search count &amp;gt; 1
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 22 Sep 2016 13:45:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210033#M61428</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2016-09-22T13:45:12Z</dc:date>
    </item>
    <item>
      <title>Re: find the duplicate files from particular source in splunk search query</title>
      <link>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210034#M61429</link>
      <description>&lt;P&gt;This will do it more efficiently, and will even work if there is ever more than one match for Header* or TRL* in a given file (fairly easy to imagine that this could happen sometimes).&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=main sourcetype=sampledata (Header* OR TRL*) | stats earliest(_raw) as first latest(_raw) as last by source | stats dc(source) as fileCount values(source) as files by first last | sort - fileCount | where fileCount&amp;gt;1
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Breaking it down - we get the events off disk, and we try and only get &lt;CODE&gt;Header* OR TRL*&lt;/CODE&gt; to avoid getting intermediate events that are of no use to us.  The parens are unnecessary here, but I often like them for clarity. &lt;/P&gt;

&lt;P&gt;The stats command will take the earliest line and the latest line for each source.  NOTE - if events near the start and/or end of a file end up with the same _time value, you'll have a problem here, and we'll need to do some additional matching on the Header* and TRL*. Here I've assumed all events near start and end get a different timestamp and there's no ambiguity. &lt;/P&gt;

&lt;P&gt;The next stats command now just counts up the number of sources (files) it's seen per row &lt;CODE&gt;dc(source) as fileCount&lt;/CODE&gt;, lists the actual values of the paths &lt;CODE&gt;values(source) as files&lt;/CODE&gt;, and does this for every unique combination of first and last &lt;CODE&gt;by first last&lt;/CODE&gt;.&lt;/P&gt;
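
&lt;P&gt;For the timestamp caveat noted above, here is a minimal sketch of what the additional matching might look like. It is untested against real data and assumes each file's first line matches &lt;CODE&gt;Header*&lt;/CODE&gt; and its last line matches &lt;CODE&gt;TRL*&lt;/CODE&gt;, so the header and trailer are picked by pattern rather than by &lt;CODE&gt;_time&lt;/CODE&gt; alone:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=main sourcetype=sampledata (Header* OR TRL*)
| eval first=if(searchmatch("Header*"), _raw, null()), last=if(searchmatch("TRL*"), _raw, null())
| stats earliest(first) as first latest(last) as last by source
| stats dc(source) as fileCount values(source) as files by first last
| sort - fileCount | where fileCount&amp;gt;1
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In this sketch, a source that is missing either line ends up with a null first or last and drops out at the second stats, since stats skips events where a by-field is null.&lt;/P&gt;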

&lt;P&gt;The rest is just sorting and filtering to the ones that are actually duplicates. &lt;/P&gt;</description>
      <pubDate>Thu, 22 Sep 2016 15:32:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210034#M61429</guid>
      <dc:creator>sideview</dc:creator>
      <dc:date>2016-09-22T15:32:39Z</dc:date>
    </item>
    <item>
      <title>Re: find the duplicate files from particular source in splunk search query</title>
      <link>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210035#M61430</link>
      <description>&lt;P&gt;Thank you, it's working for me, and it's a good explanation as well. Once again, thank you so much!&lt;/P&gt;</description>
      <pubDate>Fri, 23 Sep 2016 14:42:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/find-the-duplicate-files-from-particular-source-in-splunk-search/m-p/210035#M61430</guid>
      <dc:creator>snehalk</dc:creator>
      <dc:date>2016-09-23T14:42:01Z</dc:date>
    </item>
  </channel>
</rss>

