<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Splunk as CMDB - initCrcLength set to max creates duplicates in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257968#M49570</link>
    <description></description>
    <pubDate>Fri, 29 Jan 2016 10:33:23 GMT</pubDate>
    <dc:creator>f1dot4</dc:creator>
    <dc:date>2016-01-29T10:33:23Z</dc:date>
    <item>
      <title>Splunk as CMDB - initCrcLength set to max creates duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257968#M49570</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;
I want to use Splunk as a GUI for a CMDB. I know that's not the default use case, but Splunk is already in place and I like its visualization possibilities.&lt;/P&gt;

&lt;P&gt;I'm indexing text files that contain host metadata. Each filename contains a timestamp &amp;amp; the hostname (this part is already working). The point is that these text files are created every day, and most of the time nothing changes between days (no software or hardware changes, etc.), so there is no need to index them again.&lt;/P&gt;

&lt;P&gt;As written in the docs, Splunk computes a CRC over the first 256 bytes of a file (initCrcLength) to check whether the file has already been indexed. This is how it handles log rotation. My files are not normal log files, so the important change can also occur at the end of the file. To make sure I don't miss such a change, I increased initCrcLength to its maximum of 1048576 bytes. &lt;BR /&gt;
My files are smaller than 1048576 bytes, so as I understand it, Splunk should not index a file whose content (checked from beginning to end) matches one it has already indexed.&lt;/P&gt;
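
&lt;P&gt;Here is a rough Python sketch of the check described above. It uses &lt;CODE&gt;zlib.crc32&lt;/CODE&gt; as a stand-in for whatever CRC Splunk actually computes internally. The helper name &lt;CODE&gt;initial_crc&lt;/CODE&gt; and the file contents are hypothetical. It only models "a CRC over the first initCrcLength bytes", not Splunk's full file-tracking logic.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import zlib

INIT_CRC_LENGTH = 1048576  # the initCrcLength value from the inputs.conf below


def initial_crc(data, length=INIT_CRC_LENGTH):
    # Hypothetical helper: CRC-32 over at most the first `length` bytes,
    # so content shorter than `length` is checksummed in full.
    # zlib.crc32 is an assumption, not necessarily the CRC Splunk uses.
    return zlib.crc32(data[:length])


# Made-up file contents: a 301-byte header, then the host data at the end.
header = b"#" * 300 + b"\n"
day_10 = header + b"os=Windows\n"
day_11 = header + b"os=Windows\n"
day_12 = header + b"os=Windows Server\n"

# Full 1 MB window: identical files match, a change at the end is caught.
print(initial_crc(day_10) == initial_crc(day_11))  # True
print(initial_crc(day_10) == initial_crc(day_12))  # False
# Default 256-byte window: the change after byte 256 goes unnoticed.
print(initial_crc(day_10, 256) == initial_crc(day_12, 256))  # True
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The last comparison shows why the default 256-byte window can miss a change near the end of a file.&lt;/P&gt;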

&lt;P&gt;My problem with this configuration is that exactly this is happening: duplicates are being indexed. My inputs.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor://D:\CMDB\lokal\test\*.txt]
host_regex = test\\(.*?)_
initCrcLength = 1048576
disabled = false
index = cmdb
sourcetype = cmdb
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Any ideas?&lt;BR /&gt;
BR, Lukas&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jan 2016 10:33:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257968#M49570</guid>
      <dc:creator>f1dot4</dc:creator>
      <dc:date>2016-01-29T10:33:23Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk as CMDB - initCrcLength set to max creates duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257969#M49571</link>
      <description>&lt;P&gt;What error are you getting that shows the files are being reindexed?&lt;/P&gt;

&lt;P&gt;If you search &lt;CODE&gt;index=_internal&lt;/CODE&gt;, you should see why the reindexing is occurring.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jan 2016 11:56:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257969#M49571</guid>
      <dc:creator>jplumsdaine22</dc:creator>
      <dc:date>2016-01-29T11:56:06Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk as CMDB - initCrcLength set to max creates duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257970#M49572</link>
      <description>&lt;P&gt;There is no error about reindexing. But when I have two files with different filenames and exactly the same content, both are indexed. Most of the files are identical, so the index is filling up with duplicate entries. This is what I'm trying to avoid with the initCrcLength setting.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jan 2016 12:07:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257970#M49572</guid>
      <dc:creator>f1dot4</dc:creator>
      <dc:date>2016-01-29T12:07:22Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk as CMDB - initCrcLength set to max creates duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257971#M49573</link>
      <description>&lt;P&gt;If you run&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;$ head -c 1048576 &amp;lt;filename&amp;gt;.txt | md5sum
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;against all of those files, do you get the same hash?&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jan 2016 15:21:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257971#M49573</guid>
      <dc:creator>jplumsdaine22</dc:creator>
      <dc:date>2016-01-29T15:21:29Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk as CMDB - initCrcLength set to max creates duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257972#M49574</link>
      <description>&lt;P&gt;Let's try this differently. Leave &lt;CODE&gt;initCrcLength&lt;/CODE&gt; alone and set this in props.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[source::D:\CMDB\lokal\test\*.txt]
CHECK_METHOD=entire_md5
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 29 Jan 2016 15:35:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257972#M49574</guid>
      <dc:creator>dwaddle</dc:creator>
      <dc:date>2016-01-29T15:35:34Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk as CMDB - initCrcLength set to max creates duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257973#M49575</link>
      <description>&lt;P&gt;You might check whether they're the same. I think something similar to this Python is happening under the hood:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import glob
import hashlib
import os

os.chdir("D:/CMDB/lokal/test/")

for name in glob.glob("*.txt"):
    # Hash the first 1048576 bytes (the initCrcLength value) of each file.
    # Splunk computes a CRC rather than MD5, but matching digests still
    # indicate matching leading content.
    with open(name, "rb") as current_file:
        digest = hashlib.md5(current_file.read(1048576))
    print(name, digest.hexdigest())
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 29 Jan 2016 15:36:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257973#M49575</guid>
      <dc:creator>jplumsdaine22</dc:creator>
      <dc:date>2016-01-29T15:36:19Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk as CMDB - initCrcLength set to max creates duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257974#M49576</link>
      <description>&lt;P&gt;Looks good to me, although I was unclear whether Splunk considers the hostname when comparing CRCs.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jan 2016 15:43:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257974#M49576</guid>
      <dc:creator>jplumsdaine22</dc:creator>
      <dc:date>2016-01-29T15:43:53Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk as CMDB - initCrcLength set to max creates duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257975#M49577</link>
      <description>&lt;P&gt;Hi, in theory this sounds good. I removed the initCrcLength parameter from inputs.conf and added CHECK_METHOD = entire_md5 to props.conf. With btool, I checked that this config is active.&lt;/P&gt;

&lt;P&gt;When I copy the .txt file, the MD5 sum is identical, but the copy is indexed again.&lt;/P&gt;

&lt;P&gt;props.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[source::D:\CMDB\lokal\test\*.txt]
CHECK_METHOD = entire_md5
&lt;/CODE&gt;&lt;/PRE&gt;
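
&lt;P&gt;For comparison, here is a minimal Python sketch of an entire-file MD5 comparison. It assumes that CHECK_METHOD = entire_md5 hashes the full file content. The helper names &lt;CODE&gt;entire_md5&lt;/CODE&gt; and &lt;CODE&gt;group_by_content&lt;/CODE&gt; and the demo files are hypothetical. This is not Splunk's internal code.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import hashlib
import os
import tempfile


def entire_md5(path, chunk_size=65536):
    # MD5 over the whole file, read in chunks so large files are not
    # loaded into memory at once.
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(paths):
    # Map each MD5 digest to the file names that share it.
    groups = {}
    for path in paths:
        groups.setdefault(entire_md5(path), []).append(os.path.basename(path))
    return groups


# Hypothetical demo files: two identical copies (like the __10/__11 pair
# in the md5 check below) and one file with changed content.
demo_dir = tempfile.mkdtemp()
contents = {
    "AUD-S-K001-01__10.txt": b"os=Windows\n",
    "AUD-S-K001-01__11.txt": b"os=Windows\n",
    "AUD-S-K001-01__12.txt": b"os=Windows Server\n",
}
for name, data in contents.items():
    with open(os.path.join(demo_dir, name), "wb") as handle:
        handle.write(data)

paths = sorted(os.path.join(demo_dir, name) for name in contents)
for digest, names in group_by_content(paths).items():
    print(digest, names)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Files in the same group have byte-identical content, which is what md5sum reports for the two AUD-S-K001-01 files below.&lt;/P&gt;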

&lt;P&gt;inputs.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor://D:\CMDB\lokal\test\*.txt]
host_regex = test\\(.*?)_
disabled = false
index = cmdb
sourcetype = cmdb
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;md5 check:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;md5sum "AUD-S-K001-01__10.txt"
768b4568fa45d8d6771d5ca8160dc483 *AUD-S-K001-01__10.txt

md5sum "AUD-S-K001-01__11.txt"
768b4568fa45d8d6771d5ca8160dc483 *AUD-S-K001-01__11.txt
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 29 Sep 2020 08:35:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-as-CMDB-initCrcLength-set-to-max-creates-duplicates/m-p/257975#M49577</guid>
      <dc:creator>f1dot4</dc:creator>
      <dc:date>2020-09-29T08:35:05Z</dc:date>
    </item>
  </channel>
</rss>

