Getting Data In

Filename was different, therefore source is not indexed. Why?

Splunk Employee

I'm monitoring a folder but I'm not seeing all the files getting indexed into Splunk.

Then I did

index=_internal sourcetype="splunkd" log_level="ERROR"

and found several events indicating the reason files were not indexed.

04-26-2010 11:58:04.265 ERROR TailingProcessor - Ignoring path due to: File will not be read, is too small to match seekptr checksum (file=C:\Program Files\WebSphere\profiles\AppSrv01\config\cells\sfeserv36Node01Cell\PolicySets\WSReliableMessaging persistent\PolicyTypes\WSReliableMessaging\policy.xml).  Last time we saw this initcrc, filename was different.  You may wish to use a CRC salt on this source.  Consult the documentation or contact Splunk Support for more info.

I do not understand why Splunk is telling me that the filename was different.

Help?

1 Solution

Splunk Employee

Splunk performs a CRC check of the files it tries to index. The error you report implies that we had indexed a file with the same CRC value. Even if the file name is different, we will not index it unless you use the CRC salt parameter for the input. This prevents Splunk from reindexing the same log file, even though you may have renamed it.

Sometimes, if several files share the same first few header lines, this can confuse Splunk, because the CRC is computed only against the beginning of the file rather than the whole file. In those cases, you should use the crcSalt parameter:

crcSalt = <SOURCE>

If set, this string is added to the CRC. Use this setting to force Splunk to consume files that have matching CRCs. If set to crcSalt = &lt;SOURCE&gt; (note: this setting is case sensitive), then the full source path is added to the CRC.
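As a sketch, a monitor stanza in inputs.conf using this setting might look like the following. The path and sourcetype here are illustrative, not taken from the original input:

```ini
# Hypothetical monitor stanza; adjust the path to your own input.
[monitor://C:\WebSphere\profiles\AppSrv01\config]
sourcetype = websphere_config
# Mix the full source path into the CRC so identically-headed
# files at different paths are indexed as distinct sources.
crcSalt = <SOURCE>
```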

For reference:

http://docs.splunk.com/Documentation/Splunk/5.0/Data/Monitorfilesanddirectories

Communicator

Is there a way to delete the CRCs of the previous indexing activity? I deleted the index and the data input and basically tried to start over but my files won't index again.

SplunkTrust

You could either empty the fishbucket or add a random crcSalt in your inputs.conf.
Adding a salt changes the hash of the files and thus causes them to be indexed again.
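As a sketch, a one-off random salt in inputs.conf might look like this (the stanza path and salt string are illustrative, not from the original post):

```ini
# Hypothetical stanza: any change to the salt string alters the CRC
# of every file under this monitor, so Splunk treats them as new files
# and indexes them again from the beginning.
[monitor://C:\logs\myapp]
crcSalt = reindex-pass-1
```

Change or remove the salt again only if you deliberately want another full re-index of everything under the stanza.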

Skalli

Super Champion

Just to be completely clear about this setting: Nicholas, you received this message on an XML config file, which is where adding the crcSalt setting is helpful. But you should probably not add this to monitors that are indexing traditional log files. The danger of adding "crcSalt = &lt;SOURCE&gt;" everywhere is that it would re-index a log file after it is rotated, so you could end up with the same events loaded many times over.
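To illustrate the warning above, a hedged inputs.conf sketch that scopes crcSalt to only the stanza that needs it (all paths here are hypothetical):

```ini
# XML config files with near-identical headers: salt the CRC with the
# full path so each file is indexed despite the matching header bytes.
[monitor://C:\WebSphere\profiles\AppSrv01\config]
crcSalt = <SOURCE>

# Rotated application logs: leave crcSalt unset. With <SOURCE> set here,
# each rotation (access.log -> access.log.1) would change the source path
# and cause the same events to be re-indexed as a new file.
[monitor://C:\logs\access.log*]
```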

Communicator

You can check for duplicated events, along with their time of indexing, with the following query:

index=your_index sourcetype=your_sourcetype | eval dup=_raw | convert ctime(_time) as T1 | convert ctime(_indextime) as indextime | transaction dup mvlist=t maxspan=1s keepevicted=true | table dup,source,sourcetype,host,index,indextime

Process to delete the duplicated events:

  1. Run the following command to store all duplicate events in a lookup table.

index=* sourcetype=wsaaccesslogs | eval id=_cd."|".index."|".splunk_server | transaction _raw maxspan=1s keepevicted=true mvlist=t | search eventcount>1 | eval deleteid=mvindex(id, 1, -1) | stats c by deleteid | outputlookup delete_these.csv

  2. Once the search has finished completely, you can view the events stored in the lookup table by running: | inputlookup delete_these.csv

Note: You need to wait until your search completes; you can use smart mode as well.
You can also check the newly created lookup table at $SPLUNK_HOME\etc\apps\appname\lookups\delete_these.csv

  3. Run the following command to delete all events from the sourcetype that also exist in the lookup table (in your case, delete_these.csv):

index=* sourcetype=wsaaccesslogs | eval deleteid=_cd."|".index."|".splunk_server | search [| inputlookup delete_these.csv | fields deleteid | format "(" "(" "OR" ")" "OR" ")"] | delete

Note that running the delete command requires a role with the can_delete capability.

Happy Splunking

Splunk Employee

Thank you Simeon and Wolverine! It works now with crcSalt = &lt;SOURCE&gt;
