Getting Data In

My monitored file is always skipped because it has the same name and an identical header.

Communicator

Hi

I am trying to have Splunk monitor a log file, but Splunk indexed it once and has skipped it ever since.

  • My log file is recreated every day, then filled during the day.
  • The first lines are always the same, so the CRC calculated on the first 256 characters is not enough to differentiate the new file from the old one.
  • The filename and location are also the same, so using crcSalt does not solve the problem.

My file is /app/logs/superduper.log.
The header looks like the following, with more than 256 characters before the actual timestamps and events:

################## blah blah blah ##################
################## blah blah blah ##################
################## blah blah blah ##################
################## blah blah blah ##################
....

My inputs.conf is:

[monitor:///app/logs/]
crcSalt = <SOURCE>
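The skip can be reproduced outside Splunk by checksumming only the first 256 bytes of two such files. This is just an illustration of the principle using the standard `cksum` utility, not Splunk's actual hash:

```shell
#!/bin/sh
# Illustration only: cksum is not the hash Splunk uses, but it shows why
# files sharing a header longer than 256 bytes get the same fingerprint
# when only the first 256 bytes are hashed.
dir=$(mktemp -d)
printf '%0.s#' $(seq 1 300) > "$dir/yesterday.log"   # 300-byte header
printf '%0.s#' $(seq 1 300) > "$dir/today.log"       # identical header
echo "2011-07-20 event A" >> "$dir/yesterday.log"    # bodies differ
echo "2011-07-21 event B" >> "$dir/today.log"
head -c 256 "$dir/yesterday.log" | cksum   # same checksum...
head -c 256 "$dir/today.log"     | cksum   # ...as this one
```

Both `cksum` commands print the same value even though the files differ after the header, which is exactly the situation described above.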

1 Solution

Splunk Employee

Correct, the file will be skipped.
You can use the REST API to confirm that the header is the cause:
https://localhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus
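A sketch of the corresponding call from the command line; the host, port, and credentials below are splunkd defaults and only placeholders, so substitute your own and run the printed command against a live instance:

```shell
#!/bin/sh
# Build the REST call for the endpoint above. Credentials are placeholders;
# -k is needed because splunkd ships with a self-signed certificate.
SPLUNK_MGMT="https://localhost:8089"
ENDPOINT="/services/admin/inputstatus/TailingProcessor:FileStatus"
echo "curl -k -u admin:<password> $SPLUNK_MGMT$ENDPOINT"
```

The response lists each file the tailing processor has seen, with a per-file status explaining why a file is being read or skipped.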

You have three solutions:

  • Use crcSalt, and store the files in a different folder named after the date, or better, change the filename itself.

For example: /app/logs/20110720/superduper.log
or /app/logs/superduper20110722.log

  • Change your application to include the date on the first line of the log, for example:
    # generated :  2011-07-20
    ################## blah blah blah ##################
    ....
    
  • Trick the monitoring process by using a symlink to the file.

Define an input on the folder with crcSalt on the filename,
a blacklist on the exact original file,
and the followSymlink option enabled (it is by default).
It should look like:

[monitor:///app/logs/]
crcSalt = <SOURCE>
blacklist = superduper\.log$
followSymlink = true

Every day, after checking that the original file has been replaced, create a new symlink to the real file; give the symlink a dated name so it will not match the blacklist,
for example: superduper_20110720.log.
Also clean up the old symlinks.

That way, Splunk will detect the new symlink each day and start indexing the linked file, and because of the blacklist you won't get duplicates.
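The daily rotation described above could be scripted roughly like this (the paths follow the example in the thread, and the 7-day cleanup window is an assumption; schedule it via cron after the application has recreated the file):

```shell
#!/bin/sh
# Sketch of the daily symlink rotation. Paths and the 7-day retention
# window are assumptions; adapt them to your layout.
LOGDIR="/app/logs"
REAL="$LOGDIR/superduper.log"                   # the blacklisted original
LINK="$LOGDIR/superduper_$(date +%Y%m%d).log"   # dated name Splunk will pick up
ln -sf "$REAL" "$LINK"                          # point today's symlink at the real file
# Remove dated symlinks (and only symlinks) older than 7 days:
find "$LOGDIR" -name 'superduper_*.log' -type l -mtime +7 -delete
```

The `-type l` test ensures the cleanup only ever deletes symlinks, never the real log file.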


Splunk Employee
Splunk Employee

Yes, this is kind of a fail. I would file an Enhancement Request with Splunk to have it either use a CRC over more than 256 bytes, or allow specifying an offset from the start of the file for the CRC.


Nice tips, but you cannot modify IBM WebSphere's SystemOut that easily (case 2), or have cron jobs scan the log directory every minute to build symlinks for files that are rotated by size rather than date (case 3).
