Getting Data In

My monitored file is always skipped because it has the same name and an identical header.

Communicator

Hi

I am trying to have Splunk monitor a log file, but Splunk indexed it once and has skipped it ever since.

  • My log file is recreated every day, then filled during the day.
  • The first lines are always the same, so the CRC calculated on the first 256 characters is not enough to differentiate the new file from the old one.
  • The filename and location are also the same, so using crcSalt does not solve the problem.

My file is /app/logs/superduper.log.
The header looks like the following, with more than 256 characters before the actual timestamps and events:

################## blah blah blah ##################
################## blah blah blah ##################
################## blah blah blah ##################
################## blah blah blah ##################
....

My inputs.conf is:

[monitor:///app/logs/]
crcSalt = <SOURCE>
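The skip can be reproduced outside Splunk by checksumming only the first 256 bytes of two such files. This is just an illustration of the principle using the standard `cksum` utility, not Splunk's actual hash:

```shell
#!/bin/sh
# Illustration only: cksum is not the hash Splunk uses, but it shows why
# files sharing a header longer than 256 bytes get the same fingerprint
# when only the first 256 bytes are hashed.
dir=$(mktemp -d)
printf '%0.s#' $(seq 1 300) > "$dir/yesterday.log"   # 300-byte header
printf '%0.s#' $(seq 1 300) > "$dir/today.log"       # identical header
echo "2011-07-20 event A" >> "$dir/yesterday.log"    # bodies differ
echo "2011-07-21 event B" >> "$dir/today.log"
head -c 256 "$dir/yesterday.log" | cksum   # same checksum...
head -c 256 "$dir/today.log"     | cksum   # ...as this one
```

Both `cksum` commands print the same value even though the files differ after the header, which is exactly the situation described above.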

1 Solution

Splunk Employee

Correct, the file will be skipped.
You can use the REST API to confirm that the header is the cause:
https://localhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus
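A sketch of the corresponding call from the command line; the host, port, and credentials below are splunkd defaults and only placeholders, so substitute your own and run the printed command against a live instance:

```shell
#!/bin/sh
# Build the REST call for the endpoint above. Credentials are placeholders;
# -k is needed because splunkd ships with a self-signed certificate.
SPLUNK_MGMT="https://localhost:8089"
ENDPOINT="/services/admin/inputstatus/TailingProcessor:FileStatus"
echo "curl -k -u admin:<password> $SPLUNK_MGMT$ENDPOINT"
```

The response lists each file the tailing processor has seen, with a per-file status explaining why a file is being read or skipped.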

You have three solutions:

  • Use crcSalt, and store the files in a different folder named after the date, or better, change the filename itself.

For example: /app/logs/20110720/superduper.log
or /app/logs/superduper20110722.log

  • Change your application to include the date on the first line of the log, for example:
    # generated :  2011-07-20
    ################## blah blah blah ##################
    ....
    
  • Trick the monitoring process by using a symlink to the file.

Define an input on the folder with crcSalt on the filename,
a blacklist on the exact original file,
and the followSymlink option enabled (it is by default).
It should look like:

[monitor:///app/logs/]
crcSalt = <SOURCE>
blacklist = superduper\.log$
followSymlink = true

Every day, after checking that the original file has been replaced, create a new symlink to the real file; give the symlink a dated name so it will not match the blacklist,
for example: superduper_20110720.log.
Also clean up the old symlinks.

That way, Splunk will detect the new symlink each day and start indexing the linked file, and because of the blacklist you won't get duplicates.
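The daily rotation described above could be scripted roughly like this (the paths follow the example in the thread, and the 7-day cleanup window is an assumption; schedule it via cron after the application has recreated the file):

```shell
#!/bin/sh
# Sketch of the daily symlink rotation. Paths and the 7-day retention
# window are assumptions; adapt them to your layout.
LOGDIR="/app/logs"
REAL="$LOGDIR/superduper.log"                   # the blacklisted original
LINK="$LOGDIR/superduper_$(date +%Y%m%d).log"   # dated name Splunk will pick up
ln -sf "$REAL" "$LINK"                          # point today's symlink at the real file
# Remove dated symlinks (and only symlinks) older than 7 days:
find "$LOGDIR" -name 'superduper_*.log' -type l -mtime +7 -delete
```

The `-type l` test ensures the cleanup only ever deletes symlinks, never the real log file.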


Splunk Employee
Splunk Employee

Yes, this is kind of a fail. I would file an Enhancement Request with Splunk to have it either use a CRC over more than 256 bytes, or allow specifying an offset from the start of the file for the CRC.


Nice tips, but you cannot modify IBM WebSphere's SystemOut that easily (case 2), or have cron jobs scan the log directory every minute to build symlinks for files that are rotated by size rather than date (case 3).
