How to import files that change names when they roll over to a new daily file

tonyparreiro
Explorer

Hi,

I have a system that logs data to a file. After about 24 hours of logging, the file is renamed and a new file is started for the next batch of events.

Right now I'm only indexing the files after they are completed, which is OK, except that you can't query the most recent data (worst case, the last 24 hours).

Is there a way to index the data from the file while it is still being written, so it can be queried, and then not index it again once the file is renamed?

Thanks,
Tony

gcusello
SplunkTrust

Hi tonyparreiro,
you can index the logs using a wildcard in the monitor path:

[monitor:///apps/logs/mylogs*.log]
sourcetype=my_sourcetype
index=my_index

Or you can monitor the file before it's renamed.
With this second approach, you receive the logs continuously and can search them in near real time.
If you don't use the option crcSalt=<SOURCE>, the file will not be re-indexed when it is renamed.
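
A sketch of what this looks like in inputs.conf, reusing the placeholder path, sourcetype and index from the stanza above. As far as I know, Splunk recognizes a file it has already read by a checksum of its first bytes (256 by default, controlled by initCrcLength) rather than by its name, which is why a plain rename does not cause re-indexing. Setting crcSalt = <SOURCE> mixes the file path into that checksum and defeats this.

```ini
# Matches both the file being written and the renamed (rolled) files.
[monitor:///apps/logs/mylogs*.log]
sourcetype = my_sourcetype
index = my_index
# Deliberately NOT set: with crcSalt = <SOURCE> the file path becomes part of
# the checksum, so a renamed file would look new and be indexed a second time.
# crcSalt = <SOURCE>
```

If every file starts with the same long header, the first 256 bytes may be identical across files; in that case raising initCrcLength is the usual remedy, but that is an assumption about your data and worth testing on a non-production index first.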

Bye.
Giuseppe

somesoni2
Revered Legend

Splunk already does what you're asking for. You should monitor the file that is being continuously written to, so that you get near-real-time events in Splunk (there will be a small lag until the data is indexed and searchable). If your monitoring stanza specifically matches the live file, Splunk will read new data as it is written and will stop monitoring the file once it's renamed/rolled over (it keeps track of which files it has already read). So just point your monitoring stanza at your live file, something like this:

[monitor:///your/log/folder/path/logfile_*_LIVE_*.log]
index=yourindex
sourcetype=yoursourcetype
...
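
One caveat, as a sketch only: the pattern above expects an underscore after LIVE, while the live file name quoted elsewhere in this thread is logfile_20170103100502_LIVE.log. Assuming that name is accurate, a slightly looser wildcard matches it either way. The path, index and sourcetype are the same placeholders as above.

```ini
# "_LIVE*.log" matches both ..._LIVE.log and ..._LIVE_<anything>.log
[monitor:///your/log/folder/path/logfile_*_LIVE*.log]
index = yourindex
sourcetype = yoursourcetype
```

A broader pattern such as logfile_*.log would also cover the rolled files. As long as crcSalt is not set to <SOURCE>, the renamed file should be recognized as already read instead of being indexed twice.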
gcusello
SplunkTrust

As I described, if you don't use the crcSalt option, Splunk doesn't re-read files after rotation.
Bye.
Giuseppe

tonyparreiro
Explorer

Monitoring the file before it's renamed will index all the data, but once it's renamed, won't Splunk index the renamed file as well?

That would leave duplicate entries in the index. Or is it smart enough to know the records are the same and won't index them again?

I guess I'll just have to try something and see what happens.

thanks,
Tony

tonyparreiro
Explorer

Thanks for the response gokadroid.

The problem is not selecting the file to index. The problem I'm trying to avoid is that if I index the file while it's being written, then when it's finished and renamed it will be indexed again and I'll end up with duplicate entries.

For instance, say we had the following set of files. I'm currently indexing everything except file 3. File 3 is the file that is currently being written to by the system.

file 3: logfile_20170103100502_LIVE.log
file 2: logfile_20170102100503_20170103100502_.log
file 1: logfile_20170101100501_20170102100503_.log
etc.

Once file 3 rolls over at the end of the period, a new file is created and we end up with something like the following:

file 4: logfile_20170104100501_LIVE.log
file 3: logfile_20170103100502_20170104100501_.log
file 2: logfile_20170102100503_20170103100502_.log
file 1: logfile_20170101100501_20170102100503_.log
etc.

At this point file 3 gets indexed. I would like to index file 4 as well, so that I can report on information in "real time" if possible, without double-indexing any of the data. Another problem is that once the data is indexed, the file it was indexed from will no longer exist under that name, so I'm not sure what Splunk would do then either.

Hope this clears up the problem.

Thanks.
Tony

gokadroid
Motivator

If the name changes but the extension stays the same, a good option might be to monitor all files with that extension within a directory. For example, to monitor all *.log files in the directory /jfplogs/prod/CP/:

inputs.conf should have a stanza as follows (or something similar):

[monitor:///jfplogs/prod/CP/*.log]
sourcetype=yourSourcetype
index=yourIndex

See below for other options to specify the file names as per your need:
http://docs.splunk.com/Documentation/Splunk/6.5.1/Data/Specifyinputpathswithwildcards#Input_examples
