Issue: the same log file is indexed multiple times

gyarici
Path Finder

Hi All,

I have a question about indexing a log file. I use an application and monitor its events in real time. This works fine, but after midnight the application zips the file and renames it within the same directory.

Example:
The original file name is a.log, and it becomes a-070515.log.zip.

As a result, Splunk indexes the same data twice (from a.log and from a-070515.log.zip).

  1. I need to monitor the events in real time.
  2. If I stop indexing a.log, I cannot follow the events in real time.
  3. If I stop indexing a-xxxx.log.zip, there is a risk that some events are never indexed in time (for example, because of a network connection issue).

How can I solve this easily?

Thanks

Gokhan


woodcock
Esteemed Legend

Modify your inputs.conf to include a whitelist/blacklist:

whitelist=\.log$
blacklist=\.gz$

I believe that the .log files are being processed only once and that the second copy comes from the compressed file.
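To check which files these two patterns let through, here is a small Python model of the filtering. It is a simplified sketch, assuming that Splunk searches each regex against the file's full path and that the blacklist wins when both match; `is_monitored` and the `/var/log/app/` directory are hypothetical, while the file names come from the question.

```python
import re

# Patterns from the answer above. Assumption: Splunk searches whitelist
# and blacklist regexes against each file's full path.
WHITELIST = re.compile(r"\.log$")
BLACKLIST = re.compile(r"\.gz$")


def is_monitored(path: str) -> bool:
    """Hypothetical helper: a file is read only if it matches the
    whitelist and does not match the blacklist (blacklist wins)."""
    if BLACKLIST.search(path):
        return False
    return WHITELIST.search(path) is not None


# Hypothetical directory; the file names are the ones from the question.
for path in ["/var/log/app/a.log", "/var/log/app/a-070515.log.zip"]:
    print(path, "->", "indexed" if is_monitored(path) else "skipped")
```

With the names from the question, a.log is indexed and a-070515.log.zip is skipped. Note that `\.gz$` does not match a `.zip` name, so here it is the whitelist that excludes the archive; a blacklist such as `\.zip$` would make that intent explicit.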


gyarici
Path Finder

Thanks for the answer, but I have a doubt.

Suppose a.log has not been fully indexed before midnight because of a network issue or some other problem, and the file is then zipped. Does the Splunk forwarder somehow cache this data and send it to the Splunk server later, or do I lose that data because the zipped file (a.log.zip) is blacklisted?


woodcock
Esteemed Legend

Research "crcSalt=<SOURCE>". By default, Splunk does not use the file name to decide whether it has already forwarded a file; it computes a CRC checksum over the first bytes of the file (and also checks the bytes at the position where it last stopped reading). So by default (that is, unless you TOLD Splunk to take the name into account with crcSalt=<SOURCE>), Splunk recognizes a renamed file as one it has already seen and does not forward it again. However, if the file is compressed and you have not told Splunk NOT to read compressed files, it will happily decompress the archive and forward its contents.
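The idea can be modelled in a few lines of Python to show why a renamed file is skipped while its compressed copy is not. This is a simplification, not Splunk's implementation: it assumes the fingerprint is a CRC32 over the first 256 bytes (the default of the `initCrcLength` setting) and ignores the additional check at the last-read position. `fingerprint` is a hypothetical helper, and the log contents are made up.

```python
import io
import zipfile
import zlib

# Assumption: the default check hashes the first 256 bytes of a file
# (initCrcLength). The extra check at the last-read offset is omitted.
INIT_CRC_LENGTH = 256


def fingerprint(content: bytes, path: str, salt_with_source: bool = False) -> int:
    """Hypothetical helper: CRC32 of the file head. With salt_with_source,
    the path is mixed in, roughly what crcSalt=<SOURCE> does."""
    data = content[:INIT_CRC_LENGTH]
    if salt_with_source:
        data += path.encode()
    return zlib.crc32(data)


# Made-up log contents, longer than INIT_CRC_LENGTH.
log_bytes = b"".join(b"2015-07-05 23:59:%02d event %d\n" % (i % 60, i) for i in range(50))

# The same data compressed into a .zip archive, as the application does.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a-070515.log", log_bytes)
zip_bytes = buf.getvalue()

# Fingerprints recorded while a.log was being monitored.
seen = {fingerprint(log_bytes, "a.log")}
seen_salted = {fingerprint(log_bytes, "a.log", salt_with_source=True)}

renamed = fingerprint(log_bytes, "a-070515.log")
renamed_salted = fingerprint(log_bytes, "a-070515.log", salt_with_source=True)
compressed = fingerprint(zip_bytes, "a-070515.log.zip")

print("renamed, default:         new file?", renamed not in seen)                 # False
print("renamed, crcSalt=<SOURCE>: new file?", renamed_salted not in seen_salted)  # True
print("compressed copy:          new file?", compressed not in seen)              # True
```

In this model the rename alone leaves the fingerprint unchanged, salting with the path makes the renamed file look new, and the archive's bytes differ from the original's, so it looks new as well.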


gyarici
Path Finder

Thanks for the advice. I solved it using a whitelist/blacklist, and I also revised the wildcards according to the link below:
http://docs.splunk.com/Documentation/Splunk/6.2.3/Data/Specifyinputpathswithwildcards
Now everything works.


gyarici
Path Finder

whitelist = a.log$
blacklist = .*zip$
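One detail in these patterns: in a regular expression an unescaped `.` matches any character, so `a.log$` also matches other paths that end in "a", any character, then "log". The Python sketch below checks this against hypothetical paths (`data.log` and `a_log` are made up) and compares it with a stricter whitelist that escapes the dot and anchors the file name on a Unix-style `/` separator. The stricter pattern is an untested suggestion, and `allowed` is a hypothetical helper.

```python
import re

# Patterns from the post above, plus a hypothetical stricter whitelist.
USER_WHITELIST = re.compile(r"a.log$")      # "." matches any character
STRICT_WHITELIST = re.compile(r"/a\.log$")  # literal dot, whole file name
USER_BLACKLIST = re.compile(r".*zip$")      # with search(), same as r"zip$"


def allowed(whitelist: re.Pattern, path: str) -> bool:
    """Hypothetical helper: whitelist must match, blacklist must not."""
    return whitelist.search(path) is not None and USER_BLACKLIST.search(path) is None


# Hypothetical paths; only the first and last names come from the thread.
paths = [
    "/var/log/app/a.log",
    "/var/log/app/data.log",
    "/var/log/app/a_log",
    "/var/log/app/a-070515.log.zip",
]

for path in paths:
    print(f"{path:32} user: {allowed(USER_WHITELIST, path)!s:5}  "
          f"strict: {allowed(STRICT_WHITELIST, path)}")
```

Both whitelists accept a.log and reject the archive, but only the stricter one rejects data.log and a_log.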
