Monitoring FTP gz files

alextsui
Path Finder

Hi,
I am planning a Splunk deployment that involves indexing a large number of gz files uploaded via FTP from multiple sources.
Can I configure Splunk to monitor the directories containing these files directly?
My concern is that the directory I want Splunk to monitor is the FTP upload folder, and the files are in gz format. Will Splunk be confused when it sees a gz file that is still in the middle of a transfer? Can I monitor the upload folder directly with Splunk? The plan is to use a universal forwarder for the monitoring.

Thanks


Takajian
Builder

Whether you can monitor the upload folder directly with Splunk depends on your environment.

If files that are still being transferred via FTP carry a temporary extension such as ".inprogress", you can keep Splunk from indexing them until the transfer is done. Use a blacklist setting like the one below for that case.

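In inputs.conf, add "blacklist = \.inprogress$" to the monitor stanza for the upload directory. The value is a regular expression matched against the file path, so the dot is escaped to match a literal ".".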
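To check which file names such a blacklist pattern would skip, here is a minimal Python sketch. It assumes Python's re module behaves like Splunk's regex engine for a pattern this simple, and the paths are hypothetical examples, not taken from the original post.

```python
import re

# Blacklist pattern with the dot escaped so it matches a literal "."
# (assumption: Python's `re` agrees with Splunk's regex engine for this simple pattern).
BLACKLIST = re.compile(r"\.inprogress$")


def is_blacklisted(path: str) -> bool:
    """Return True if the path matches the blacklist pattern and would be skipped."""
    return BLACKLIST.search(path) is not None


if __name__ == "__main__":
    # Hypothetical upload paths, for illustration only.
    for path in (
        "/srv/ftp/upload/app.log.gz.inprogress",
        "/srv/ftp/upload/app.log.gz",
    ):
        print(path, "-> skipped" if is_blacklisted(path) else "-> monitored")
```

This only helps if the FTP client or server really renames the file once the upload completes; that behavior has to be confirmed for each upload source.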

If files in transfer do not carry any such extra extension, you should not monitor the upload folder directly. In that case, you need to move each uploaded file into the monitored directory with a script outside of Splunk.
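One possible shape for such a script, as a rough sketch rather than a tested production tool: it treats a .gz file as finished when it has not been modified for a while and its gzip stream can be read to the end. The function names, the 60-second quiet period, and the directory paths are all assumptions made for illustration.

```python
import gzip
import shutil
import time
import zlib
from pathlib import Path


def is_complete_gzip(path: Path) -> bool:
    """Return True if the whole gzip stream can be read without error."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(1024 * 1024):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # A truncated or corrupt upload ends up here.
        return False


def move_finished_uploads(upload_dir, monitored_dir, quiet_seconds=60.0, now=None):
    """Move finished .gz files from upload_dir to monitored_dir; return the new paths."""
    upload_dir = Path(upload_dir)
    monitored_dir = Path(monitored_dir)
    monitored_dir.mkdir(parents=True, exist_ok=True)
    now = time.time() if now is None else now
    moved = []
    for src in sorted(upload_dir.glob("*.gz")):
        if not src.is_file():
            continue
        # Heuristic (assumption): a file modified recently may still be uploading.
        if now - src.stat().st_mtime < quiet_seconds:
            continue
        if not is_complete_gzip(src):
            continue
        dest = monitored_dir / src.name
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved


# Hypothetical usage, e.g. run from cron every minute:
#   move_finished_uploads("/srv/ftp/upload", "/opt/splunk_inbox")
```

Keeping both directories on the same filesystem makes the move a rename, so Splunk never sees a half-copied file in the monitored directory. A file that is re-sent under the same name would replace the earlier copy there, which may or may not be what you want.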

Hope this helps.

Takajian
Builder

Splunk uncompresses a gz file before indexing it, then indexes the uncompressed text. You can monitor the FTP upload folder directly and Splunk will index the uploaded files, but I do not recommend it. If the FTP transfer runs into a network connectivity issue, the FTP client tries to resend the file. Splunk then cannot tell whether the file is a new one or one it has already indexed: it identifies files with a hash algorithm, and for a gz file that check does not tell it whether the file is new. This means Splunk may index duplicate events. I have run into this issue before, so I do not recommend it.

alextsui
Path Finder

Thanks for your reply. How does the gz format factor into your answer? I mean, if the files were regular text files, could I monitor the FTP upload folder directly, given that there would be no special file extension to distinguish completed files from files in transit?
