Getting Data In

Monitoring FTP gz files

alextsui
Path Finder

Hi,
I am planning a Splunk deployment that involves indexing a large number of gz files uploaded via FTP from multiple sources.
Can I configure Splunk to monitor the directories containing these files directly?
My concern is that the directory I want Splunk to monitor is the FTP upload folder, and the files are in gz format. Will Splunk be confused when it sees a gz file that is still in the middle of a transfer? Can I monitor the upload folder directly with Splunk? The plan is to use a universal forwarder for the monitoring.

Thanks

Takajian
Builder

Whether you can monitor the upload folder directly with Splunk depends on your environment.

If a file that is still being transferred via FTP carries a temporary extension such as ".inprogress", you can keep Splunk from indexing it until the transfer is done. Use a blacklist setting like the one below for that case.

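In inputs.conf, add "blacklist = \.inprogress$" to the monitor stanza for the upload folder. (The dot is escaped because blacklist is a regular expression.)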
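For illustration, a monitor stanza with that setting might look like the sketch below. The path /data/ftp/upload is a hypothetical placeholder, not something from this thread, and the example assumes the FTP client or server renames each file to drop ".inprogress" once the upload finishes.

```ini
# inputs.conf on the universal forwarder (the path is a hypothetical example)
[monitor:///data/ftp/upload]
# Skip any file whose name still ends in .inprogress
blacklist = \.inprogress$
```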

If files in transfer do not carry an extra extension like the one above, you should not monitor the upload folder directly. In that case, use a script outside of Splunk to move each fully uploaded file into a separate monitored directory.
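As a rough sketch of such a script (not an official Splunk tool), the Python below moves a .gz file only when it has not been modified for a while and its gzip stream decompresses to the end. The function names, the 60-second quiet period, and the directory arguments are all assumptions made for illustration. It also assumes both directories are on the same filesystem, so the rename is atomic and Splunk never sees a half-copied file.

```python
import gzip
import os
import time
import zlib
from pathlib import Path


def is_complete_gzip(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
        return True
    except (EOFError, OSError, zlib.error):
        # A truncated or corrupt upload ends up here.
        return False


def move_finished_uploads(upload_dir, monitored_dir, min_age_seconds=60):
    """Move finished .gz uploads into the monitored directory.

    Hypothetical helper: a file counts as finished when it has been
    untouched for min_age_seconds and passes the gzip integrity check.
    Returns the names of the files that were moved.
    """
    upload_dir = Path(upload_dir)
    monitored_dir = Path(monitored_dir)
    monitored_dir.mkdir(parents=True, exist_ok=True)
    now = time.time()
    moved = []
    for src in sorted(upload_dir.glob("*.gz")):
        if not src.is_file():
            continue
        if now - src.stat().st_mtime < min_age_seconds:
            continue  # possibly still being written
        if not is_complete_gzip(src):
            continue  # partial transfer, try again on the next run
        dest = monitored_dir / src.name
        if dest.exists():
            continue  # do not overwrite a file that was already delivered
        os.rename(src, dest)  # atomic only within one filesystem
        moved.append(src.name)
    return moved
```

You could run something like move_finished_uploads("/data/ftp/upload", "/data/splunk/incoming") from cron every minute (both paths are placeholders) and point the monitor stanza at the second directory instead of the upload folder.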

Hope this helps.

Takajian
Builder

Splunk decompresses a gz file before indexing it and then indexes the uncompressed text. You can monitor the FTP upload folder directly and Splunk will index the uploaded files, but I do not recommend it. If the FTP transfer hits a network connectivity issue, the FTP client tries to resend the file. Because Splunk identifies files with a hash algorithm, it cannot tell whether the resent gz file is a new file or one it has already indexed, so it may index duplicate events. I have run into this issue myself, which is why I do not recommend it.

alextsui
Path Finder

Thanks for your reply. How does the gz format factor into your answer? I mean, if the files were regular text files, could I monitor the FTP upload folder directly, given that there would be no special file extension to distinguish completed files from files in transit?
