Best practice for FTP from numerous sources

msarro
Builder

Greetings everyone. I am trying to aggregate .csv data from a number of sources. Initially it's just a few devices, but the number will be in the millions when the project is completed.

For now, I just need to get our test lab working with some essential infrastructure equipment. All of the equipment is configured to regularly export .csv files via FTP. I would like to create a directory on my test server to receive these files and have Splunk monitor it. I'm pretty sure this is possible, but it leads me to my next question.

If I have numerous different devices all dumping files into the same directory, how does Splunk tell which data came from which device?

Brian_Osburn
Builder

I'd suggest you set up sub-directories underneath the main directory, one for each system that dumps its .csv files there.

That way, when you set up Splunk to ingest those .csv files, it can extract the host from the sub-directory name using the "Segment on path" option in the input setup.
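
As a rough sketch, the per-device sub-directory approach might look like this in inputs.conf. The path /srv/ftp/incoming, the sourcetype name device_csv, and the segment number are assumptions made for illustration and are not details from this thread; adjust them to your own layout.

```ini
# inputs.conf (sketch; the path and sourcetype name are hypothetical)
# Assumed layout: /srv/ftp/incoming/<device-name>/<file>.csv
[monitor:///srv/ftp/incoming]
# Path segments are counted from the left:
#   srv = 1, ftp = 2, incoming = 3, <device-name> = 4
host_segment = 4
# Only pick up .csv files from the monitored tree
whitelist = \.csv$
sourcetype = device_csv
```

With this layout, a file such as /srv/ftp/incoming/router01/stats.csv would be expected to arrive with host set to router01. If the upload directory sits at a different depth, host_segment has to change to match.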

You can get more information from here http://www.splunk.com/base/Documentation/latest/Admin/Setadefaulthostforaninput

Brian

southeringtonp
Motivator

  • If it's manageable, have one directory for each host, and use host_segment in inputs.conf to assign hostnames.
  • If the hostname is anywhere in the file path, you can use host_regex in inputs.conf.
  • If the hostname appears in each event, you can use a transform to assign the host. This is per-event.
  • Take a look at these doc entries also:
    http://www.splunk.com/base/Documentation/4.1.5/Admin/Setadefaulthostforaninput

    http://www.splunk.com/base/Documentation/4.1.5/admin/Overridedefaulthostassignments
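
For the case where everything lands in one directory, here is a hedged sketch of the host_regex and per-event transform options. The path, the file naming scheme (device name before the first underscore), the sourcetype and stanza names, and the assumption that the hostname is the first CSV column are all hypothetical; none of them come from this thread.

```ini
# inputs.conf (sketch): take the host from the file name.
# Assumed file names: /srv/ftp/incoming/<device-name>_<timestamp>.csv
[monitor:///srv/ftp/incoming]
# The first capture group of host_regex becomes the host value
host_regex = /incoming/([^_/]+)_
sourcetype = device_csv

# props.conf (sketch): attach a host-override transform to the sourcetype
[device_csv]
TRANSFORMS-set_host = device_csv_host

# transforms.conf (sketch): per-event host override.
# Assumes the hostname is the first comma-separated field of every event.
[device_csv_host]
REGEX = ^([^,]+),
DEST_KEY = MetaData:Host
FORMAT = host::$1
```

You would normally use either host_regex or the transform, not both. The transform is applied at index time, so it belongs on the instance that parses the data, and a CSV header row would also match the regex above unless it is filtered out separately.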

    msarro
    Builder

    Thank you for your help, I really appreciate it.

    msarro
    Builder

    Thank you for your help, I really appreciate it. Both of these solutions work, and I'm going to set up a hierarchical structure just to keep things organized. Thanks!
