Getting Data In

What is the best way to monitor a very large number of files in Splunk?

jrballesteros05
Communicator

Hello everybody.

I have a problem monitoring a large number of files on a Heavy Forwarder (HF). I mounted a folder with sshfs and I am monitoring transaction files, but the software always creates one file per transaction, so I receive almost 3000 files per day. The HF has no problem monitoring the files from one day (3000 files), but when files from the second day start arriving (6000 files) it cannot keep up. When I run an "ls" command on the folder, I never get a response from the server. Whenever this happens, the customer renames the folder and the HF starts working again. So I decided to split the files into different directories per day, or maybe per hour, but I'm not sure whether the HF can handle that without problems.

Can anyone tell me, based on their experience, what the best way to monitor these files is?
Is it a good idea to split the data into different directories? Will Splunk have any problem monitoring several directories?

I read this thread and it helped me monitor multiple files, but I did not see anything about monitoring this many files.

https://answers.splunk.com/answers/220025/what-is-recommended-to-monitor-multiple-files-in-t.html?ut...

Any help will be appreciated.

This is an example of my inputs.conf file:

## ECC Events
[monitor:///home/splunk/sshfslogs/XXX.XXX.XXX.XXX/ecc/splunk]
disabled = 0
sourcetype = mysourcetype_ecc
index = my_index
host_segment = 4

## PO Events
[monitor:///home/splunk/sshfslogs/XXX.XXX.XXX.XXX/po/splunk]
disabled = 0
sourcetype = mysourcetype_po
index = my_index
host_segment = 4

starcher
SplunkTrust

You should use something like logrotate to archive the files out of the monitored path. If you must leave them in the path, have it tgz the files and add a blacklist for the tgz-named files in your inputs. If you do not do this, you will choke the Splunk HF/UF, because it has to keep checking every file in the path for changes, even when you never expect them to be appended to again.
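
As a rough sketch of the blacklist part, the ECC stanza from the question could be extended as shown below. The archive file extensions (.tgz, .tar.gz) are an assumption about how the archiving job names its output; `blacklist` is a regular expression matched against the full file path.

```ini
## ECC Events
[monitor:///home/splunk/sshfslogs/XXX.XXX.XXX.XXX/ecc/splunk]
disabled = 0
sourcetype = mysourcetype_ecc
index = my_index
host_segment = 4
# Assumption: archived files end in .tgz or .tar.gz; adjust the regex
# to whatever names the archive job actually produces.
blacklist = \.(tgz|tar\.gz)$
```

The blacklist keeps Splunk from reading the archives, but the original files still have to be moved or removed once they are archived, otherwise the forwarder keeps checking them.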

jrballesteros05
Communicator

Hello, thank you for your reply.

I only have read access to the files on the FTP server, so I cannot use tools like logrotate. I recently spoke to the customer and he said he runs a script that moves all the files out of the path and keeps only the files from one day. For example, I currently have around 4000 files, but Splunk is not reading them in real time.

I'm monitoring transaction logs and I receive around two files per transaction. Every file has around 10 lines and I don't expect the files to be updated.

Every file name follows a structure like this:

BusinessUnit_Operation_TransactionID_Date_IN
BusinessUnit_Operation_TransactionID_Date_OUT

This is a sample listing (the file names are in Spanish):

-rw-rw---- 1 51194 1002 312 Feb 13 17:50 acreedor_consultar_C6EC1A3C-F23E-11E6-B6B3-0A42036A0000_20170213175034_SALIDA
-rw-rw---- 1 51194 1002 301 Feb 13 17:50 acreedor_consultar_C6EC1A3C-F23E-11E6-B6B3-0A42036A0000_20170213175034_ENTRADA

I don't know whether this naming structure can help Splunk check files by date.

esix_splunk
Splunk Employee

If you don't have control of the source files, another method would be to copy them to a local directory and ingest from there. So once a day, or even N times a day, run an scp or rsync.

Once you have the files locally, you can either use logrotate or, since you are copying them locally anyway, use a sinkhole batch input to ingest them: http://docs.splunk.com/Documentation/Splunk/6.5.2/Data/Monitorfilesanddirectorieswithinputs.conf#Bat...
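
For illustration, a sinkhole batch input for the locally copied ECC files might look roughly like this. The staging directory /opt/splunk_staging/ecc is a hypothetical path, and the sourcetype and index are simply reused from the question.

```ini
## ECC Events, ingested from a local copy
# Assumption: /opt/splunk_staging/ecc is a hypothetical directory that the
# scp/rsync job copies files into.
[batch:///opt/splunk_staging/ecc]
# sinkhole makes Splunk delete each file after it has been indexed
move_policy = sinkhole
disabled = 0
sourcetype = mysourcetype_ecc
index = my_index
```

Because a sinkhole input deletes files once they are indexed, it should only ever point at copies, never at the original files. The host_segment setting from the monitor stanzas is left out here because the path depth of the staging directory would differ.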

jrballesteros05
Communicator

Hello everybody. I am glad to tell you that your answers helped me a lot. I finally wrote my own script that uses rsync to fetch only the new files and merges them all into one log in /var/log/transacciones/, and then I use logrotate to rotate that log.

Thank you very much for your help.

jrballesteros05
Communicator

Hello esix. You gave me a hint with the batch input (I had forgotten about that kind of Splunk input). I think the batch input can help me because the customer has already written a script that cleans the SFTP directory every day. If I cannot ingest the data properly, I will write my own script to copy the files to a local directory and use a batch input with the "sinkhole" option.

I will try and I'll let you know. Thank you very much guys 😄
