I'm a Splunk newbie and I'm trying to find the best way to use Splunk to monitor an FTP home folder. I don't care about the contents of the files and would prefer not to have their contents in Splunk, since they contain HIPAA data.
I need to monitor how many files are in the folder and how old they are, so I can be alerted if a file is left there for a long time, as well as if the number of files passes a threshold.
I think a scripted input would be useful for this task: take the output of a command like ls -l and index it, then extract the output into appropriate fields and Splunk it.
This is exactly what I need to do.
Did the suggested answer work?
If so, can someone be more specific about how to go about it?
Hopefully this will help. Create a script that does two things: lists the files in the directory you care about, and writes that listing (or a summary of it) to a log file.
Create a cron job that executes this script on a given interval.
Your requirements might differ a little bit; maybe you want to examine several directories or all of the child directories within a given directory. You can use your scripting skills to format the output in a way that makes field extraction happen automatically or you can just use your Splunk field extraction skills on the default output.
Create a monitor input to read the file your script is writing, make sure Splunk is extracting your fields, create an alert in Splunk, and you're set! Bonus: syslog your output to another server if you don't want to install a universal forwarder on the server in question.
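For the Unix case, a minimal sketch of such a script might look like the following. The directory and log-file paths are placeholders, not anything from the thread; point them at your FTP home folder and wherever you want the count log written:

```shell
#!/bin/sh
# Count the files in a directory and note the age of the oldest one,
# writing key=value pairs so Splunk can extract the fields automatically.
# Both paths are placeholder defaults -- adjust for your environment.
DIR="${1:-.}"
LOG="${2:-filecount.log}"

# Number of regular files directly inside the directory
count=$(find "$DIR" -maxdepth 1 -type f | wc -l)

# Modification time of the oldest file as a Unix timestamp (GNU find)
oldest=$(find "$DIR" -maxdepth 1 -type f -printf '%T@\n' | sort -n | head -n 1)
if [ -n "$oldest" ]; then
    age_min=$(( ( $(date +%s) - ${oldest%.*} ) / 60 ))
else
    age_min=0
fi

# Append a timestamped event; Splunk picks up the timestamp and fields
echo "$(date '+%Y-%m-%d %H:%M:%S') currentcount=$count oldest_age_min=$age_min" >> "$LOG"
```

Run it from cron at whatever interval suits you (e.g. every five minutes) and point a Splunk monitor input at the log file.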
Just to finish off, here's what I ended up doing. Our Splunk environment is Windows-based, not Linux.
So I created this batch file, which runs every 5 minutes as a Windows scheduled task. It runs on the Splunk server, but uses a network path so I can count files in remote directories.
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET count=0
REM Count every file in the (remote) directory
FOR %%o IN ("\\network\folder\location\*.*") DO (
    ECHO %%o
    SET /A count=count+1
)
REM Append a timestamp plus key=value pairs for Splunk to extract
NET TIME \\%COMPUTERNAME% | FIND "Current time" >> c:\count\countfiles.txt
ECHO dataareaid=UK >> c:\count\countfiles.txt
ECHO currentcount=%count% >> c:\count\countfiles.txt
ENDLOCAL
This outputs the timestamp (from net time) and the count result to the text file (countfiles.txt). I use the double arrow >> to append the results each time it runs.
I then created a new file monitor data input in Splunk that indexes this countfiles.txt file.
Splunk automatically picked up the timestamp and created the correct event rows for me.
Because I put "currentcount=" in the batch file's output, Splunk identifies it as a field, so I can search on it.
When creating the input, I created a new sourcetype called "filecount".
I monitor several directories, each with its own batch file and resulting count file. I set up a file input for each of these in Splunk and assigned them all this new sourcetype. That way I can search with sourcetype="filecount" and get back all my file-count results, which I plot on a single chart. In our case, each directory relates to a different country.
The full search I use is:
sourcetype="filecount" | timechart max(currentcount) by dataareaid span=5m
This gives me exactly what I needed: a running count over time showing the maximum file count in each 5-minute span.
We have functions that process these files and move them on. If a function fails, the files aren't moved and the count rises as they accumulate in the folders. This shows up perfectly on our Splunk chart as a rising count line and alerts us to any issues with the process.
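To turn that into an alert, a saved search along these lines could fire when a folder starts backing up. The threshold of 100 files here is just an assumption; pick whatever fits your volumes:

sourcetype="filecount" | stats latest(currentcount) AS current BY dataareaid | where current > 100

Schedule it and set the alert to trigger when the search returns any results.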
The same approach could be used by anyone with an FTP server that processes incoming files.
Hope that helps anyone wanting to do something similar.