I want to get a consolidated report based on the number of files received at a particular location. We do not want to monitor the files or index their content; we just need the number of files, their names, and their sizes.
How can we achieve this?
I suggest a scripted input. For Linux, your script could be this simple:
#!/bin/sh
date
du -md 0 /pathtothedirectory
ls -1 /pathtothedirectory | wc -l
On a Windows box, you can do much the same thing in a .bat file. Put the script into the bin subdirectory of the appropriate app. The Splunk documentation has more info about setting up scripted inputs. For this example, let's say that the sourcetype in inputs.conf was set to "directoryInfo".
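As a rough sketch, the inputs.conf stanza might look like the following. The script path, interval, and index are assumptions; adjust them for your app layout and how often you want a snapshot:

```
[script://$SPLUNK_HOME/etc/apps/yourapp/bin/dirinfo.sh]
interval = 3600
sourcetype = directoryInfo
index = main
```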
You will probably also need a
props.conf file to extract the fields from your data. Here is a sample:
[directoryInfo]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true
EXTRACT-fe1 = ^\s*(?P<sizeMB>\d+)\s+(?P<directory>\S+)\s*(?P<fileCount>\d+)$
Once you have data coming into the environment, you can easily see the latest info:
sourcetype=directoryInfo | head 1 | table directory sizeMB fileCount
Or even calculate the averages over time:
sourcetype=directoryInfo | stats avg(sizeMB) avg(fileCount)
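If you want those averages bucketed per day rather than over the whole search window, a timechart works too (the one-day span is just an example; pick whatever granularity you need):

```
sourcetype=directoryInfo | timechart span=1d avg(sizeMB) avg(fileCount)
```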
Thanks for the reply. We will get new files every day, so once the script runs, will it also index the old file information again?
How does Splunk work in terms of writing duplicate data?
Our requirement is to write only unique data.
Thanks a ton for looking into this.
What this will give you is a timestamped event that looks like:
Wed Aug 19 17:33:19 PDT 2015 224 /pathtothedirectory 32
Where 224 is the size in MB and 32 is the number of files. You will get a separate event in Splunk every time the script runs. Each time the script runs, it will count all the files and their size. If you want to do something based on individual files or on the modification/creation time of the files, change the script to generate the data that you want.
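For example, if you wanted one event line per file with its modification time and size, a variation like this would work. It is only a sketch: the function name is mine, and it relies on GNU find's -printf, which is an assumption about your platform (BSD/macOS find would need a different approach, e.g. stat):

```shell
#!/bin/sh
# per_file_info DIR: emit one line per file in DIR (non-recursive) --
# epoch modification time, size in bytes, and path.
# Assumes GNU find; -printf is not available in BSD/macOS find.
per_file_info() {
    find "$1" -maxdepth 1 -type f -printf '%T@ %s %p\n'
}
```

You could then pair the extraction in props.conf with fields for mtime, bytes, and path instead of the aggregate counts.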
If you want to calculate the delta between two runs of the script, you can do that in Splunk.
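A sketch of that delta calculation, using the field names from the props.conf extraction (sorting by _time first so delta compares each run against the previous one):

```
sourcetype=directoryInfo | sort _time | delta fileCount as fileCountChange | table _time fileCount fileCountChange
```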
I suggest that you set up something, and send the script output to a test index. Then you can collect data for a day or two and play with it. When you get it working the way you want, just change the index setting in inputs.conf to send the info into some production index. (And delete the test index.)
Personally, I would set up the whole thing on a test box or my laptop before putting it in a production Splunk environment...