How to create a report based on number of files received at a particular location, but not index the content?

New Member


I want to get a consolidated report based on the number of files received at a particular location. We do not want to monitor the files or index their content. We just need the number of files, their names, and their sizes.

How can we achieve this?



I suggest a scripted input. For Linux, your script could be this simple:

du -md 0 /pathtothedirectory
ls -1 /pathtothedirectory | wc -l
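If you also want each event to carry its own timestamp line (as in the sample event shown further down in this thread), a slightly fuller sketch of the same idea could look like this. The path is a placeholder, and `du -d` assumes GNU coreutils:

```shell
#!/bin/sh
# Hypothetical expanded version of the two commands above.
# /pathtothedirectory is a placeholder -- substitute your real directory.
DIR=/pathtothedirectory

# Emit a timestamp line so each event records when the script ran
date

# Total size of the directory in whole megabytes (GNU du; -d 0 = no recursion)
du -m -d 0 "$DIR"

# Number of entries in the directory
ls -1 "$DIR" | wc -l
```

Each run produces one short event; how often it runs is controlled by the interval setting for the scripted input.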

On a Windows box, you can do much the same thing in a .bat file. Put the script into the bin subdirectory of the appropriate app. The Splunk documentation has more info about setting up scripted inputs. For this example, let's say that the sourcetype in inputs.conf was set to "directoryInfo".
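A minimal inputs.conf stanza for such a scripted input might look like the following; the script name, interval, and index here are illustrative assumptions, not fixed values:

[script://./bin/directoryInfo.sh]
interval = 3600
sourcetype = directoryInfo
index = test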

You will probably also need a props.conf file to extract the fields from your data. Here is a sample:

[directoryInfo]
EXTRACT-fe1 = ^\s*(?P<sizeMB>\d+)\s+(?P<directory>\S+)\s*(?P<fileCount>\d+)$

Once you have data coming into the environment, you can easily see the latest info:

sourcetype=directoryInfo | head 1 | table directory sizeMB fileCount

Or even calculate the averages over time:

sourcetype=directoryInfo | stats avg(sizeMB) avg(fileCount)
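To see the trend over time rather than a single overall average, a timechart along these lines should also work (the one-day span is just an example):

sourcetype=directoryInfo | timechart span=1d avg(sizeMB) avg(fileCount)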

New Member

Thanks for the reply. We will get new files every day, so once the script runs, will it also index the old file information again?
How does Splunk work in terms of writing duplicate data?
Our requirement is to write only unique data.

Thanks a ton for looking into this.



What this will give you is a timestamped event that looks like:

Wed Aug 19 17:33:19 PDT 2015
224 /pathtothedirectory
32

Where 224 is the size in MB and 32 is the number of files. You will get a separate event in Splunk every time the script runs. Each time the script runs, it will count all the files and their size. If you want to do something based on individual files or on the modification/creation time of the files, change the script to generate the data that you want.

If you want to calculate the delta between two runs of the script, you can do that in Splunk.
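For example, one way to get the change in file count between consecutive runs is to sort into time order first so delta compares each run with the one before it:

sourcetype=directoryInfo | sort _time | delta fileCount as newFiles | table _time directory fileCount newFiles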

I suggest that you set something up and send the script output to a test index. Then you can collect data for a day or two and play with it. When you get it working the way you want, just change the index setting in inputs.conf to send the info into some production index. (And delete the test index.)

Personally, I would set up the whole thing on a test box or my laptop before putting it in a production Splunk environment...
