How to create a report based on the number of files received at a particular location without indexing their content?

athorat3
New Member

Hi

I want to get a consolidated report based on the number of files received at a particular location. We do not want to monitor the files or index their content. We just need the number of files, their names, and their sizes.

How can we achieve this?


lguinn2
Legend

I suggest a scripted input. For Linux, your script could be this simple:

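#!/bin/sh
# Timestamp that Splunk uses as the event time
date
# Total size of the directory in MB (depth 0 prints a single summary line)
du -md 0 /pathtothedirectory
# Number of entries in the directory
ls -1 /pathtothedirectory | wc -l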

On a Windows box, you can do much the same thing in a .bat file. Put the script into the bin subdirectory of the appropriate app. Here is more info about setting up scripted inputs. For this example, let's say that the sourcetype in inputs.conf is set to "directoryInfo".
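
As a rough sketch of the inputs.conf stanza that paragraph refers to: the script name directoryInfo.sh, the one-hour interval, and the index are assumptions for illustration, not values from this thread, so adjust them to your app.

```
# inputs.conf in the same app as the script (hypothetical script name)
[script://./bin/directoryInfo.sh]
# How often to run the script, in seconds (assumed: once an hour)
interval = 3600
sourcetype = directoryInfo
# Assumed index; point this at whichever index you want to use
index = main
disabled = false
```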

You will probably also need a props.conf file to extract the fields from your data. Here is a sample:

[directoryInfo]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true
# (?m) lets ^ and $ match at line boundaries, since each event starts with the date line
EXTRACT-fe1 = (?m)^\s*(?P<sizeMB>\d+)\s+(?P<directory>\S+)\s+(?P<fileCount>\d+)\s*$

Once you have data coming into the environment, you can easily see the latest info:

sourcetype=directoryInfo | head 1 | table directory sizeMB fileCount

Or even calculate the averages over time:

sourcetype=directoryInfo | stats avg(sizeMB) avg(fileCount)

athorat3
New Member

Thanks for the reply. We will get new files every day, so each time the script runs, will it also index the old file information again?
How does Splunk handle duplicate data?
Our requirement is to write only unique data.

Thanks a ton for looking into this.


lguinn2
Legend

What this will give you is a timestamped event that looks like:

Wed Aug 19 17:33:19 PDT 2015
224 /pathtothedirectory
      32

Where 224 is the size in MB and 32 is the number of files. You will get a separate event in Splunk every time the script runs. Each time the script runs, it will count all the files and their size. If you want to do something based on individual files or on the modification/creation time of the files, change the script to generate the data that you want.
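
If per-file data is what you need, here is a minimal sketch of such a script. The function name list_files, the key names file and sizeBytes, and the timestamp format are my own choices for illustration. It prints one timestamped line per regular file at the top level of the directory, so it would probably need its own sourcetype that treats each line as a separate event rather than the directoryInfo settings above.

```shell
#!/bin/sh
# Hypothetical per-file variant: one timestamped line per regular file.
# The directory defaults to the placeholder path; pass another as the first argument.
DIR="${1:-/pathtothedirectory}"

list_files() {
    # $1: directory to scan (top level only)
    # $2: timestamp string to prefix each line with
    for f in "$1"/*; do
        # Skip subdirectories and the unexpanded pattern of an empty directory
        [ -f "$f" ] || continue
        # wc -c reports the size in bytes; tr strips padding that some wc versions add
        size=$(wc -c < "$f" | tr -d ' ')
        printf '%s file="%s" sizeBytes=%s\n' "$2" "$(basename "$f")" "$size"
    done
}

list_files "$DIR" "$(date '+%Y-%m-%dT%H:%M:%S%z')"
```

Note that every run still lists every file present, so a file that stays in the directory shows up again in the next run's output.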

If you want to calculate the delta between two runs of the script, you can do that in Splunk.
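
One way to sketch that delta, assuming the sizeMB and fileCount fields are being extracted for the directoryInfo sourcetype, is the delta search command. The output field names fileCountChange and sizeMBChange are made up for this example.

```
sourcetype=directoryInfo
| sort 0 _time
| delta fileCount as fileCountChange
| delta sizeMB as sizeMBChange
| table _time directory fileCount fileCountChange sizeMB sizeMBChange
```

The sort puts events in chronological order first, so each change is measured against the previous run of the script.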

I suggest that you set up something, and send the script output to a test index. Then you can collect data for a day or two and play with it. When you get it working the way you want, just change the index setting in inputs.conf to send the info into some production index. (And delete the test index.)

Personally, I would set up the whole thing on a test box or my laptop before putting it in a production Splunk environment...
