Splunk Search

How to monitor a CSV-like file

sieutruc
Contributor

Hello,

I have a CSV-like file named test.txt:

"Equipment","LNKEQP","METAST","METSER","MODSTA","METEOD"
"HLL_POS_00098",1,1,0,0,0
"TOY_GAT_00003",0,0,0,3,0
"NAT_POS_00010",0,3,0,0,0
"NAT_GAT_00002",0,0,0,0,0
"NAT_GAT_00001",0,0,0,0,0
"NAT_POS_00002",1,1,0,0,0

Each time my machine runs, it deletes the old file and generates a new file with the same name, test.txt. (The content may be unchanged if the system is stable.)

Can you show me how to keep track of that up-to-date content? Also, when indexing the file, can Splunk drop the first (header) row and use it as the field names for each line event?


andrewkenth
Communicator

I was under the impression that, with a monitor input, a file with the same name is indexed again if its size changes. This would retain history, as long as each record has a timestamp.

An alternative would be to create a file with a timestamp at the end of its name (and a timestamp on each record), monitor the files using a wildcard (myfile*.csv), set the file up as CSV, and assign it a particular sourcetype for easy reporting in Splunk.

If you can't put a timestamp on each record, I think Splunk uses the index time as _time, if I recall correctly. If that works for you, that may be an option as well.

1) Generate a CSV file with a timestamp at the end of its name and on each record. The technique will vary with the source system. I run this command line from crontab to measure disk usage once per day. It creates a file with a timestamp at the end of its name, adds a header line, and adds a timestamp to the front of each row.

echo "Timestamp,Filesystem,Used,Available" >> //apps/wcm-splunk/work/crd/log/prod/diskWatcher_"$(date +'%Y%m%d')".log; df -P | grep -vE '^Filesystem|tmpfs|cdrom' | gawk '{ print strftime("%Y-%m-%d %H:%M:%S") "," $1 "," $3 "," $4 }' >> //apps/wcm-splunk/work/crd/log/prod/diskWatcher_"$(date +'%Y%m%d')".log

2) Monitor the file in inputs.conf:

[monitor:///apps/wcm-splunk/work/crd/prod/*myFile*csv]
sourcetype = mySourcetype
disabled = false
index = myIndex

3) Set the file up as CSV with a specific sourcetype in props.conf:

[mySourcetype]
NO_BINARY_CHECK = 1
pulldown_type = 1
HEADER_MODE = firstline
FIELD_DELIMITER=,
FIELD_QUOTE="
TIME_FORMAT=%Y-%m-%d %H:%M:%S
TIMESTAMP_FIELDS=Timestamp

somesoni2
SplunkTrust

I am facing an issue (see here) that is exactly the opposite of yours, and my situation may help you.

What you can try is:

 1. Convert your CSV-like .txt file to .csv
 2. Add it as a lookup table file in your app
 3. Configure Splunk to monitor this lookup table file (from path $SPLUNK_HOME/etc/apps/<yourApp>/lookups/test.csv)
 4. Have some mechanism (script/code change) that takes the latest file and deletes and recreates this lookup table file (csv).

Splunk will now re-index the whole file even if there are no changes. Worth a try.
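
Step 4 could be sketched as a small shell function. This is only a sketch under assumptions: the function name refresh_lookup and both paths are hypothetical, and it presumes the machine's output has already been written somewhere outside the app before being copied into the lookups directory.

```shell
# Hypothetical helper for step 4: replace the monitored lookup file
# with the newest copy produced by the machine.
refresh_lookup() {
    src="$1"    # latest file from the machine (path is an assumption)
    dest="$2"   # lookup file under the app's lookups directory (assumption)

    # Refuse to delete the old lookup if the new file is missing or empty.
    if [ ! -s "$src" ]; then
        echo "refresh_lookup: $src is missing or empty" >&2
        return 1
    fi

    rm -f "$dest"        # delete the old lookup table file
    cp "$src" "$dest"    # recreate it from the latest content
}
```

It would be called after each machine run, for example as refresh_lookup /path/to/test.txt "$SPLUNK_HOME/etc/apps/<yourApp>/lookups/test.csv".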


mikebd
Path Finder

Options:

  1. Add a timestamp column to the CSV and have it populated on every row
  2. Use a command / script input to capture and index the current contents of the file
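
Option 1 could be sketched in shell with awk. This is a minimal sketch, not a production script: the function name add_timestamp_column and the "Timestamp" header are assumptions, and it presumes the first line of the file is the header row, as in the test.txt shown in the question. Because every row then carries the time of the run, the file's content differs between runs even when the equipment values do not.

```shell
# Hypothetical helper: print a CSV with a timestamp column added in front.
# $1 = input CSV path, $2 = optional timestamp string (defaults to now).
add_timestamp_column() {
    stamp="${2:-}"
    if [ -z "$stamp" ]; then
        stamp=$(date +'%Y-%m-%d %H:%M:%S')
    fi

    awk -v ts="$stamp" '
        NR == 1 { print "\"Timestamp\"," $0; next }   # header row gets the new column name
        !NF     { next }                              # skip blank lines
                { print "\"" ts "\"," $0 }            # data rows get the same timestamp
    ' "$1"
}
```

For example, add_timestamp_column test.txt > test_stamped.csv writes a stamped copy that Splunk could monitor. Since the function prints the rows to standard output, it could also serve as the body of a script input for option 2.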

Ayn
Legend

Since you don't seem to be interested in historical data for that CSV, the easiest would probably be to use this csv file as a lookup file, since it's already in a format that Splunk can use right away as a lookup.

http://docs.splunk.com/Documentation/Splunk/5.0/Knowledge/Addfieldsfromexternaldatasources

sieutruc
Contributor

Thanks for your answer, but I actually need to store historical data too. The problem is that if the content is unchanged, then even if I delete the file and create a new one with the same name, Splunk does not index the same content again.

So is there a way to make Splunk index the file again whenever its modification date changes, regardless of whether its content has changed?
