I have an FTP data collector that pulls files from an FTP server and drops them into a directory monitored by Splunk.
It collects all of the IDA00*.dat files from ftp://ftp2.bom.gov.au/anon/gen/fwo/
My script checks this FTP server about every 6 hours, and if the modified date on a file has changed it re-downloads the file and replaces the copy in /home/phoenix/data/bom/
Splunk is set up to monitor this directory with the following conf files:
inputs.conf
[monitor:///home/phoenix/data/bom]
disabled = 0
followTail = 0
host = BOM
index = bom
crcSalt = <SOURCE>
props.conf
[source::...[/\\]bom[/\\]IDA00001.dat]
KV_MODE = none
SHOULD_LINEMERGE = false
sourcetype = bomIDA00001
REPORT-extractIDA00001 = IDA00001_Fields
priority = 100
The priority of 100 is required because Splunk ignores .dat files by default. I also recently had to remove dat from /opt/splunk/etc/system/default/props.conf, as the priority stopped working for some reason and the data was being treated as binary (but that's for another topic).
transforms.conf
[IDA00001_Fields]
DELIMS = "#"
FIELDS = loc_id,location,state,forecast_date,issue_date,issue_time,min_0,max_0,min_1,max_1,min_2,max_2,min_3,max_3,min_4,max_4,min_5,max_5,min_6,max_6,min_7,max_7,forecast_0,forecast_1,forecast_2,forecast_3,forecast_4,forecast_5,forecast_6,forecast_7,dummy
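As a quick sanity check on the delimiter-based extraction above, here is a minimal sketch that counts the "#"-separated fields on each line of a file. The function name count_fields is hypothetical and only for illustration; it is not part of Splunk or of my collector script.

```shell
# Count how many "#"-separated fields each line of a file has.
# Prints "<field_count> <number_of_lines>" pairs, sorted by field count.
# count_fields is a hypothetical helper name used only for this sketch.
count_fields() {
  awk -F'#' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}
```

Running count_fields /home/phoenix/data/bom/IDA00001.dat shows how many lines have each field count. The FIELDS list above names 31 fields, so lines reporting a different count would not line up one-to-one with it.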
This seemed to work for a while, but for some reason Splunk has stopped indexing the files even though new files are coming in with completely different data (in particular the forecast_date). I can only see data in index=bom from the 28th of Sept and earlier. It is now the 29th and there should be data in Splunk for that date.
Running the following shows some activity on the files in question:
grep IDA00001.dat /opt/splunk/var/log/splunk/splunkd.log
09-29-2011 13:52:24.489 +1000 INFO WatchedFile - File too small to check seekcrc, probably truncated. Will re-read entire file='/home/phoenix/data/bom/IDA00001.dat'.
09-29-2011 14:48:50.167 +1000 INFO WatchedFile - Checksum for seekptr didn't match, will re-read entire file='/home/phoenix/data/bom/IDA00001.dat'.
09-29-2011 14:48:50.167 +1000 INFO WatchedFile - Will begin reading at offset=0 for file='/home/phoenix/data/bom/IDA00001.dat'.
So it seems Splunk is working on the files. But are they actually being indexed, given that the data is not showing up?
Any help would be appreciated.
Something I just remembered about this issue.
The file had the extension .dat, which is classified as a binary file type by one of the Splunk configuration files.
We ended up removing it from /etc/system/default/props.conf under the stanza
[source::....(0t|a|ali|asa|au|bmp|cg|cgi|class|d|dat|deb|del|dot|dvi|dylib|elc|eps|exe|ftn|gif|hlp|hqx|hs|icns|ico|inc|iso|jame|jin|jpeg|jpg|kml|la|lhs|lib|lo|lock|mcp|mid|mp3|mpg|msf|nib|o|obj|odt|ogg|ook|opt|os|pal|pbm|pdf|pem|pgm|plo|png|po|pod|pp|ppd|ppm|ppt|prc|ps|psd|psym|pyc|pyd|rast|rb|rde|rdf|rdr|rgb|ro|rpm|rsrc|so|ss|stg|strings|tdt|tif|tiff|tk|uue|vhd|xbm|xlb|xls|xlw)]
sourcetype = known_binary
Obviously the correct way to do this would be to add an override to the props.conf in your own app, which should take precedence over this default.
I think you have to clean the fishbucket (the _thefishbucket index) on the forwarder. That is where Splunk stores the information about which files have already been indexed.
Hi, I am facing a similar problem. Any resolution?
Unfortunately no. We have since moved on from this for now. If you do find a solution, please let us know here.
Unfortunately no. I cleared the monitored directory and then cleared the indexes with the command
/opt/splunk/bin/splunk stop; /opt/splunk/bin/splunk clean eventdata -f -index bom; /opt/splunk/bin/splunk clean eventdata -f -index bom_summary; /opt/splunk/bin/splunk start
I then retrieved the files again, and Splunk still shows zero events in the index.
Is it possible the timestamping has changed? I'm just thinking Splunk might be indexing the data but putting it under a different date/time from the one you are expecting.
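One way to test that idea is to search the index over all time, including future timestamps, and compare each source's event-time range with when it was last indexed. This is only a rough sketch: it assumes Splunk is installed at /opt/splunk, that the CLI will prompt for credentials if it needs them, and that latest=+1y is a wide enough window to catch events stamped in the future.

```shell
# Path to the Splunk CLI; /opt/splunk is an assumption taken from the paths used in this thread.
SPLUNK_BIN="${SPLUNK_BIN:-/opt/splunk/bin/splunk}"

# For each source in index=bom, show the event count, the earliest and latest
# event timestamps (_time), and the most recent time anything was indexed (_indextime).
QUERY='index=bom earliest=0 latest=+1y | stats count min(_time) as first_event max(_time) as last_event max(_indextime) as last_indexed by source | convert ctime(first_event) ctime(last_event) ctime(last_indexed)'

if [ -x "$SPLUNK_BIN" ]; then
  "$SPLUNK_BIN" search "$QUERY"
else
  echo "splunk binary not found at $SPLUNK_BIN" >&2
fi
```

If last_indexed is recent for IDA00001.dat but last_event is far from today's date, the data is being indexed and simply filed under an unexpected timestamp.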