Splunk not indexing modified files

I have an FTP data collector that pulls files from an FTP server and drops them into a directory monitored by Splunk.

The files are all of the IDA00*.dat files and are sourced from ftp://ftp2.bom.gov.au/anon/gen/fwo/

My script checks this FTP server about every 6 hours. If the modified date on a file has changed, it re-downloads the file and replaces the copy in /home/phoenix/data/bom/
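For anyone trying to reproduce the collector, here is a minimal sketch of the check described above. It is not the original script: it assumes the server answers the FTP `MDTM` command, and the function names (`parse_mdtm`, `needs_download`, `sync_file`) are made up for illustration.

```python
from datetime import datetime, timezone
from ftplib import FTP
from pathlib import Path


def parse_mdtm(reply: str) -> datetime:
    """Turn an MDTM reply such as '213 20110929035224' into a UTC datetime."""
    # Drop the reply code and any fractional seconds the server may append.
    stamp = reply.split()[-1].split(".")[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def needs_download(remote_mtime: datetime, local_path: Path) -> bool:
    """True when there is no local copy or the remote file is newer."""
    if not local_path.exists():
        return True
    local_mtime = datetime.fromtimestamp(local_path.stat().st_mtime, tz=timezone.utc)
    return remote_mtime > local_mtime


def sync_file(ftp: FTP, name: str, dest_dir: Path) -> bool:
    """Re-download `name` into dest_dir if the server copy is newer.

    Returns True if the file was fetched. Assumes the server supports MDTM.
    """
    remote_mtime = parse_mdtm(ftp.sendcmd(f"MDTM {name}"))
    target = dest_dir / name
    if not needs_download(remote_mtime, target):
        return False
    with open(target, "wb") as fh:
        ftp.retrbinary(f"RETR {name}", fh.write)
    return True
```

Usage would be along the lines of connecting with `FTP("ftp2.bom.gov.au")`, calling `login()` and `cwd("/anon/gen/fwo")`, and then calling `sync_file(ftp, "IDA00001.dat", Path("/home/phoenix/data/bom"))` from a job scheduled every 6 hours.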

Splunk is set up to monitor this directory with the following conf files:

inputs.conf

[monitor:///home/phoenix/data/bom]
disabled = 0
followTail = 0
host = BOM
index = bom
crcSalt = <SOURCE>

props.conf

[source::...[/\\]bom[/\\]IDA00001.dat]
KV_MODE = none
SHOULD_LINEMERGE = false
sourcetype = bomIDA00001
REPORT-extractIDA00001 = IDA00001_Fields
priority = 100

priority = 100 is required because Splunk ignores .dat files by default. I also recently had to remove dat from /opt/splunk/etc/system/default/props.conf, because the priority setting stopped working for some reason and the data was being treated as binary (but that's a topic for another thread).

transforms.conf

[IDA00001_Fields]
DELIMS = "#"
FIELDS = loc_id,location,state,forecast_date,issue_date,issue_time,min_0,max_0,min_1,max_1,min_2,max_2,min_3,max_3,min_4,max_4,min_5,max_5,min_6,max_6,min_7,max_7,forecast_0,forecast_1,forecast_2,forecast_3,forecast_4,forecast_5,forecast_6,forecast_7,dummy

This seemed to be working OK for a while, but for some reason Splunk has stopped indexing the files, even though new files are coming in with completely different data (in particular the forecast_date). I can only see data in index=bom from the 28th of Sept and earlier. It is now the 29th and there should be data in Splunk for that date.

Running the following shows some activity on the files in question:

grep IDA00001.dat /opt/splunk/var/log/splunk/splunkd.log

09-29-2011 13:52:24.489 +1000 INFO  WatchedFile - File too small to check seekcrc, probably truncated.  Will re-read entire file='/home/phoenix/data/bom/IDA00001.dat'.
09-29-2011 14:48:50.167 +1000 INFO  WatchedFile - Checksum for seekptr didn't match, will re-read entire file='/home/phoenix/data/bom/IDA00001.dat'.
09-29-2011 14:48:50.167 +1000 INFO  WatchedFile - Will begin reading at offset=0 for file='/home/phoenix/data/bom/IDA00001.dat'.

So it seems Splunk is picking up the files and re-reading them. But are they actually being indexed, given that the data is not showing up in searches?
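When the log is long, it can help to pull out just the WatchedFile messages for one file and read them in order. The sketch below does that with a regular expression written against the three log lines quoted above; other splunkd.log line layouts are not handled, and the function name `watched_file_events` is made up for illustration. It only shows what the file monitor decided to do, not whether events reached the index.

```python
import re

# Layout inferred from the quoted lines: timestamp, level, component, " - ", message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})\s+"
    r"(?P<level>\w+)\s+(?P<component>\w+) - (?P<message>.*)$"
)
FILE_RE = re.compile(r"file='(?P<path>[^']+)'")


def watched_file_events(lines, path):
    """Return (timestamp, message) pairs for WatchedFile entries about `path`."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["component"] != "WatchedFile":
            continue
        f = FILE_RE.search(m["message"])
        if f and f["path"] == path:
            events.append((m["ts"], m["message"]))
    return events
```

For example, `watched_file_events(open("/opt/splunk/var/log/splunk/splunkd.log"), "/home/phoenix/data/bom/IDA00001.dat")` would list each re-read decision for that file with its timestamp.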

Any help would be appreciated.

Re: Splunk not indexing modified files

Is it possible the timestamping has changed? Splunk might be indexing the data but assigning it a different date/time from the one you are expecting.

Re: Splunk not indexing modified files

Unfortunately not. I cleared the monitored directory, then cleared the indexes with the command

/opt/splunk/bin/splunk stop; /opt/splunk/bin/splunk clean eventdata -f -index bom; /opt/splunk/bin/splunk clean eventdata -f -index bom_summary; /opt/splunk/bin/splunk start

I then retrieved the files again, and Splunk still shows zero events in the index.

Re: Splunk not indexing modified files

Hi, I am facing a similar problem. Was there any resolution?

Re: Splunk not indexing modified files

Unfortunately not. We have moved on from this for now. If you do find a solution, please let us know here.

Re: Splunk not indexing modified files

I think you have to clean the _thefishbucket index on the forwarder. That is where Splunk stores its record of which files have already been indexed.

Re: Splunk not indexing modified files

Something I just remembered about this issue.

The file had the extension .dat, which one of the default Splunk configuration files classifies as binary.

We ended up removing dat from the extension list in $SPLUNK_HOME/etc/system/default/props.conf, under the stanza

[source::....(0t|a|ali|asa|au|bmp|cg|cgi|class|d|dat|deb|del|dot|dvi|dylib|elc|eps|exe|ftn|gif|hlp|hqx|hs|icns|ico|inc|iso|jame|jin|jpeg|jpg|kml|la|lhs|lib|lo|lock|mcp|mid|mp3|mpg|msf|nib|o|obj|odt|ogg|ook|opt|os|pal|pbm|pdf|pem|pgm|plo|png|po|pod|pp|ppd|ppm|ppt|prc|ps|psd|psym|pyc|pyd|rast|rb|rde|rdf|rdr|rgb|ro|rpm|rsrc|so|ss|stg|strings|tdt|tif|tiff|tk|uue|vhd|xbm|xlb|xls|xlw)]
sourcetype = known_binary

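Obviously the correct way to do this would be to add an overriding stanza to the props.conf in your own app, which should take precedence over this default, rather than editing the file under default, which is overwritten on upgrade.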
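To see why the two stanzas compete, it helps to check that both patterns match the same source path. The sketch below approximates Splunk's documented source:: wildcards (`...` matches across directories, `*` matches within one path segment, `.` is a literal dot) as a Python regular expression. It is an illustration only, not Splunk's own matching code; the function name `source_pattern_to_regex` is made up, and the known_binary extension list is abridged.

```python
import re


def source_pattern_to_regex(pattern: str) -> re.Pattern:
    """Approximate a props.conf source:: pattern as an anchored Python regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            out.append(".*")          # '...' recurses through directories
            i += 3
        elif pattern[i] == "*":
            out.append(r"[^/\\]*")    # '*' stops at a path separator
            i += 1
        elif pattern[i] == ".":
            out.append(r"\.")         # a single '.' is a literal dot
            i += 1
        else:
            out.append(pattern[i])    # ( ) | [ ] \ pass through unchanged
            i += 1
    return re.compile("^" + "".join(out) + "$")


# Abridged version of the default known_binary stanza pattern.
KNOWN_BINARY = source_pattern_to_regex(r"....(dat|gif|jpg|pdf)")
# The custom stanza pattern from the question.
CUSTOM = source_pattern_to_regex(r"...[/\\]bom[/\\]IDA00001.dat")

path = "/home/phoenix/data/bom/IDA00001.dat"
both_match = bool(KNOWN_BINARY.match(path)) and bool(CUSTOM.match(path))
```

Under this approximation `both_match` is true, so which sourcetype the file ends up with depends on stanza precedence (the priority setting) rather than on the pattern alone.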