I have set up a file/directory monitor input to watch a folder and ingest the contents of the log files into Splunk. There are a huge number of existing files (5000+) that I'd like to import so I can analyse history going back 10 years.
What I have noticed is that there appear to be large gaps in the data, covering whole periods of time over the last 10 years. When I search using the source of a file from one of the missing periods, no events show up for that file, yet the system logs mark it as imported. The data in the file looks OK, so I'm not sure why it wasn't indexed. The only explanation I could think of was that copying a large number of files into the folder at once somehow confused the indexing process, but I'd be surprised if that were the case.
I have set up another index and I'm now drip-feeding the log files into a folder to see whether it still has issues with the same time periods as before.
Is there any other information on best practices for importing a large number of existing log files?
Have you verified that the imported logs don't exceed the maximum size of your index or its retention period (if you defined one)?
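For reference, both limits are set per index in indexes.conf. The stanza below is a minimal sketch, assuming a hypothetical index named history_logs with the usual $SPLUNK_DB path layout. In a stock install maxTotalDataSizeMB defaults to 500000 (about 500 GB) and frozenTimePeriodInSecs defaults to 188697600 seconds (about 6 years). Buckets whose newest event is older than the retention period are rolled to frozen, which deletes them unless a frozen archive is configured, so for 10 years of history you may need to raise it.

```ini
# Hypothetical index name and paths; adjust to your environment.
[history_logs]
homePath   = $SPLUNK_DB/history_logs/db
coldPath   = $SPLUNK_DB/history_logs/colddb
thawedPath = $SPLUNK_DB/history_logs/thaweddb
# Default is 500000 (MB).
maxTotalDataSizeMB = 500000
# Default is 188697600 (about 6 years); 315360000 s = 3650 days (about 10 years).
frozenTimePeriodInSecs = 315360000
```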
The second thing to check: did you run your search immediately or after some time had passed? If you collected the logs using a forwarder, it needs time to send all the data to the indexer.
If you want to reindex files, remember that you have to set
crcSalt = <SOURCE> in your inputs.conf; otherwise Splunk won't reindex files it has already read, even if you delete their events from the index.
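A minimal inputs.conf sketch, assuming a hypothetical monitored path, index and sourcetype (replace them with your own). Splunk recognises files it has already read by a CRC of their first bytes (256 by default, controlled by initCrcLength); crcSalt = <SOURCE> mixes the full file path into that check, so different files that start with identical bytes (for example a common header) are not skipped as duplicates. The trade-off is that renamed or rotated files will be indexed again.

```ini
# Hypothetical path, index and sourcetype; replace with your own.
[monitor:///opt/logs/archive]
index = history_logs
sourcetype = my_app_log
# Add the full source path to the CRC so files with identical leading bytes are treated as distinct.
crcSalt = <SOURCE>
disabled = false
```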
Thanks for the reply.
The index is configured with the Splunk defaults: a maximum size of 500 GB, and I don't believe there is a limit on retention.
The files are on the indexer, and I ran the search many days after the original import; the results have not changed.
I probably should have tried the crcSalt option.
The weird thing is that there is data before and after the missing sections. I initially thought the original source files might not be formatted quite right, but when I compared files that worked with ones that didn't, I couldn't find a problem with them.
At the moment I'm dropping one file into the monitored folder every minute, and it appears to be picking up everything. I just have to wait for it to finish feeding in the remaining 6 years' worth of files.