I have a small Hunk problem which results in the occasional failed search.
What I want is some way to tell Hunk to hold off from searching for a few minutes.
Every few minutes I fetch more data, and new small files appear in my HDFS directory tree. The tree is laid out like /data/TOPIC/2015/06/16/datafile.compressionformat
Once an hour a job goes through each of these directories and merges the many small files into one big file per directory. (This is generally a good idea because each file requires a separate map task, which slows the MR job down. Many small files also take up much more heap memory in the NameNode than a few large files.)
The problem is that if a Hunk search is running at that time, the MR job is given its list of files when it starts, but by the time the map tasks get around to processing them, some of the files have disappeared because they have been merged into new, larger files.
So one possible solution is to get my file merge job to:
a) only start when no Hunk jobs are running, and
b) prevent any Hunk jobs from starting until it has finished.
Has anyone tried such a thing?
I am guessing the only way to do this would be to use the scheduler queues in some way - perhaps making my file merge job take up a whole queue?
Any ideas?
Thanks
You can disable searches against an index or virtual index by adding the line "disabled = 1" to its stanza in indexes.conf. This won't cause searches to wait, but it will cause them to return immediately with no results.
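As a rough sketch of what that stanza might look like while the merge job runs: the virtual index name "hunk_topic" below is only a placeholder, and the stanza's existing settings (provider, input paths and so on) are assumed to stay exactly as they already are.

```
# indexes.conf
# "hunk_topic" is a hypothetical virtual index name; use your own stanza.
[hunk_topic]
# ... existing settings for this virtual index, unchanged ...
# Searches against this index return immediately with no results
# for as long as this line is present.
disabled = 1
```

Removing the line again (or setting it to 0) once the merge has finished should re-enable searches. Whether the change is picked up without a reload or restart is something to verify in your own environment.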