This is a tricky question about the internals of Splunk.
We had an issue with our installation, which is basically a single Splunk instance on a syslog server consuming the logs. Due to some garbage syslog input, the syslog daemon spat out 50,000s of files containing just one line each, and these got indexed as individual sources.
After that, search performance was ridiculously slow.
We decided to delete the index and not reindex the old data, i.e. just start indexing fresh data with the garbage properly filtered out.
Still, search performance was very slow.
Finally, we deleted all indexes (incl. internal ones), and now everything is amazingly fast.
Does anybody have an explanation for why cleaning just the faulty index was not enough?
Hmm, if you [monitor] a directory with tens of thousands of files, it will surely tax the tailing processor (which is responsible for checking whether a file has been updated or not).
When you cleaned the data in your index, did you also clean out all those files from /var/log/syslog (or wherever they are created)?
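If you want to see how many files the monitor input actually has to keep an eye on, a rough sketch like the following could help. It simply walks a directory tree and counts the files, plus how many of them hold exactly one line. The function name `count_files` is made up for illustration, and the path you pass in is whatever directory your [monitor] stanza points at; nothing here talks to Splunk itself.

```python
import os


def count_files(root):
    """Return (total_files, single_line_files) found under root."""
    total = 0
    single_line = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            total += 1
            try:
                with open(path, "rb") as fh:
                    # Two readline() calls are enough to tell whether
                    # the file contains exactly one line.
                    first = fh.readline()
                    second = fh.readline()
            except OSError:
                # Unreadable file: counted in the total only.
                continue
            if first and not second:
                single_line += 1
    return total, single_line
```

For example, `count_files("/var/log/syslog")` returns a pair of counts for that directory. If the first number is in the tens of thousands, the tailing processor has a lot of work to do regardless of what is in the index.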
Also, the internal fishbucket index will keep track of all files it has seen, and it will not be deleted when you remove the source files or clean out the data index. That could also play a part.
The sheer amount of indexed bytes probably has little to do with it.
Yes, we also cleaned out all the files to avoid reindexing. I was also thinking about the fishbucket, but had difficulty understanding how it would be related to a search...
Another possibility is that it was actually still the indexer that had a problem and was taking resources away from the searches...
Well, it's hard to say now that you've cleaned it. I guess you didn't make a diag dump before you started cleaning?
But I agree with you, a large fishbucket could not really affect searching, could it? Unless it also degrades general performance in some way.
Question: did you have millions of hosts, or millions of sources?
Are you on an old version (3.x, 4.1.x, 4.2.x)? Those were sensitive to a large number of metadata entries, stored in ($SPLUNK_HOME/var/lib/splunk/
However, deleting the data and the global metadata should have helped.
The fishbucket keeps track of monitored file sources, not of syslog TCP inputs.
Finally, you may have a large number of learned sourcetypes in your learned app; please check and clean $SPLUNK_HOME/etc/apps/learned/local/
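To get a feel for how much has piled up in there before cleaning, something like this sketch might be useful. It counts the `[stanza]` headers in each .conf file of a directory. The helper name `count_stanzas` is hypothetical, and which .conf files you find in that directory depends on your installation; the script only reads files and does not change anything.

```python
import os


def count_stanzas(conf_dir):
    """Map each .conf file in conf_dir to its number of [stanza] headers."""
    counts = {}
    if not os.path.isdir(conf_dir):
        return counts
    for name in sorted(os.listdir(conf_dir)):
        if not name.endswith(".conf"):
            continue
        n = 0
        path = os.path.join(conf_dir, name)
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                stripped = line.strip()
                # A stanza header is a line of the form [something].
                if stripped.startswith("[") and stripped.endswith("]"):
                    n += 1
        counts[name] = n
    return counts
```

You would call it with the expanded path, e.g. `count_stanzas(os.path.join(os.environ["SPLUNK_HOME"], "etc/apps/learned/local"))`, assuming SPLUNK_HOME is set in your environment. Thousands of stanzas in one file would suggest that the garbage files left their mark there too.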
This is strange. Maybe, as kristian said, the number of files to tail is high and it is the tailing processor that is busy.