I'm not entirely certain how search optimization works in Splunk. Certainly, if I search only for a rare indexed word, then all entries that contain that word will be found quickly. But what if I want to search for a substring of a rare indexed word, where the substring is itself rare? Say, for the sake of argument, that this rare substring occurs in only one indexed word. I can search for the substring bracketed by asterisks, but that seems to take significantly longer than the search for the rare indexed word that the substring is part of.
Is there an efficient way to do a search like this directly? Failing that, is there a way to list all indexed words that contain a common substring? If I had that list for a given substring, I could simply search for all instances of the indexed words that contain the substring.
Whenever a search term begins with a wildcard, the search will be particularly slow. A leading wildcard forces Splunk to serially scan the entire lexicon of each bucket to find any matching keywords. Search terms that end with a wildcard are not as slow as search terms that begin with one, because Splunk can still seek to the matching prefix in the lexicon.
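As a rough illustration (the index, sourcetype, and term here are hypothetical placeholders), the three cases look like this:

```
index=myindex sourcetype=mylogs error*     <- trailing wildcard: prefix seek in the lexicon
index=myindex sourcetype=mylogs *error     <- leading wildcard: full lexicon scan per bucket
index=myindex sourcetype=mylogs *error*    <- leading + trailing: full lexicon scan per bucket
```

Only the first form lets Splunk jump to the right place in the sorted lexicon; the other two must examine every keyword in every bucket that survives the other filters.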
If Splunk knows the exact search term, it can use the index to find it directly. It can also use bloom filters to eliminate many buckets from the search. Bloom filters do not work with wildcards.
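If you do know the exact indexed token, you can say so explicitly with the standard SPL `TERM` directive, which tells Splunk to match the token as a whole rather than run field extraction or wildcard matching (the token and index name here are just examples):

```
index=myindex TERM(abcdefghijklmnopq)
```

Because this is an exact-match lookup, it can use both the lexicon and the bloom filters, which is exactly what a wildcard search gives up.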
There is no way to use a wildcard while avoiding the performance penalty of using a wildcard.
However, if you can narrow the search by including additional terms, that will help. For example, be sure to specify the index and the sourcetype. Also, use as narrow a time range as possible for your search. Anything that helps Splunk reduce the number of buckets to scan will help.
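For example (the index, sourcetype, and wildcard term below are placeholders), a narrowed version of a substring search might look like:

```
index=myindex sourcetype=access_combined earliest=-4h latest=now *bcdefghijklmnop*
```

The index, sourcetype, and time-range restrictions all prune buckets before the expensive per-bucket lexicon scan runs, so the wildcard only pays its penalty on the buckets that remain.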
So... Say I'm searching for the string "bcdefghijklmnop", which occurs exactly once in my entire (large) dataset. The one time it occurs, it does so as " abcdefghijklmnopq " (but, of course, I don't know what the leading and trailing characters are). Are you saying that the only way to find this instance is to search for *bcdefghijklmnop*?