I have a field in one of my datasets labelled user. We perform automatic lookups globally based on the user field to return a variety of information about the identified user. Recently I noticed that when searching this particular index in anything other than Fast Mode, the results would take an extremely long time to return. Upon further investigation, I believe the cause is a combination of the automatic lookup and the fact that some of the user fields in the dataset have the value *****.
The device that we're receiving logs from is masking the user field value with asterisks. When the Splunk search returns results, it appears to be attempting to look up ***** through the automatic lookup, and this is severely affecting the performance of the search. It's as if Splunk is interpreting the asterisks as wildcards and iterating over the entire lookup file (which is quite large).
For example, a five-second window in which none of the events include user = ***** returns in 2.648 seconds when searching in Smart Mode with the automatic lookup enabled. A similar five-second window that includes a single user = ***** field/value pair takes 8.549 seconds. As the search time range increases, the performance difference becomes much greater.
A 1-hour search in Fast Mode with no field extractions or lookups: "This search has completed and has returned 3,162 results by scanning 3,162 events in 2.822 seconds"
The same 1-hour search in Smart Mode using automatic lookups: "This search has completed and has returned 3,162 results by scanning 3,162 events in 256.991 seconds"
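To put these timings in proportion, the slowdown ratios can be computed directly from the figures quoted in this post. This is only arithmetic on the reported numbers; it does not show where the time is spent.

```python
# Timings (in seconds) quoted in this post.
fast_mode_1h = 2.822      # 1-hour search, Fast Mode, no extractions or lookups
smart_mode_1h = 256.991   # same 1-hour search, Smart Mode with automatic lookups
no_masked_5s = 2.648      # 5-second window, no user=***** events, Smart Mode
one_masked_5s = 8.549     # 5-second window, one user=***** event, Smart Mode


def slowdown(baseline: float, observed: float) -> float:
    """Return how many times slower `observed` is than `baseline`."""
    return observed / baseline


if __name__ == "__main__":
    print(f"1-hour search: {slowdown(fast_mode_1h, smart_mode_1h):.1f}x slower")
    print(f"5-second window: {slowdown(no_masked_5s, one_masked_5s):.1f}x slower")
```

Running it prints about 91.1x for the 1-hour search and about 3.2x for the five-second window. The 1-hour comparison also switches from Fast Mode to Smart Mode, so not all of that difference is necessarily attributable to the lookup.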
For the second test search (Smart Mode with automatic lookups), the Job Inspector shows the duration of command.search.lookups as 1,413.19 seconds.
Interestingly enough, all of the events with user = ***** are given the same lookup value for user, even though the original value was all asterisks. This makes me think that the automatic lookup is interpreting the asterisks as wildcards and defaulting to some seemingly random value from the lookup table. It also appears to iterate over the entire lookup table whenever it encounters these asterisk-filled fields.
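To illustrate the hypothesis (this is not Splunk's lookup implementation, which the sketch does not claim to reproduce), here is a small Python model. It assumes a lookup that treats the incoming value as a glob pattern, using the standard library's fnmatchcase. Under that assumption a value of ***** matches every row, so the lookup has to scan the whole table, and if it keeps the first match, every masked event receives the same arbitrary row. The table contents and the wildcard_lookup function are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical lookup table of (user, department) rows.
LOOKUP_TABLE = [
    ("alice", "finance"),
    ("bob", "engineering"),
    ("carol", "sales"),
]


def wildcard_lookup(value, table):
    # Hypothetical lookup that treats `value` as a glob pattern.
    # It checks every row (a full scan) and returns the first
    # matching row's output field plus the number of rows matched.
    matches = [dept for user, dept in table if fnmatchcase(user, value)]
    first = matches[0] if matches else None
    return first, len(matches)


if __name__ == "__main__":
    print(wildcard_lookup("bob", LOOKUP_TABLE))    # ordinary value: one row matches
    print(wildcard_lookup("*****", LOOKUP_TABLE))  # masked value: every row matches
```

With "bob" the call returns ('engineering', 1); with "*****" it returns ('finance', 3). The masked value matches every row and the first row wins, which mirrors the "same lookup value for every masked event" symptom. On a large table, that full scan per masked event is where the cost would come from under this hypothesis.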
Has anyone else seen something like this? Should Splunk be interpreting field values that contain asterisks as wildcards?
I submitted a ticket with Splunk Support, and based on their preliminary examination of the issue, they believe it may be a bug.
Interesting. If you post the actual search, then we can optimize the solution.
It sounds like your lookup is returning various values that are then used to initiate a further search. You will need to clear those values with code like the snippet woodcock posted, but without the search itself, we can't be sure which of the following recodings it needs:
| eval user=if(like(user,"*****"), "#####", user)
or
| eval user=if(like(user,"*****"), "*", user)
From what you have described, the first should return no events and the second should (potentially) return the proper subset of the data, but that conclusion is highly speculative on my part.
This is a good question, and I would open a ticket with Splunk Support. In the meantime, you should be able to bypass the problem by adding this code to your search:
| eval user=if(like(user,"*****"), "#####", user)
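The eval workaround amounts to swapping the masked value for a sentinel that cannot match any real user before that value reaches a lookup. Below is a minimal Python model of the idea. It assumes a hypothetical exact-match lookup (a plain dict) and uses the same ##### sentinel as the snippet. In SPL's like(), only % and _ act as wildcards, so the asterisks in like(user,"*****") are literal characters. For that reason the sketch uses a plain equality test. It does not model when Splunk applies the automatic lookup relative to the eval in a real search.

```python
MASKED_VALUE = "*****"
SENTINEL = "#####"  # same placeholder the eval workaround uses

# Hypothetical exact-match lookup table keyed by user name.
USER_INFO = {
    "alice": "finance",
    "bob": "engineering",
}


def normalize_user(user: str) -> str:
    # Models: eval user=if(like(user,"*****"), "#####", user)
    # like() has no % or _ here, so it is an exact comparison.
    return SENTINEL if user == MASKED_VALUE else user


def lookup_user(user: str):
    # Exact-match lookup: returns None when no row has this key.
    return USER_INFO.get(normalize_user(user))


if __name__ == "__main__":
    for raw in ["bob", "*****", "mallory"]:
        print(raw, "->", lookup_user(raw))
```

Here "bob" resolves to engineering, while both the masked value and an unknown user resolve to None. The sentinel simply falls through as "no match" instead of matching every row.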