I'm running into some really slow performance searching on WMI sources. In this case I'm just trying to get some general counts, I don't even need most of the fields, but the job inspector is telling me that
command.search.kv is using probably some 90% of my overall search time. I took a look in the
search.log file and saw that splunk keep trying to disable all three of the
wel-* extractors but it apparently failed to do so.
So I started digging in. I looked at the default extractors and quickly came to the conclusion that I don't want any of them.
From the default system
[source::WMI...] REPORT-MESSAGE = wel-message, wel-eq-kv, wel-col-kv
From the default system
[wel-message] REGEX = (?sm)(?<_pre_msg>.+)\nMessage=(?<Message>.+) CLEAN_KEYS = false [wel-eq-kv] SOURCE_KEY = _pre_msg DELIMS = "\n","=" MV_ADD = true [wel-col-kv] SOURCE_KEY = Message REGEX = \n([^:\n\r]+):[ \t]++([^\n]*) FORMAT = $1::$2 MV_ADD = true
Here's what I found, the
wel-message extractor has to run first, it extracts two fields which are used by the
wel-col-kv extractors which extract their own set of fields.
wel-message seems to basically just separate everything before "Message=" from everything after it. (Apparently "Message" is a multi-line field.) In my WMI events, I don't seem to have any "Message" entries at all, which I'm assuming is because I'm not using WMI to collect log files, I'm using WinEventLog for that purpose.)
wel-message matches, then
_pre_msg is searched for "key=value" pairs using
wel-eq-kv, and then
Message is searched for "key:value" fields with the
At first glance it seemed odd that I didn't see a
KV_MODE=none props entry, since I figured that with 3 custom transforms stanzas all the fields would be covered, but their actually not. The reason for this is that in
wel-message (the first extractor), the
_pre_msg field is ONLY extracted if the
Message field is found, otherwise the regex doesn't match; therefore the auto key/value extraction is still needed based on the default setup.
So, does any one have any thoughts as to why these three extractors are singly handedly killing my
source=WMI:* performance when I don't have "Message=" in any of my events, and therefore most of these extractors have no nothing to do.
For anyone who cares, here is the search I'm using. It's running for the last whole day (timeframe name "Yesterday")
source=WMI:* | fields - eventtype | stats count as events, avg(linecount) as lpe, dc(Name) as names, by sourcetype, host | rename sourcetype as st, host as h | eval lpe=round(lpe,1)
I setup the following in my local system folder:
[source::WMI...] KV_MODE = none # Disable splunk's built in message extraction stuff REPORT-MESSAGE = REPORT-kveq = my_wmi-kv-eq
[my_wmi-kv-eq] DELIMS = "\n","=" MV_ADD = true
I can now re-run the same search, and instead of
command.search.kv take 40+ seconds, it now takes just slightly over 2 seconds. (Now, if I could just stop
typer from running I could shave off another 8 seconds.)
Update: I just found out that if you go with the standard auto key/value extraction and no custom transformer sections, the
command.search.kv time drops to under a second. Which moves the bottleneck purely on
rawdata (I/O performance), right where it should be.
Well, here's yet another option. I did a bunch more test and can confirm that the problem is completely contained within the
I simply added the following to
Note: There are two different options that both work rather well.
Option 1: If you like only making minimalistic changes, then this single character change will give you the most drastic speedup:
[wel-message] REGEX = (?sm)^(?<_pre_msg>.+)\nMessage=(?<Message>.+)
The only change here is anchoring the beginning of
_pre_msg to the start of the string using "^". It appears that without this the regex engine wanders around aimlessly for a while to try to find the correct match (since the regex runs in "search" mode). Given the greedy nature of both
.+, the whole string should always be matched (that is, if "\nMessage=" exists within the string) but the PCRE must not be optimizing this case so well. (Side note, if I use the default regex using the "rex" command, I don't see this performance issue at all, which seems weird to me. Perhaps there is some difference in how PCRE is used between transform entries vs
rex commands... rather weird.)
Option 2: I think this is an even better regex.
[wel-message] REGEX = (?sm)^[^\n]+\n(?<_pre_msg>.+?)\nMessage=(?<Message>.+)$
This anchors both ends of the regex to the first/last character of the text string (so the regex engine doesn't have any guess work in that regard). It also matches and disregards the timestamp portion of the event. Also
_pre_msg is no longer greedy, so at the first occurrence of "Message=" the first half of the regex can be done and we don't need to keep searching for later occurrences of "Message=" and then backtracking. (This should be slightly faster and the only time you would see an actual different in the output would be if your event had more than one "Message" fields; which seems unlikely.)
Either of these two options have shown to be considerably faster on my system then the default
wmi-message REGEX. I was able to extract a days worth of WMI events with the kv extractor only using 2 seconds of time, which is incredibly better than the original 40 seconds.
I'm hoping this will be address in future releases of splunk, but in the meantime you can speed up your own install with one of the entries show above.
I haven't quite parsed through all of this, but it looks like it is full of awesomeness!