Monitoring Splunk

performance considerations for eventtypes?

Lowell
Super Champion

I've been looking at the "Search Job Inspector" recently and have noticed that command.search.typer often shows up at the top of the list. It's not uncommon for it to consume nearly 50% (sometimes more) of the total command.search time. My searches aren't performing unacceptably yet, but I anticipate the number of eventtypes growing as we add more and more sources (as will the search load), so I can't imagine this will magically improve. I'd like to look at this now, before it becomes a bigger problem.

Based on general optimization principles, I'm starting with the following assumptions:

  1. The more eventtypes that are defined, the more effort is required to match events against them, so a longer typer execution is to be expected. Reducing the total number of eventtypes should therefore improve performance.
  2. A poorly defined eventtype will be more expensive than a well-defined one. (For example, I'm assuming that an eventtype defined by the search "user!=joe bytes>=1000" would be less efficient than one defined as "sourcetype=ftp UPLOAD OK".)
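To make assumption #2 concrete, here's roughly what the two styles look like in eventtypes.conf (the stanza names are made up for illustration; only the searches come from the examples above):

```ini
# Hypothetical stanza names -- only the search strings are from the examples above.

# Loosely defined: no indexed terms to anchor on, just field comparisons,
# so typer presumably has to evaluate it against many more events.
[suspicious_upload]
search = user!=joe bytes>=1000

# Tightly defined: anchored on a sourcetype plus literal indexed terms.
[ftp_upload_ok]
search = sourcetype=ftp UPLOAD OK
```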

If I'm missing something or have any of this wrong so far, please say so.

#1: Reduce number of eventtypes:

Based on the Eventtypes' numbers limits question, the answer suggested that the total number of eventtypes should ideally be limited to a few hundred. However, I'm not sure that's very realistic. (The answer wasn't clear, but I'm guessing that "a few hundred" means somewhere between 200-400?)

I looked at my system and I currently have over 340 eventtypes defined that are shared across all apps. Of those, 111 come from the Windows app. I have the eventtypes in the "unix" app set to application-only sharing, or they would add another 133 eventtypes globally. (I did this because the "unix" eventtypes generally seem too loosely defined and rather unhelpful; to be honest, the quality seems pretty poor. For example, as of Splunk 4.1.3, the Unix app contains 17 eventtypes (e.g. "df", "cpu", ...) that don't even have a "search" defined in the config file; they show up as "None" in the UI. The eventtype tags are also pretty inconsistent. So I chose to ignore them rather than try to deal with them.)

I have an app with nearly 100 app-level eventtypes. It's fairly self-contained, and it would be nice to "block out" the eventtypes of the other apps to improve performance within that app, but as far as I know that's not possible.
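For reference, the application-only sharing mentioned above is controlled through the app's metadata. A sketch of what I used for the "unix" app (file path and stanza per the usual app layout; treat this as illustrative, not a recipe):

```ini
# $SPLUNK_HOME/etc/apps/unix/metadata/local.meta
# Keep this app's eventtypes private to the app instead of exporting
# them globally (export = system would make them visible everywhere).
[eventtypes]
export = none
```

As far as I can tell this only works in one direction: an app can keep its own eventtypes to itself, but it can't shut out eventtypes that other apps export globally.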

Again, it seems inevitable that the number of eventtypes will only grow as Splunk usage increases. So other than doing some cleanup, it doesn't seem possible to reduce this dramatically.

#2 Optimize eventtype definitions:

This is where I would really like to focus my efforts. The problem is, I haven't come across any recommendations/suggestions/guidelines as to how to write more-efficient eventtypes, and I would really appreciate some input from the people who know this stuff.

Without a good place to start, I've done what I always do: Ask lots of questions!

If these can be answered directly, that would be great, but even starting with some general principles would be a great help. Even a never-do-this list would be helpful.

Here are some specific eventtype performance questions:

What's the impact of...

  • Using the core indexed fields (source/sourcetype/host)? It seems that eventtypes based on sourcetype can be included/excluded faster than eventtypes based on simple search terms; is that true?
  • Using index=? (Old docs said you shouldn't do this, but newer docs say any search expression is fine. If I have a bunch of firewall events that only occur in index=firewall will they be faster if I add that to the eventtype definition?)
  • Using splunk_server=?
  • Using field=value in an eventtype? Or is it better to use a literal string (like "EventCode=538") than a field comparison (EventCode=538)? (Does using fields in an eventtype prevent the field extraction engine from automatically disabling extractors when Splunk detects that the fields being output are not needed by the search? I know some non-interactive searches try to disable extractors for efficiency when possible; can eventtypes get in the way of this?)
  • Using lookup fields? (Example: where an automatic extraction is based on a sourcetype, and that sourcetype is included in the eventtype definition)
  • Using source/host/sourcetype tags as part of the eventtype criteria?
  • Using indexed fields vs extracted fields? (indexed fields like "punct")
  • Using quoted strings? (Can indexed terms alone be matched faster than a quoted expression? Is there any concept of segmentation here, or does typer re-evaluate the raw events anyway?)
  • Using wildcards (e.g. term*)
  • Nested eventtypes? Say you have a "base" eventtype that is used in the definition of several other eventtypes (essentially a simple way to extend the "base" eventtype to cover a more specific scenario). If the base eventtype doesn't match, can typer more quickly eliminate the derived eventtypes too? Or does it cause more work? Or is it more like a macro expansion, where the eventtype gets unrolled before it's evaluated, so it doesn't make much difference either way?
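To be clear about what I mean by nesting in that last question, here's a sketch (all stanza names and searches are hypothetical):

```ini
# Hypothetical nested eventtypes -- a "base" eventtype referenced
# by more specific derived ones via eventtype= in their searches.
[ftp_event]
search = sourcetype=ftp

[ftp_upload]
search = eventtype=ftp_event UPLOAD

[ftp_failed_login]
search = eventtype=ftp_event "530 Login incorrect"
```

The question is whether typer can skip ftp_upload and ftp_failed_login entirely once ftp_event fails to match, or whether each derived eventtype is effectively expanded and evaluated on its own.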

I'm guessing there are lots of corner cases here. An eventtype definition can cut across tons of layers, which is what makes eventtypes so powerful, and I'm sure that also means they can be quite expensive at times. So any hints would be appreciated, and some kind of "profiler tool" would be amazing (I'll even consider naming my firstborn after you).

Thanks in advance!
