Splunk Search

Extracting fields from Source (Index Time vs. Search Time)

srikarmohan
Observer

Hello,

We are including the Pod Namespace and Pod Name in the Log Source (for K8s deployments) and would like these fields (Pod Namespace and Pod Name) to be extracted.

source: /var/lib/kubelet/pods/*/volumes/kubernetes.io~empty-dir/$(Volume Name)/$(POD_NS)/$(POD_NAME)/*.log
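To make the intended extraction concrete, here is a minimal Python sketch of the capture being asked about. It is not Splunk configuration. The regular expression, the field names pod_ns and pod_name, and the example path (pod UID, volume, namespace, pod name) are all assumptions for illustration only.

```python
import re

# Hypothetical pattern mirroring the source layout above:
# .../kubernetes.io~empty-dir/<volume>/<POD_NS>/<POD_NAME>/<file>.log
SOURCE_RE = re.compile(
    r"^/var/lib/kubelet/pods/[^/]+/volumes/kubernetes\.io~empty-dir/"
    r"[^/]+/(?P<pod_ns>[^/]+)/(?P<pod_name>[^/]+)/[^/]+\.log$"
)


def extract_pod_fields(source):
    """Return {'pod_ns': ..., 'pod_name': ...}, or None if the path does not match."""
    match = SOURCE_RE.match(source)
    return match.groupdict() if match else None


# Made-up example path; every concrete segment here is invented.
example = (
    "/var/lib/kubelet/pods/1234-abcd/volumes/kubernetes.io~empty-dir/"
    "applogs/payments/checkout-7d9f/app.log"
)
print(extract_pod_fields(example))
# {'pod_ns': 'payments', 'pod_name': 'checkout-7d9f'}
```

The same two capture groups are what either an index-time or a search-time extraction would need to produce; only where and when the regex runs differs.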

Most of our searches (including saved searches) will use at least one of the two fields, if not both, and we are wondering whether it is better, performance-wise, to do the field extraction at index time or at search time.

The general practice seems to be to opt for search-time extraction, but there may be cases where index-time extraction is preferred. The examples of when to use index-time extraction given here (https://docs.splunk.com/Documentation/Splunk/8.2.3/Data/Configureindex-timefieldextraction) are not very clear. It seems the first example might apply to our use case, so would index-time extraction be preferred?

Thanks,

Srikar


PickleRick
SplunkTrust

Well, it depends on the use case and the characteristics of your data. Remember that Splunk searches data quite differently from, for example, a typical RDBMS. It builds its own index from the raw data, split into terms on delimiters. So (oversimplifying a bit, but not much) if you search for "field=value", it first looks up every occurrence of the term "value" in the events and then checks which of those events parse so that the value actually sits in the field called "field".

So if you have, for example, ten different fields of which any can (and will) contain one of - let's say - ten values (repeated between those fields), you might benefit from indexed fields. There are other tricks you might use to speed up manipulation on big data sets like accelerated datasets and accelerated reports.
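A toy Python model of that two-step lookup may help. It is a deliberately simplified sketch, not Splunk's actual implementation: the events, the term_index structure, and the function names are made up, and storing an indexed field as a "field::value" term is only modelled loosely here.

```python
from collections import defaultdict

# Toy events: the same value ("alpha") shows up under several different fields.
events = [
    {"src": "alpha", "dst": "beta"},
    {"src": "beta", "dst": "alpha"},
    {"src": "gamma", "user": "alpha"},
    {"src": "alpha", "user": "delta"},
]

# Toy term index: each value is indexed as a bare term and, to model
# indexed fields, also as a combined "field::value" term.
term_index = defaultdict(set)
for event_id, event in enumerate(events):
    for field, value in event.items():
        term_index[value].add(event_id)
        term_index[f"{field}::{value}"].add(event_id)


def search_time_lookup(field, value):
    """Find candidates by bare term, then verify the field on each candidate."""
    candidates = term_index.get(value, set())
    matches = {i for i in candidates if events[i].get(field) == value}
    return sorted(candidates), sorted(matches)


def indexed_field_lookup(field, value):
    """Look up the combined term directly; no per-event verification needed."""
    return sorted(term_index.get(f"{field}::{value}", set()))


candidates, matches = search_time_lookup("src", "alpha")
print(candidates)  # [0, 1, 2, 3]
print(matches)     # [0, 3]
print(indexed_field_lookup("src", "alpha"))  # [0, 3]
```

In this toy data the bare term "alpha" pulls in all four events and half of them are then discarded, while the combined term goes straight to the two that match. When a value appears almost exclusively in one field, the two approaches examine nearly the same events and the indexed field buys little.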

Indexed fields do have two advantages, though:

- you can do tstats on them which means you can do some statistical searches very quickly

- you can add some metadata to the event that is not present in the event itself (for example, I do it on my forwarders to be able to quickly see which forwarder the event came from)

 


yuanliu
SplunkTrust

Like PickleRick says, it depends on both your use case AND data characteristics.

"Most of our searches (including saved searches) will leverage both, if not at least one of the two, fields ... The examples for using Index time extraction mentioned here (https://docs.splunk.com/Documentation/Splunk/8.2.3/Data/Configureindex-timefieldextraction) are not very clear, it seems like the 1st example might apply to our use case and so Index time might be preferred?"


From the cited example: "if you typically search a large event set with expressions like foo!=bar or NOT foo=bar, and the field foo nearly always takes on the value bar."

Just because most searches involve the two fields does not mean they fit this example.  The example asks three additional questions:

  1. Do most searches contain a negation of one or two of the "always on" fields POD_NS and POD_NAME? (i.e., POD_NS!=somespace and/or NOT POD_NAME=somename, etc.)
  2. Do these searches mostly operate on large sets of events?
  3. Do the negation(s) nearly always result in false? (i.e., nearly always POD_NS==somespace, and nearly always POD_NAME==somename.)

If the answer to any of the three questions is negative, that example doesn't apply.
