
What is the effect of annotate_punct on indexing time?

Ultra Champion

The Architecting Splunk 7.1 Enterprise Deployments class emphasizes that setting ANNOTATE_PUNCT = false in props.conf at the indexer level can significantly improve indexing time.

I wonder why this setting improves indexing time, and in which cases we should keep the punct field.


Ultra Champion

Our sales engineer said:

PUNCT is exactly what it sounds like: an index-time field containing the ordered sequence of punctuation characters in an event. This is extremely useful for finding “patterns” of events, for example a Windows event where the service name and IP address change but the event structure stays the same.

Splunk sometimes uses it in the background, and it is very useful for event types, tagging, etc.
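To make the "pattern" idea concrete, here is a minimal Python sketch that approximates a punct value and groups events by it. It is not Splunk's implementation: the rules used here (only the first line is considered, letters and digits are dropped, spaces become `_`, tabs become `t`, output capped at 30 characters) are assumptions for illustration, and `approximate_punct` and `group_by_punct` are hypothetical helper names.

```python
from collections import defaultdict

# Assumed cap on pattern length (illustrative, not taken from the thread).
PUNCT_LIMIT = 30


def approximate_punct(event: str, limit: int = PUNCT_LIMIT) -> str:
    """Hypothetical approximation of a punct value for one event."""
    first_line = event.split("\n", 1)[0]  # assumption: first line only
    out = []
    for ch in first_line:
        if ch.isalnum():
            continue  # letters and digits carry no structure, so drop them
        if ch == " ":
            out.append("_")  # assumption: spaces encoded as underscores
        elif ch == "\t":
            out.append("t")  # assumption: tabs encoded as "t"
        else:
            out.append(ch)  # keep the punctuation character itself
        if len(out) >= limit:
            break
    return "".join(out)


def group_by_punct(events):
    """Bucket events that share the same structural pattern."""
    groups = defaultdict(list)
    for event in events:
        groups[approximate_punct(event)].append(event)
    return dict(groups)
```

For example, `"Service svcA started on 10.0.0.1"` and `"Service svcB started on 192.168.1.20"` both reduce to `____...`, so they land in the same group even though the service name and IP address differ, which is the kind of grouping an event type can build on.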

ANNOTATE_PUNCT is the switch that toggles this field. It’s on by default, but if you have:
1. Extremely long events
2. Extremely frequent events
3. Events all of the same PUNCT pattern
4. Events of all different PUNCT patterns

then turning it off will reduce indexer CPU load on the parsing queue in the indexing pipeline.
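Cases 1 and 2 are about raw volume: the pattern has to be computed for every event, so the work grows with event length and event rate. Cases 3 and 4 are about value: if every event shares one pattern, or every event has its own, punct cannot separate anything. The Python sketch below checks a sample of events for cases 3 and 4. It is a hypothetical heuristic, not a Splunk feature; the helper names are made up, and the simplified punct rules (first line only, alphanumerics dropped, spaces as `_`, tabs as `t`, 30-character cap) are assumptions.

```python
def approximate_punct(event: str, limit: int = 30) -> str:
    """Hypothetical, simplified punct approximation (see assumptions above)."""
    out = []
    for ch in event.split("\n", 1)[0]:
        if ch.isalnum():
            continue
        out.append("_" if ch == " " else "t" if ch == "\t" else ch)
        if len(out) >= limit:
            break
    return "".join(out)


def punct_diversity(events) -> float:
    """Ratio of distinct patterns to events in a sample (0 < ratio <= 1)."""
    if not events:
        raise ValueError("need at least one event")
    return len({approximate_punct(e) for e in events}) / len(events)


def punct_adds_little_value(events) -> bool:
    """True for case 3 (one shared pattern) or case 4 (all patterns unique)."""
    if not events:
        raise ValueError("need at least one event")
    distinct = len({approximate_punct(e) for e in events})
    return distinct == 1 or distinct == len(events)
```

A sample drawn from a single, uniformly formatted log source will usually trip the first condition; a sample where a handful of patterns repeat across many events is the situation in which keeping punct is most likely to pay off.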


Ultra Champion

It seems to me that most log files would fall under category 3: events all of the same PUNCT pattern.
